Handle very large files #560

Open
2 of 3 tasks
devowit opened this issue Jul 21, 2021 · 16 comments

Comments

Contributor

devowit commented Jul 21, 2021

  • backend
  • frontend
  • suggest annotations uses sampling
kyao modified the milestones: Milestone 3, Milestone 2 on Jul 22, 2021
Collaborator

kyao commented Jul 22, 2021

Use this file for testing: http://databank.worldbank.org/data/download/WDI_excel.zip

Collaborator

kyao commented Aug 2, 2021

Another test dataset. This one is only ~16K rows.

IDP_Time_Series_Data_No_Missing_No_Annotation.zip

kyao modified the milestones: Milestone 2, Milestone 3 on Aug 3, 2021
Contributor Author

devowit commented Aug 5, 2021

@g1eb both files Ke-Thia added are rejected by the frontend as too large.

g1eb self-assigned this on Aug 24, 2021
g1eb added the enhancement (New feature or request) label on Aug 24, 2021
Collaborator

g1eb commented Sep 5, 2021

@devowit the second file Ke-Thia shared (IDP_Time_Series_Data_No_Missing_No_Annotation.csv) is only 1.1 MB and works fine.

The other file (WDIEXCEL.xlsx) is 70 MB, and when I remove the frontend limit I get an error from the backend; see the screenshot below:

[Screenshot: Screen Shot 2021-09-04 at 5 50 11 PM]

Collaborator

g1eb commented Sep 5, 2021

I've changed the theoretical limit in app_config.py:42 from 16 MB to 100 MB:

MAX_CONTENT_LENGTH = 100 * 1024 * 1024  # 100 MB max file size

Collaborator

kyao commented Sep 8, 2021

@g1eb Would you please do some profiling to see what file size t2wml can handle and how long it takes for the results to return?
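
One way to approach the profiling, as a rough sketch: time a processing step on inputs of increasing size and record size against elapsed time. `process_file` below is a hypothetical stand-in that only counts CSV rows; it is not a t2wml function and would need to be swapped for the real upload or suggest-annotations call. The synthetic CSV generator is likewise only for illustration.

```python
import csv
import io
import time


def make_csv(rows: int, cols: int = 5) -> str:
    """Build an in-memory CSV with `rows` data rows, for benchmarking only."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f"col{c}" for c in range(cols)])
    for r in range(rows):
        writer.writerow([r * cols + c for c in range(cols)])
    return buf.getvalue()


def process_file(text: str) -> int:
    """Hypothetical stand-in for the real processing step: counts data rows."""
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    return sum(1 for _ in reader)


def profile(sizes):
    """Return a (rows, size_in_bytes, seconds) tuple for each requested row count."""
    results = []
    for rows in sizes:
        text = make_csv(rows)
        start = time.perf_counter()
        process_file(text)
        elapsed = time.perf_counter() - start
        results.append((rows, len(text.encode("utf-8")), elapsed))
    return results


if __name__ == "__main__":
    for rows, size, seconds in profile([1_000, 16_000, 100_000]):
        print(f"{rows:>8} rows  {size / 1024 / 1024:6.2f} MB  {seconds:.3f} s")
```

Plotting the resulting size against seconds should show roughly where processing time stops being acceptable.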

Collaborator

kyao commented Sep 8, 2021

I tried to upload this 1.8 MB file and got an error:

t2wml-web        | 2021/09/08 05:58:04 [error] 24#24: *42 client intended to send too large body: 1848588 bytes, client: 172.18.0.1, server: localhost, request: "POST /api/causx/upload/data HTTP/1.1", host: "localhost:8080", referrer: "http://localhost:8080/"

WGI_Data.zip
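
A possible reading of the log line above, as a sketch: an upload passes through more than one size check, and the first layer whose limit is exceeded rejects it. The nginx value below is an assumption (nginx's documented default `client_max_body_size` of 1 MB, which would explain a 1,848,588-byte body being refused); the backend value is the `MAX_CONTENT_LENGTH` quoted earlier in this thread. The layer names and the helper are illustrative only.

```python
# Size limits per layer, in bytes, in the order a request reaches them.
# "nginx" is an ASSUMPTION (nginx's default client_max_body_size of 1m);
# "backend" is the MAX_CONTENT_LENGTH value from app_config.py.
LIMITS = {
    "nginx": 1 * 1024 * 1024,
    "backend": 100 * 1024 * 1024,
}


def first_rejecting_layer(size_bytes, limits=LIMITS):
    """Return the name of the first layer whose limit the body exceeds, or None."""
    for layer, limit in limits.items():
        if size_bytes > limit:
            return layer
    return None


if __name__ == "__main__":
    # Body size copied from the nginx error line.
    print(first_rejecting_layer(1848588))
```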

Collaborator

g1eb commented Sep 12, 2021

Ahh, one more filter was blocking large files: nginx. I've fixed that now.

Here's the nginx setting in case causx needs to apply the same setting on their end: usc-isi-i2/t2wml-web@d6cfb01

Collaborator

kyao commented Sep 13, 2021

I am still getting a timeout error with the nginx modification.

t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET / HTTP/1.1" 200 3134 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET /static/js/main.682e22e7.chunk.js HTTP/1.1" 200 90382 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET /static/js/2.4b359fb7.chunk.js HTTP/1.1" 200 559369 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET /static/js/3.e6a9a2f8.chunk.js HTTP/1.1" 200 4210 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-backend    | 172.18.0.3 - - [13/Sep/2021 04:03:46] "GET /api/causx/token HTTP/1.0" 200 -
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET /api/causx/token HTTP/1.1" 200 178 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-web        | 2021/09/13 04:18:28 [warn] 26#26: *5 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000001, client: 172.18.0.1, server: localhost, request: "POST /api/causx/upload/data HTTP/1.1", host: "localhost:8080", referrer: "http://localhost:8080/"
t2wml-web        | 2021/09/13 04:23:28 [error] 26#26: *5 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.18.0.1, server: localhost, request: "POST /api/causx/upload/data HTTP/1.1", upstream: "http://172.18.0.2:13000/api/causx/upload/data", host: "localhost:8080", referrer: "http://loc$
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:23:28 +0000] "POST /api/causx/upload/data HTTP/1.1" 504 494 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
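
The timestamps in the log above give a rough idea of the timeout in effect. A small sketch that computes the gap between the buffering warning and the upstream timeout error, with both timestamps copied from the log (the helper name is illustrative):

```python
from datetime import datetime

# Timestamp layout used by the nginx error-log lines quoted above.
NGINX_ERROR_LOG_FORMAT = "%Y/%m/%d %H:%M:%S"


def seconds_between(start: str, end: str, fmt: str = NGINX_ERROR_LOG_FORMAT) -> float:
    """Return the number of seconds elapsed from `start` to `end`."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


if __name__ == "__main__":
    # [warn] body buffered to a temp file  ->  [error] upstream timed out
    print(seconds_between("2021/09/13 04:18:28", "2021/09/13 04:23:28"))
```

The gap is 300 seconds, so nginx gave up five minutes after receiving the body, which suggests the backend had not started responding by then.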

Collaborator

g1eb commented Sep 13, 2021

Which file is this, Ke-Thia? I had to raise the limit again, to around 500 MB.
Let's try again with the updated backend and frontend images in place.

Collaborator

kyao commented Sep 13, 2021

It's WDI. I will try again

Collaborator

kyao commented Sep 13, 2021

I was uploading the CSV version, which is bigger than 100MB

Collaborator

g1eb commented Sep 13, 2021

Right, it should work now that the limit is 500 MB, though it will still take a long time.
I would wait until we publish the paginated version before uploading that.

Collaborator

kyao commented Sep 27, 2021

@devowit I tried suggest annotations on the WDI dataset. It took 33 minutes on my machine, and it returned a 2 GB JSON. Most of the JSON is error messages, which the web frontend ignores. How about having an option that suppresses error messages?
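
A minimal sketch of what such an option could do, assuming the error messages live under keys named `error` or `errors` somewhere in the response. The actual response layout is not shown in this thread, so the key names and the sample payload below are hypothetical.

```python
def strip_errors(value, error_keys=("error", "errors")):
    """Return a copy of `value` with any dict keys listed in `error_keys` removed, at any depth."""
    if isinstance(value, dict):
        return {
            key: strip_errors(item, error_keys)
            for key, item in value.items()
            if key not in error_keys
        }
    if isinstance(value, list):
        return [strip_errors(item, error_keys) for item in value]
    return value


if __name__ == "__main__":
    # Hypothetical payload; the key names are assumptions, not the t2wml schema.
    response = {
        "layers": {
            "statement": {"entries": [1, 2]},
            "error": {"entries": ["bad cell A1"]},
        },
        "errors": [],
    }
    print(strip_errors(response))
```

Filtering on the backend before serialization would shrink the payload itself; filtering after the fact, as here, only helps whoever consumes it.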

Contributor Author

devowit commented Sep 29, 2021

Currently, suggest annotations returns a full layer result, not just the suggested annotation.

There are two options:

  1. suggest annotations returns the annotation only, not a full layer result
  2. @g1eb sends start and end parameters so that results are only fetched for x number of rows.
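
Option 2 could look roughly like the sketch below, assuming the per-cell results carry a row index. The `row` field, the half-open `[start, end)` convention, and the helper name are all assumptions made for illustration, not the existing API.

```python
def results_for_rows(cell_results, start, end):
    """Keep only results whose row index is in the half-open range [start, end)."""
    if start < 0 or end < start:
        raise ValueError("expected 0 <= start <= end")
    return [cell for cell in cell_results if start <= cell["row"] < end]


if __name__ == "__main__":
    # Toy per-cell results for a 4-row, 2-column sheet.
    cells = [
        {"row": r, "col": c, "value": f"r{r}c{c}"}
        for r in range(4)
        for c in range(2)
    ]
    page = results_for_rows(cells, start=1, end=3)
    print([cell["value"] for cell in page])
```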

Collaborator

g1eb commented Sep 29, 2021

We would need the annotations to be present in the layers in order to draw them.

What part of the response depends on the start and end indexes? The annotations returned are based on all the rows, regardless of the indexes I provide.
