Handle very large files #560

devowit · 2021-07-21T09:59:59Z

backend
frontend
suggest annotations uses sampling

kyao · 2021-07-22T23:04:09Z

Use this file for testing: http://databank.worldbank.org/data/download/WDI_excel.zip

kyao · 2021-08-02T22:44:34Z

Another test dataset. This one is only ~16K rows.

IDP_Time_Series_Data_No_Missing_No_Annotation.zip

devowit · 2021-08-05T06:51:39Z

@g1eb both files kethia added are rejected by the frontend as too large

g1eb · 2021-09-05T00:53:01Z

@devowit the second file Ke-Thia shared (IDP_Time_Series_Data_No_Missing_No_Annotation.csv) is only 1.1 MB and works fine.

The other file (WDIEXCEL.xlsx) is 70 MB, and when I remove the frontend limit I get an error from the backend (see screenshot below):

g1eb · 2021-09-05T01:02:18Z

I've changed the theoretical limit in app_config.py:42 from 16 MB to 100 MB:

MAX_CONTENT_LENGTH = 100 * 1024 * 1024  # 100 MB max file size

kyao · 2021-09-08T05:54:30Z

@g1eb Would you please do some profiling to see what file sizes t2wml can handle and how long it takes for the results to return?
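A minimal sketch of what such a profiling run could look like, using only the standard library. `count_rows` is a hypothetical stand-in for the real upload or suggest-annotations step, and the synthetic CSVs are placeholders for the actual test datasets; neither comes from t2wml.

```python
import os
import tempfile
import time

def profile_file(path, process):
    """Time one call of process(path) and return (size_bytes, seconds)."""
    size = os.path.getsize(path)
    start = time.perf_counter()
    process(path)
    return size, time.perf_counter() - start

def count_rows(path):
    # Hypothetical stand-in for the real upload/annotation step.
    with open(path, "r", encoding="utf-8") as handle:
        return sum(1 for _ in handle)

def make_csv(rows):
    """Write a throwaway CSV with the given number of rows and return its path."""
    handle = tempfile.NamedTemporaryFile(
        "w", suffix=".csv", delete=False, encoding="utf-8"
    )
    with handle:
        for i in range(rows):
            handle.write(f"{i},country,{i * 2}\n")
    return handle.name

results = []
for rows in (1_000, 16_000, 100_000):
    path = make_csv(rows)
    try:
        results.append((rows, *profile_file(path, count_rows)))
    finally:
        os.remove(path)

for rows, size, seconds in results:
    print(f"{rows:>7} rows  {size:>9} bytes  {seconds:.4f} s")
```

Swapping `count_rows` for a function that posts the file to the backend would give size-versus-latency numbers for the real pipeline.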

kyao · 2021-09-08T06:06:10Z

I tried to upload this 1.8 MB file and got an error:

t2wml-web        | 2021/09/08 05:58:04 [error] 24#24: *42 client intended to send too large body: 1848588 bytes, client: 172.18.0.1, server: localhost, request: "POST /api/causx/upload/data HTTP/1.1", host: "localhost:8080", referrer: "http://localhost:8080/"

WGI_Data.zip
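For reference, nginx rejects request bodies above `client_max_body_size`, which defaults to 1 MiB when the directive is not set. The thread does not show what value was configured at this point, so the comparison below is only a sketch showing that the rejected size is consistent with the default limit.

```python
import re

# nginx default for client_max_body_size when the directive is absent (1m).
NGINX_DEFAULT_LIMIT = 1 * 1024 * 1024

LOG_LINE = (
    "2021/09/08 05:58:04 [error] 24#24: *42 client intended to send "
    "too large body: 1848588 bytes, client: 172.18.0.1, server: localhost, "
    'request: "POST /api/causx/upload/data HTTP/1.1"'
)

def rejected_body_size(line):
    """Return the body size in bytes from a 'too large body' error, or None."""
    match = re.search(r"too large body: (\d+) bytes", line)
    return int(match.group(1)) if match else None

size = rejected_body_size(LOG_LINE)
print(size, size > NGINX_DEFAULT_LIMIT)
```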

g1eb · 2021-09-12T20:17:51Z

Ahh, one more filter was blocking large files: nginx. I fixed that now.

Here's the nginx setting in case causx needs to set the same setting on their end: usc-isi-i2/t2wml-web@d6cfb01

kyao · 2021-09-13T04:46:52Z

I am still getting a timeout error with the nginx modification.

t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET / HTTP/1.1" 200 3134 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET /static/js/main.682e22e7.chunk.js HTTP/1.1" 200 90382 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET /static/js/2.4b359fb7.chunk.js HTTP/1.1" 200 559369 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET /static/js/3.e6a9a2f8.chunk.js HTTP/1.1" 200 4210 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-backend    | 172.18.0.3 - - [13/Sep/2021 04:03:46] "GET /api/causx/token HTTP/1.0" 200 -
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:03:46 +0000] "GET /api/causx/token HTTP/1.1" 200 178 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
t2wml-web        | 2021/09/13 04:18:28 [warn] 26#26: *5 a client request body is buffered to a temporary file /var/cache/nginx/client_temp/0000000001, client: 172.18.0.1, server: localhost, request: "POST /api/causx/upload/data HTTP/1.1", host: "localhost:8080", referrer: "http://localhost:8080/"
t2wml-web        | 2021/09/13 04:23:28 [error] 26#26: *5 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 172.18.0.1, server: localhost, request: "POST /api/causx/upload/data HTTP/1.1", upstream: "http://172.18.0.2:13000/api/causx/upload/data", host: "localhost:8080", referrer: "http://loc$
t2wml-web        | 172.18.0.1 - - [13/Sep/2021:04:23:28 +0000] "POST /api/causx/upload/data HTTP/1.1" 504 494 "http://localhost:8080/" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:91.0) Gecko/20100101 Firefox/91.0" "-"
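The two nginx timestamps above (request body buffered, then upstream timed out) can be compared directly. A small sketch of that arithmetic follows. Reading the gap as the proxy's read timeout is an assumption, since the nginx config itself is not shown in the thread.

```python
from datetime import datetime

FMT = "%Y/%m/%d %H:%M:%S"

# Timestamps copied from the [warn] and [error] lines in the log above.
buffered_at = datetime.strptime("2021/09/13 04:18:28", FMT)
timed_out_at = datetime.strptime("2021/09/13 04:23:28", FMT)

elapsed = (timed_out_at - buffered_at).total_seconds()
print(elapsed)  # 300.0
```

If that reading is right, the backend needed more than five minutes to respond, so the 504 comes from the proxy giving up rather than from the upload size limit.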

g1eb · 2021-09-13T20:01:56Z

Which file is this, Ke-Thia? I had to raise the limit again, to around 500 MB.
Let's try again with the updated backend and frontend images in place.

kyao · 2021-09-13T20:04:03Z

It's WDI. I will try again

kyao · 2021-09-13T21:02:09Z

I was uploading the CSV version, which is bigger than 100MB

g1eb · 2021-09-13T21:05:57Z

Right, it should work now with the limit at 500 MB, but it will still take a long time.
I would wait until we publish the paginated version before uploading that.

kyao · 2021-09-27T21:01:42Z

@devowit I tried suggest annotations on the WDI dataset. It took 33 minutes on my machine and returned a 2 GB JSON. Most of the JSON consists of error messages, which the web frontend ignores. How about having an option that suppresses error messages?
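A rough sketch of what such an option might look like. The response shape here (a `layers` dict with an `error` entry) and the `include_errors` flag are hypothetical, chosen only for illustration; the actual t2wml response format is not shown in this thread.

```python
import json

def strip_errors(response, include_errors=False):
    """Return a copy of the response without the (hypothetical) 'error' layer."""
    if include_errors:
        return response
    layers = {
        name: entries
        for name, entries in response.get("layers", {}).items()
        if name != "error"
    }
    return {**response, "layers": layers}

# Made-up sample: one annotation and many per-cell error entries.
sample = {
    "layers": {
        "annotation": [{"role": "mainSubject", "selection": "A2:A10"}],
        "error": [{"cell": f"B{i}", "message": "no value"} for i in range(2, 1000)],
    }
}

full = len(json.dumps(sample))
trimmed = len(json.dumps(strip_errors(sample)))
print(full, trimmed)
```

Dropping the errors before serialization, rather than in the frontend, is what would shrink the payload sent over the wire.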

devowit · 2021-09-29T12:42:03Z

currently suggest annotations returns a full layer result, not just the suggested annotation.

There are two options:

suggest annotations returns the annotation only, not a full layer result
@g1eb sends start and end parameters so that results are only fetched for x number of rows.
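The second option could be sketched as a row window applied before the result is returned. The function name, the `start`/`end` semantics (half-open range), and the returned fields are all assumptions made for illustration, not the existing API.

```python
def slice_rows(rows, start=0, end=None):
    """Return rows[start:end] plus the total count so a client can page."""
    total = len(rows)
    # Clamp the window to the available rows.
    end = total if end is None else max(0, min(end, total))
    start = max(0, min(start, end))
    return {"start": start, "end": end, "total": total, "rows": rows[start:end]}

rows = [{"row": i} for i in range(250)]
page = slice_rows(rows, start=100, end=150)
print(page["start"], page["end"], page["total"], len(page["rows"]))
```

Returning `total` alongside the window lets the client know how many further requests it would need.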

g1eb · 2021-09-29T15:01:17Z

We would need the annotations to be present in the layers in order to draw them.

What part of the response is dependent on the start and end indexes? Annotations returned are based on all the rows regardless of the indexes I would provide.

kyao modified the milestones: Milestone 3, Milestone 2 Jul 22, 2021

kyao modified the milestones: Milestone 2, Milestone 3 Aug 3, 2021

g1eb self-assigned this Aug 24, 2021

g1eb added the enhancement (New feature or request) label Aug 24, 2021

kyao assigned devowit Sep 27, 2021
