
don't use TextDecoder + arrayBuffer() in blob.text() #42265

Open
jimmywarting opened this issue Mar 9, 2022 · 0 comments
Labels
buffer Issues and PRs related to the buffer subsystem.

Comments


jimmywarting commented Mar 9, 2022

Use TextDecoderStream instead (or text() from stream/consumers).
Here is the problem:

node/lib/internal/blob.js

Lines 312 to 313 in 6b004f1

const dec = new TextDecoder();
return dec.decode(await this.arrayBuffer());

Version

17.5

Platform

mac

Subsystem

No response

What steps will reproduce the bug?

const { Blob } = require('buffer')
const header = 24
const bytes = new Uint8Array((512 * 1024 * 1024) + header)
const blob = new Blob([bytes])
const text = await blob.text()
const length = text.length

How often does it reproduce? Is there a required condition?

Every single time.

What is the expected behavior?

blob.text() should be able to read a blob larger than 500 MiB.

A workaround I'm using in fetch-blob is the streaming approach:

const { Blob } = require('buffer')
const header = 24
const bytes = new Uint8Array((512 * 1024 * 1024) + header)
const blob = new Blob([bytes])

let res = ''

const iterable = blob.stream().pipeThrough(new TextDecoderStream())
for await (const chunk of iterable) res += chunk

What do you see instead?

Uncaught:
TypeError [ERR_ENCODING_INVALID_ENCODED_DATA]: The encoded data was not valid for encoding utf-8
    at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
    at new NodeError (node:internal/errors:371:5)
    at TextDecoder.decode (node:internal/encoding:429:15)
    at Blob.text (node:internal/blob:314:16)
    at async REPL5:1:39 {
  errno: 1,
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

Additional information

Using arrayBuffer() + TextDecoder in the .text() method is a disadvantage when reading the blob as a string.

At the moment you read the blob as text, memory usage spikes to roughly 3x the blob's size (the blob itself, the intermediate ArrayBuffer, and the resulting string are all alive at once), and V8 has no chance to GC any of it until decoding finishes.

There is also a V8 limitation: you can't create a string of more than roughly 500 MiB in a single allocation.
However, concatenating with str += 'foo' avoids this problem, since V8 can represent the result as a cons string that merely references multiple other strings by size and offset.

See more info here: https://stackoverflow.com/questions/61271613/chrome-filereader-api-event-target-result
