
don't use TextDecoder + arrayBuffer() in blob.text() #42265

Open
jimmywarting opened this issue Mar 9, 2022 · 0 comments
Labels
buffer Issues and PRs related to the buffer subsystem.

Comments


jimmywarting commented Mar 9, 2022

Use TextDecoderStream instead (or text() from stream/consumers).
Here is the problem:

node/lib/internal/blob.js

Lines 312 to 313 in 6b004f1

const dec = new TextDecoder();
return dec.decode(await this.arrayBuffer());

Version

17.5

Platform

mac

Subsystem

No response

What steps will reproduce the bug?

const { Blob } = require('buffer')
const header = 24
const bytes = new Uint8Array((512 * 1024 * 1024) + header)
const blob = new Blob([bytes])
const text = await blob.text()
const length = text.length

How often does it reproduce? Is there a required condition?

Every single time.

What is the expected behavior?

blob.text() should be able to read a blob larger than 500 MiB.

A workaround I'm using in fetch-blob is the streaming approach:

const { Blob } = require('buffer')
const header = 24
const bytes = new Uint8Array((512 * 1024 * 1024) + header)
const blob = new Blob([bytes])

let res = ''

const iterable = blob.stream().pipeThrough(new TextDecoderStream())
for await (const chunk of iterable) res += chunk

What do you see instead?

Uncaught:
TypeError [ERR_ENCODING_INVALID_ENCODED_DATA]: The encoded data was not valid for encoding utf-8
    at __node_internal_captureLargerStackTrace (node:internal/errors:464:5)
    at new NodeError (node:internal/errors:371:5)
    at TextDecoder.decode (node:internal/encoding:429:15)
    at Blob.text (node:internal/blob:314:16)
    at async REPL5:1:39 {
  errno: 1,
  code: 'ERR_ENCODING_INVALID_ENCODED_DATA'
}

Additional information

Using arrayBuffer() + TextDecoder in the .text() method is a disadvantage when reading the blob as a string.

At the moment you read the blob as text, memory usage spikes to roughly 3x the blob's size (the blob itself, the intermediate ArrayBuffer, and the resulting string are all alive at once), and V8 has no chance to GC any of it until decoding finishes.

There is also a V8 limitation: you can't create a string of more than roughly 500 MiB in a single allocation.
However, concatenating with str += 'foo' avoids this problem, since V8 can represent the result as a cons string that merely references multiple other strings by size and offset.

See more info here: https://stackoverflow.com/questions/61271613/chrome-filereader-api-event-target-result
