Support async Blob sources #37338

Closed
jimmywarting opened this issue Feb 12, 2021 · 1 comment
Labels
buffer Issues and PRs related to the buffer subsystem.

Comments

@jimmywarting
jimmywarting commented Feb 12, 2021

The term "async Blob source" was mentioned earlier[1]. I think it corresponds to what Chromium calls a "BlobDataItem":

BlobDataItem: This is a primitive element that can basically be a File, Bytes, or another Blob. It also stores an offset and size, so this can be a part of a file. (This can also represent a “future” file and “future” bytes, which is used to signify a bytes or file item that has not been transported yet).
https://chromium.googlesource.com/chromium/src/+/master/storage/browser/blob/README.md

  • In simple terms, it can be a Blob that comes from somewhere else, such as the filesystem, and does not hold its data (source) in memory.
  • It's basically a dummy blob that holds a reference to some file on the hard drive, with offsets for where to start and stop reading
  • Or it can be a slice of another blob, with a new offset.

// if you create a blob
const blobA = new Blob([new ArrayBuffer(20_000)])
// and slice it
const blobB = blobA.slice(0, 10_000)
  • ...then we shouldn't create a new blob and allocate 30,000 bytes for blobA and blobB combined; the total should still be only 20,000 bytes
  • blobB should be a reference to blobA with its own offsets (0 to 10_000) and shouldn't allocate any new buffer
    • so when we read blobB, it creates a readable stream on blobA that reads bytes 0-10000

And if we create a third blob, blobC = new Blob([blobB, blobB]):

  • Then blobC will just hold two references to blobB (which actually refer to two slices of blobA). Reading blobC will be as if we created two readable streams and read them one after the other
    blobC.stream() === [blobB.stream(), blobB.stream()]
// this pseudocode may be a bit off, but it shows what I mean
// (assumes: const { Readable } = require('stream'), and kSource being an
// internal symbol that holds blobA's bytes)
blobB.stream = () => Readable.from((function* () {
  yield blobA[kSource].slice(0, 10000)
})())

blobC.stream = () => Readable.from((async function* () {
  yield* blobB.stream()
  yield* blobB.stream()
})())

// A better way would be for blobC to reference blobA's internal
// blob parts directly, so that blobB can be garbage collected

In this case neither blobB nor blobC would have an underlying source that holds data in memory. They would only be (nested) references to blobA.
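To make the idea concrete, here is a minimal sketch of blobs that are only lists of offsets into shared memory. `RefBlob` and its `parts` field are hypothetical names made up for illustration; they are not part of Node.js or fetch-blob, and negative offsets are not handled.

```javascript
// Hypothetical sketch: `RefBlob` is an illustrative name, not a real API.
class RefBlob {
  // Each part is { bytes: Uint8Array, start, end }: a window onto shared memory.
  constructor (sources = []) {
    this.parts = []
    for (const src of sources) {
      if (src instanceof RefBlob) {
        // Flatten: point straight at the underlying bytes so the
        // intermediate blob can be garbage collected.
        this.parts.push(...src.parts)
      } else {
        this.parts.push({ bytes: src, start: 0, end: src.byteLength })
      }
    }
    this.size = this.parts.reduce((n, p) => n + (p.end - p.start), 0)
  }

  // Synchronous and allocation-free: only offsets are recomputed.
  slice (start = 0, end = this.size) {
    const out = new RefBlob()
    let pos = 0
    for (const p of this.parts) {
      const len = p.end - p.start
      const from = Math.max(start - pos, 0)
      const to = Math.min(end - pos, len)
      if (from < to) {
        out.parts.push({ bytes: p.bytes, start: p.start + from, end: p.start + to })
      }
      pos += len
    }
    out.size = out.parts.reduce((n, p) => n + (p.end - p.start), 0)
    return out
  }

  // Reading is the only asynchronous step.
  async * stream () {
    for (const p of this.parts) {
      yield p.bytes.subarray(p.start, p.end) // a view, not a copy
    }
  }
}

const blobA = new RefBlob([new Uint8Array(20_000)])
const blobB = blobA.slice(0, 10_000)
const blobC = new RefBlob([blobB, blobB])
```

Here blobC ends up with two parts that both point at blobA's original 20,000-byte buffer, so nothing is copied and blobB is not needed to read blobC.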

I guess these are important steps towards solving async blob sources, and towards mixing blobs that come from the fs with blobs constructed from data in memory.

I created something like these async sources in fetch-blob, along with a kind of BlobDataItem that is backed by the filesystem and doesn't hold anything in memory. I have managed to solve the slicing entirely synchronously, and likewise constructing new blobs out of other parts. The only thing that is async is how the data is read. My blob is constructed from a sequence of blob parts (with offsets into other parts), not from a single source such as one full ArrayBuffer containing all the data.

That is how it can accept any other instance of a Blob class that isn't our own, without knowing anything about the other blob's internal source. So you can construct a fetch-blob from anything that looks and behaves like a Blob. It could, for instance, accept a BlobDataItem from the fs as well as a buffer.Blob.
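A rough sketch of that duck-typing idea, assuming a Node.js version that ships `buffer.Blob`. `CompositeBlob` is a hypothetical name and this is not fetch-blob's actual implementation; it only relies on the public `size` and `arrayBuffer()` members of its parts.

```javascript
const { Blob } = require('buffer')

// Hypothetical sketch: `CompositeBlob` is an illustrative name. It never looks
// at a part's internals, only at its public `size` and `arrayBuffer()`.
class CompositeBlob {
  constructor (parts = []) {
    this.parts = parts // any Blob-like objects, kept by reference
    this.size = parts.reduce((n, p) => n + p.size, 0)
  }

  // Reading is async and delegates to each part in order.
  async arrayBuffer () {
    const bufs = await Promise.all(this.parts.map(p => p.arrayBuffer()))
    const out = Buffer.concat(bufs.map(b => new Uint8Array(b)))
    // Buffer.concat may hand back a view into a pooled allocation,
    // so copy out exactly the bytes that belong to this blob.
    return out.buffer.slice(out.byteOffset, out.byteOffset + out.byteLength)
  }

  async text () {
    return Buffer.from(await this.arrayBuffer()).toString('utf8')
  }
}

// Parts can be Node's own Blob or another CompositeBlob.
const mixed = new CompositeBlob([new Blob(['hello ']), new Blob(['world'])])
const nested = new CompositeBlob([mixed, new Blob(['!'])])
```

Because `CompositeBlob` exposes the same `size` and `arrayBuffer()` it consumes, it can itself be used as a part of another blob.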

Note that a blob from the fs should keep track of the file's last modified time. If that time has changed by the time you read, the internal stream should throw an error.
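The last-modified check could look roughly like this. `FilePart` is a hypothetical name made up for this sketch, not an API of Node.js or fetch-blob. It snapshots `mtimeMs` when the part is created and compares it again just before reading.

```javascript
const fs = require('fs')

// Hypothetical sketch: `FilePart` is an illustrative name, not a real API.
class FilePart {
  constructor (path, mtimeMs, start, end) {
    this.path = path
    this.mtimeMs = mtimeMs // snapshot taken when the part was created
    this.start = start
    this.end = end
    this.size = end - start
  }

  // Only metadata is read here; no file contents are loaded into memory.
  static async from (path) {
    const { size, mtimeMs } = await fs.promises.stat(path)
    return new FilePart(path, mtimeMs, 0, size)
  }

  // Slicing stays synchronous: it only shifts offsets.
  slice (start = 0, end = this.size) {
    return new FilePart(this.path, this.mtimeMs, this.start + start, this.start + end)
  }

  async * stream () {
    const { mtimeMs } = await fs.promises.stat(this.path)
    if (mtimeMs !== this.mtimeMs) {
      throw new Error(`${this.path} was modified after the blob was created`)
    }
    if (this.size === 0) return
    // `end` is inclusive in fs.createReadStream, hence the - 1.
    yield * fs.createReadStream(this.path, { start: this.start, end: this.end - 1 })
  }
}
```

A slice keeps the original snapshot, so every blob derived from the same file fails in the same way once the file changes on disk.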

@jimmywarting
Author

Wondering if this can be closed now...?
Does this currently work:

const blob = new Blob([ other_blob_parts ])
const url = URL.createObjectURL(blob)
new Worker(url)
