
Is encouraging binary encoding/decoding a good idea? Should it be so prominent? #6

Closed
domenic opened this issue Jul 1, 2021 · 10 comments

Comments


domenic commented Jul 1, 2021

I found the arguments at whatwg/html#6811 (comment) by @Kaiido somewhat persuasive. Basically, if you're encoding your bytes to and from a string, you're probably doing something wrong, and you should instead modify your APIs or endpoints to accept bytes anyway.

There are definitely cases where it's useful, mostly around parsing and serializing older file formats. But I'm not sure they need to be promoted to the language (or web platform).

Relatedly, even if we think this is a capability worth including, I worry that putting it on ArrayBuffer makes it seem too prominent. It makes base64 decoding/encoding feel "promoted" on the same level as fundamental binary-data operations such as slicing or indexed access. From this perspective something like https://github.com/lucacasonato/proposal-binary-encoding (with static methods) seems nicer in that it silos off this functionality to a separate utility class.
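
For concreteness, a rough sketch of the two shapes being contrasted; the names here are purely illustrative, not the actual APIs proposed by either repository:

// Encoding as an instance method on the binary object itself (illustrative name only):
const encoded = buffer.toBase64();

// Encoding siloed off into static methods on a separate utility namespace,
// in the spirit of proposal-binary-encoding (also illustrative names only):
const encoded2 = Base64.encode(buffer);
const decoded = Base64.decode(encoded2);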


bakkot commented Jul 1, 2021

you should instead modify your APIs or endpoints to accept bytes anyway

I don't know about you, but a substantial portion of the code I write talks to APIs which I am not in a position to modify. I think that's probably the case for many developers. Just to pick a few examples I've encountered: Google's speech-to-text API, the Google Drive API, and GitHub's API all expect you to provide binary data encoded with base64 in some circumstances.

In the other direction, a great many APIs return data in base64, usually as part of a larger response. For example, JSON-based APIs generally base64-encode binary data which they wish to return as part of the response (what else could you do?).
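
To make that concrete, a minimal sketch of what consuming such a response looks like in a browser today; the endpoint and field names are made up for illustration:

// Hypothetical JSON API that embeds binary data as a base64 string:
const { thumbnailBase64 } = await (await fetch('/api/item')).json();

// Today the decode goes through atob() plus a manual copy into a typed array:
const binary = atob(thumbnailBase64);
const bytes = Uint8Array.from(binary, c => c.charCodeAt(0));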

It makes base64 decoding/encoding feel "promoted" on the same level as fundamental binary-data operations such as slicing or indexed access.

Ehh... I don't think the fact that two APIs are exposed in the same way implies they are equally promoted. (Though actually indexing is done with syntax, not a method call, so indexing is strictly more promoted than this would be.) I have wanted Array.prototype.map approximately a thousand times more frequently than I've wanted Array.prototype.copyWithin, for example.

And ASCII serialization/deserialization is a pretty fundamental operation on binary data, so ArrayBuffer seems like the right place to put those methods. Certainly I as a developer would not think to look for a class outside of ArrayBuffer to find the method for base64-encoding an ArrayBuffer.


sffc commented Jul 14, 2021

JSON is a fundamental part of the language, and JSON requires that array buffers be stored as text, so I think Base64 is fundamental enough to be this prominent.


bathos commented Jul 16, 2021

Not disagreeing with the conclusion, but there are other ways to represent binary data in JSON and the suitability varies. Strings are often the most practical option, but for small binary values, arrays of numbers are usually better. A real world example of where strings are the worst option is the “challenge” and “user handle” binary values that get exchanged in WebAuthn.

Every demo of WebAuthn I’ve seen encodes these (tiny — 64 bytes or less) binaries as urlsafe base64 strings in JSON during interchange. (I’m not sure why they add the extra steps for urlsafe given it’s sent in a JSON body — any flavor of base64 would be fine — but they all seem to do it.)

Encoding those values as ordinary JSON arrays of numbers is more direct, less error-prone, and the size doesn’t make a material difference:

// JSON-serializable representation:
[ ...new Uint8Array(buffer) ];

// Simpler and safer restoration from JSON:
Uint8Array.from(array);
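
Putting those two pieces together, the full round trip through JSON looks something like this (a minimal sketch; challenge stands in for e.g. a WebAuthn challenge ArrayBuffer):

// Serialize: spread the bytes into a plain array of numbers
const json = JSON.stringify({ challenge: [...new Uint8Array(challenge)] });

// Deserialize: rebuild the typed array directly from the parsed numbers
const restored = Uint8Array.from(JSON.parse(json).challenge);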

@dead-claudia

I would like to point out that there are a few encoding/decoding types that are practically everywhere both client-side and server-side:

  • Base64 string ↔ raw binary, due to JSON, XML, and URLs not supporting arbitrary data without lots of escaping
    • The "URL-safe" encoding simply replaces + and / with _ and - - this would be a good candidate for an encoder option, but that's about it as a single decoder could easily decode both by simply changing a lookup table slightly.
  • Hex string ↔ raw binary, used both for raw data (Base64 would be better, but some people are just lazy) and for cryptographic constants
  • Native string ↔ raw UTF-8, because basically everything requires it (it's been the default text-to-binary conversion for Node's buffers since the moment .toString was added, and WHATWG's TextEncoder has only ever produced UTF-8); rough userland equivalents of all three are sketched below
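
For reference, a sketch of how these three conversions are commonly written in userland today (browser-flavoured; Node's Buffer covers the same ground with its own API):

// base64 (or base64url, after mapping '-'/'_' back) string -> bytes
const fromBase64 = s =>
  Uint8Array.from(atob(s.replace(/-/g, '+').replace(/_/g, '/')), c => c.charCodeAt(0));

// hex string -> bytes
const fromHex = s =>
  Uint8Array.from(s.match(/../g) ?? [], pair => parseInt(pair, 16));

// native string <-> raw UTF-8
const utf8 = new TextEncoder().encode('héllo');
const text = new TextDecoder().decode(utf8);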

I've had a WIP transcoding proposal sitting privately, and there is a very significant performance boost to be had by doing this within the engine: engines can iterate strings via their native representation (whether that's a cons string or a flat string) and build the result with zero unnecessary copies. Additionally, while this isn't in and of itself an argument for putting it in JS engines, all three are embarrassingly parallel tasks very well suited to SSE vectorization, and JS engines are much more likely than embedders to pursue such optimizations where possible, since they already have to care a lot about architectural specifics for their JIT and WebAssembly support.


jimmywarting commented Feb 15, 2022

I do agree with the original commenter that base64 should be discouraged. It's wasteful to use 33% more bandwidth and to spend unnecessary processing time encoding/decoding to and from strings.

And the WebAuthn/JSON case of sending things back and forth between APIs can be dealt with using other communication strategies, such as FormData + Blob.

I have abused fetch's power to retrieve multiple files back from a server to the browser by doing something like:

// parse the multipart/form-data response body
const fd = await response.formData()
// collect every part appended under the name 'files'
const files = fd.getAll('files')
// read the first file's raw bytes
const ab = await files[0].arrayBuffer()

Much quicker and easier than having to use any zip/tar stuff. There isn't any reason you can't do the same thing on the server now either, since Node.js and Deno can use the same fetch API on the backend.

Just send the WebAuthn binary using something like:

const fd = new FormData()
fd.append('challenge', new Blob([uint8array]))
fetch(url, { method: 'POST', body: fd })
// and do this on the backend:
const fd = await req.formData()

This way of sending FormData works in both directions.

As such, I am -1 on implementing a new binary encoder.

The platform has evolved to handle binary data better nowadays, without requiring things to be sent via base64 or JSON.
We have things such as BSON and Protobuf and other binary representations; JSON isn't the only solution, and it isn't the best solution for everything.

You can also use fetch to convert a base64 data URL into something else:

// wrap the base64 string in a data: URL and let fetch do the decoding
const b64toRes = (base64, type = 'application/octet-stream') =>
  fetch(`data:${type};base64,${base64}`)

// then read the response in whichever form you need:
const res = await b64toRes(base64)
const ab = await res.arrayBuffer() // or res.blob(), res.json(), or res.body for a stream

but again, base64 should have been avoided in the first place


Speaking of FormData (off topic)... would it be a good idea to have something like formData.append('stuff', typedArray)?

@dead-claudia

In retrospect, I am coming around to the opinion that @domenic is right that this doesn't belong on the ArrayBuffer itself, but maybe in a built-in module or something. It's also worth mentioning that a built-in module or separate global would be a lot easier to implement for those maintaining embedded runtimes like XS.

@bathos

This comment was marked as off-topic.

@jimmywarting

This comment was marked as off-topic.


bathos commented Feb 19, 2022

(Hid my question in turn, but appreciate the answer, thanks.)


bakkot commented Feb 8, 2024

It is the opinion of the committee that this is worth doing. It's true that it's better to avoid the overhead when possible, but often it simply isn't, and we should make accommodations for that reality.

bakkot closed this as completed Feb 8, 2024