Skip to content

Latest commit

 

History

History
178 lines (106 loc) · 12.2 KB

WASI-08-12.md

File metadata and controls

178 lines (106 loc) · 12.2 KB

WASI logo

Agenda for the August 12 video call of WASI Subgroup

  • Where: zoom.us
  • When: August 12, 16:00-17:00 UTC
  • Location: link on calendar invite
  • Contact:

Registration

None required if you've attended before. Email Lin Clark to sign up if it's your first time.

The meeting is open to CG members only. You can join the CG here.

Logistics

The meeting will be on a zoom.us video conference.

Agenda items

  1. Opening, welcome and roll call
    1. Please help add your name to the meeting notes.
    2. Please help take notes.
    3. Thanks!
  2. Announcements
    1. Sumbit a PR to add your announcement here
  3. Proposals and discussions
    1. wasi-parallel: propose and discuss a new API for exposing a system's parallel capabilities to WebAssembly (slides) (20m).
    2. Sketch of async and stream in Interface Types (slides) (40m).

Notes

Attendees

  • Pat Hickey
  • Andrew Brown
  • Lin Clark
  • Mingqiu Sun
  • Bailey Hayes
  • Piotr Sikora
  • Radu Matei
  • Till Schneidereit
  • Luke Wagner
  • Alex Crichton
  • Adam Foltzer
  • Saúl Cabrera
  • Syrus Akbary
  • Enrico Galli
  • Yong He
  • Dan Gohman
  • Matt Butcher
  • Johnnie Birch

wasi-parallel

Andrew Brown: Presenting something we’ve been working on for last few months, Johnnie, Peter, Mingqiu, Enrico.

Mingqiu Sun: Lack support for parallelism can be problem for some problem domains. Best Wasm can do is 128 bit SIMD support. So what are we doing? Last year, we proposed wasi-nn. That solves parallel computing problem for 1 domain. Approach we took is model-loader API. We assume some sort of encoding of ML model, load it, set up graph, and then execute. Works for frameworks like Tensorflow, Onyx. One problem we encountered is community members asking about what about my framework (not one of the major ones). We don’t have an answer for them now. In journals, not just limited to ML. This could be a barrier for adoption of Wasm in these areas. What’s worse for standalone is that TS.js has Wasm backend that can farm out structure into pool created by webworker, and we don’t have that in standalone env. What about providing low-level parallelism support in WASI support so that you can support all the different use cases. Initial scope is to target _____. Can be supported in variety of backend devices, CPU, GPU.

Today, want to explain what we’re doing, and also looking for partners. Not just limited to API only. Toochain related to solve this problem.

Andrew Brown: So given that, let’s describe goals and non-goals. This can change based on feedback

Want to access system parallelism. Trying to not modify the Wasm spec. Did have a goal to execute on heterogeneous devices, e.g. on a CPU or GPU. Another goal is to match abstraction that is provided by a bunch of different parallel programming frameworks, but don’t want to support all programming models. For example, didn’t go through p-thread route.

Draft API looks like [slide]. Based on parallel fork. A lot of frameowrks have parallel fork. Need to know what kind of device. You hint on the device that you wnat to run on, and then runtime gives you handle to device. Then you tell runtime what you’re going to do with memory. With GPUs you need to be able to move slices back and forth. This could change, but we found it sufficient. Most important call here is parallel fork. The kernel is the work you want to do in parallel. You tell it the level of parallelism you want to support. You can split up the work using the block size param. And in buffer and out buffer.

Any questions?

So here’s where we’re at currently. We’ve built a prototype using Wasmtime that can execute on a CPU. What we’re using to evaluated is PRK kernels. Repo maintained by the community taht has pretty common problems you can run implemented across different frameworks. We’re able to execute some of those kernels within wasi-parallel. Some hand coding involved. In some we could speed up linearly based on CPUs we have. Encouraging. Also working on enabling GPU execution using OpenCL, can execute simple kernels, still need to benchmark.

Finally, other thread is toolchain support for OpenMP programs to wasi-parallel. This is a proofpoint to show that it isn’t too divergent from other common frameworks.

Future directions. We do want to get feedback from the community. There have been many discussions about this topic previously, so we wanted to see what people think. Part involves refining API based on feedback. We’re also very interested in seeing if other companies and individuals would want to work on creating benchmarks that we could use to get statistics. Finally, investigating compiling from other frameworks and high-level library. Want to see if multiple frameworks can compile down. Could MediaPipe call wasi-parallel instead of p-thread or whatever it calls.

Francis McCabe: I have a question for you, please forgive ignorance. Why is WASI the right forum for this effort?

Andrew Brown: That’s up for debate. Some people might feel that parallel primitives could be added to wasm itself. From my understanding, just from reading threads proposal was that Wasm would provide the primitives, but different runtimes would have different reqs.

Mingqiu Sun: I would add that solving parallelism is a big problem with multiple approaches. I think the WASI approach has its advantage. We could use GPU as a backend as opposed to having Wasm core support for running in GPU.

Francis McCabe: You’re right that there are many approaches to parallelism. You’re coming from a community that has opinions. Why not standardize there?

Lin Clark: Time check

Adam Folzer: I foresee possible speed bump—I did research 10 years ago but haven’t kept up. In general to get good perf out, particularly if you’re going after heterogeneity, picking right parallelism params for device is important. Baking in the params will mean it only runs well on a particular device. Also excited because such good research on _______. The exciting part is using the JIT to do that inference when you instantiate it. On a completely different note. Setting aside question of whether in wasi, calling this data-parallel might be better.

Andrew Brown: Certainly API is in its infancy. Can add or change function calls. We’ve discussed that you’re basically baking in config. One goal is not to get to 100% parity. We want to narrow, maybe to 70% of native.

Luke Wagner: I had a comment or thought. Relating to the use fo a func ref in the API. It closes over the instance. That might be a problem for the GPU.

Andrew Brown: Peter went on an investigation on this. Func ref only gets us so far. He thought module linking might help with that. I don’t know if that will fix that, but we’re really interested in ideas.

Luke Wagner: I’d love to talk about that design problem more

Till Schneidereit: Wnat to follow up, I do think this is the right group. We have nominative optional for exactly this.

Bailey Hayes: (from chat): parallelism in wasm modules is exciting. I sketched my thoughts for how I see this relating to wasi-data

Andrew Brown: I know we’re up on our time, next steps?

Lin Clark: Could either schedule a meeting with the interested parties, or discuss more next meeting.

Andrew Brown: Let’s discuss more next meeting

Sketch of async and stream in Interface Types

Luke Wagner: To pick up where we left off last time, talked about blocking being something you could put on functions so we can unify callback and greenthread. They superficially look different, but solve same problems. Want to allow composition between components using these different styles. Important use cases for both styles. Not all runtimes support greenthreading, and languages with language async will want to minimize runtime.

On the flip side, for continuouations, this only supports sync io or newer languages [Slide]

Roughly the approach is to make these an encapsulated ABI detail. Getting feedback from that and iterating. Started thinking through additional requirements for HTTP. Request handlers need to be able to return a response but keep running. A single handler needs to be able to stream in and out from downstream and upstream. An exciting challenge appeared—can http proxy chaining just use component composition? This would just be an application for the virtualization goal we have for WASI. In this case, we’re saying “I’m importing a fetch” and it might go over the network, or it might just go to another component.

The wins from being able to do this is perf win, portability because proxies compose in same way, and module linking will be pretty powerful. Can express lots of different chaining patterns. For example, non-linear chains that can fractally fan out.

If we can solve this generally for HTTP, this is a litmus test. Lowev levels of network stack, image/video pipelines, streaming analytics

Expand the scope of blocking to return+streaming, and going to rename blocking to async. Let’s say we have a function and we have this async effect on it. I have two choices at the ABI level. ABI varies based on whether you’re importing or exporting. I can import with sync, which means I’m awaiting. Then there’s an async option. Index into local table that ABI maintains. Then to work with that future, there’s a wait that’s built in. I give it a future and the result is the B*. Also a select where I have multiple futures. With these three primitives, I can do continuation style for calling the async function foo. This could be imported by a core wasm function.

Callback api. Classic C style callback. So those are the two imports.

When we want to export, for continuation it’s very simple in sync. We handle early return in async. The fact that I can do it in the middle is what allows it to return early. Caveat is that I’m going to trap if I call it twice, or if I return without calling return at all.

In contrast, in callback abi, there’s one option. I get two functions: early return, and enqueue extra work.

Any questions?

Francis McCabe: Does this meet all of the goals you had for fetch?

Luke Wagner: Yes, I think so

Francis McCabe: Looking forward to seeing how that works.

Luke Wagner: All right, I’ll move on.

If you’re familiar with typed continuation proposal, there’s a hidden slide that shows. Now walk through example.

[Slide]

Make a component called splitter. Going to import a fetch function. It’s given a string and returns list of u8. Going to nest a core module. Now I need to instantiate it, so need to create adapters.

Any questions?

Now going to show how you can compose.

[Slide]

Wants to call the fetch, do some work in parallel, then wait and frob the results.

Going to walk through the call stack.

[Slide] Composition does interleaving that you would want.

So that was blocking and early return. Next piece is streaming. Just so say where we started with—”can’t we just” [Slide]

….

So let’s add a new stream type to interface types. What’s a stream? A list that is passed incrementally over the course of an async call. Thus a stream can only make sense in async. Each stream has two ends, writer and reader. The reader and writer ends are created with i32 handles by canonical ABI. So these ends are private details.

Any questions?

Adam Folzer: You mentioned that a non-async function wouldn’t make sense to take or return streams. What about passing it from one import to another.

Luke Wagner: That passthrough case is important, will get to it at the end.

Piotr Sikora: What’s the relation between stream and pull buffer, etc

Luke Wagner: Strictly more general, maybe prettier

Piotr Sikora: So do we expect it to supercede?

Luke Wagner: Yeah, I think so.

Francis McCabe: You are taking a dep on the stack switching proposal?

Luke Wagner: Only with the coroutine ABI. Callback API does not require it.