
Support download data asynchronously #166

Closed · wants to merge 1 commit

Conversation

@huningxin (Contributor) commented on May 7, 2021:

Follow-up on the issue discussed in #162 for the operation-level scenario in frameworks that support asynchronous data download. The changes include:

  • Introduce an MLTensor interface that supports downloading data asynchronously.
  • Add an overloaded version of MLGraph.compute that returns MLTensors synchronously.
  • Add a new example that computes multiple graphs without downloading the intermediate results.

Fix #156.
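
A minimal sketch of the intended usage, per the third bullet above (the graph and output names are illustrative, not taken from the diff; sizeOfShape is the helper used in the spec's examples):

    // Chain two graphs; intermediate results never leave the device.
    const intermediates = graph1.compute(inputsA);          // returns MLTensors synchronously
    const outputs = graph2.compute({x: intermediates.y});   // bind an MLTensor as an input
    const result = await outputs.z.data();                  // download only the final result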

cc @RafaelCintron

@anssiko (Member) left a comment:

Thanks! I made a few comments about error handling.

<!-- Validate inputs and outputs -->
1. If any of the following requirements are unmet, then [=reject=] |promise| with a {{TypeError}} and stop.
1. If any of the following requirements are unmet, then throw a {{TypeError}} and stop.
@anssiko (Member):

Would it be helpful to use different exception types to signal the different ways compute() can fail? Do we expect a web developer to write error-handling control flow that could recover if it knew more about the reason for the failure? If that's a valid use case, then:

We could either use multiple simple exceptions (these match ECMAScript error objects):
https://heycam.github.io/webidl/#dfn-simple-exception

Or DOMExceptions with error names:
https://heycam.github.io/webidl/#idl-DOMException-error-names

For example, we could throw a new "DataError" DOMException to signal value.data errors.
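
For illustration, the kind of recovery control flow a web developer might write if compute() signaled failures this way; a sketch only, using the error names mentioned in this thread:

    try {
      const outputs = graph.compute(inputs);
    } catch (e) {
      if (e instanceof TypeError) {
        // Invalid inputs or outputs: fix the arguments and retry.
      } else if (e instanceof DOMException && e.name === 'DataError') {
        // A value.data error: re-bind or re-shape the data.
      } else if (e instanceof DOMException && e.name === 'OperationError') {
        // The underlying implementation failed to execute the graph.
      }
    }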

index.bs Outdated
1. If |remainingOutputNames| is empty, then resolve |promise| with |results| and stop.
</div>
1. If there is an error returned by |this|.{{MLGraph/[[implementation]]}}, then:
1. Throw an {{OperationError}} and stop.
@anssiko (Member):

Given "OperationError" is a DOMException error name, we should say:

Throw an "OperationError" DOMException and stop.

(For simple exceptions, we can simply say e.g. "Throw a TypeError".)

@huningxin (Contributor, Author):

Thanks @anssiko. I'll fix that once the API shape discussion is settled.

index.bs Outdated
Comment on lines 1882 to 1884
const bufferC = new Float32Array(sizeOfShape([3, 3]));
const outputs = graph.compute(inputs);
await outputs.c.data(bufferC);
@wchao1115 (Collaborator):

This pattern implies that for every compute call the browser must allocate an intermediate resource for the execution result, since the only pre-allocated output buffer allowed is the one used as the download destination. I'm thinking of the case of a GPU implementation, where the execution result needs to be staged in a GPU buffer that, per this change, can't be pre-allocated.

@huningxin (Contributor, Author) commented on May 11, 2021:

Thanks @wchao1115. I think you are right: the pre-allocated output buffers should be provided at the compute call. I've reverted the MLOutput change and kept its usage for pre-allocated output buffers. Following the discussion in #162, I've added the MLTensor interface for the newly allocated output buffers returned by the compute call. Please take another look.

index.bs Outdated
Comment on lines 1712 to 1713
Promise<MLData> data();
Promise<undefined> data(MLData data);
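
As a usage sketch, the two overloads correspond to downloading into a newly allocated buffer versus a caller-provided destination (tensor here stands for an MLTensor returned by compute; sizeOfShape is the spec example helper):

    // Overload 1: the implementation allocates and returns the buffer.
    const result = await tensor.data();

    // Overload 2: the caller provides a pre-allocated destination.
    const bufferC = new Float32Array(sizeOfShape([3, 3]));
    await tensor.data(bufferC);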
@wchao1115 (Collaborator) commented on May 8, 2021:

I think this complicates the GPU case and makes it less efficient, because it essentially requires two buffers to get a GPU execution result: one hidden from the caller and another as the destination of an async download call. This may be acceptable when the scenario calls for the execution result to end up in CPU memory, like object detection, but there are also many scenarios where the output should stay in a GPU buffer, such as style transfer or GANs.

@wchao1115 (Collaborator):

On the semantic front, if the data call is going to mean more than a simple data download, say, including layout conversion in some implementations, then I have a hard time seeing how the browser could implement this call behind the scenes. How could an implementation of a mere MLOutput struct know how to layout-convert every possible output of every defined operation that this MLOutput struct may be bound to at execution time? It would also mean that any operation can produce an output operand in any layout format, since the real output data can only be obtained from the eventual data call after the graph is executed.

@huningxin (Contributor, Author):

> I think this complicates the GPU case and makes it less efficient, because it essentially requires two buffers to get a GPU execution result: one hidden from the caller and another as the destination of an async download call.

For the GPU case, I agree this would add one buffer of overhead if the user code pre-allocates the output buffer. So I've reverted the MLOutput change and kept its usage for pre-allocated output buffers. With that, the user code can bind a pre-allocated GPU buffer via MLOutput as the execution output, without requiring a hidden one. If the user code doesn't pre-allocate a GPU buffer, it can get a newly allocated output GPU buffer via MLTensor.data().

> How could an implementation of a mere MLOutput struct know how to layout-convert every possible output of every defined operation that this MLOutput struct may be bound to at execution time?

An implementation could create an MLTensor (in my new commit) that wraps the output object of the native execution API and return it to user code. For example, a oneDNN-based MLTensor implementation could wrap the dnnl_memory used as the destination memory of dnnl_primitive_execute. A dnnl_memory may use a blocked layout for performance. When the user code calls MLTensor.data(), the implementation could use the reorder primitive to convert the blocked layout of the dnnl_memory to a plain layout.
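
From the caller's side, that contract might look like the following sketch; the internal blocked layout and the reorder are invisible to user code:

    const outputs = graph.compute(inputs);  // output may live in a blocked layout internally
    // data() resolves with the result converted to the spec's standard
    // (plain) layout, e.g. via a oneDNN reorder primitive underneath.
    const plain = await outputs.c.data();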

@wchao1115 (Collaborator):

The new change introduces an additional MLTensor into the mix, with another compute overload that has different semantics. The problem is that the caller now needs to know how webnn is implemented in order to choose the right calling pattern for the compute method. Generally speaking, the hard part of designing an abstraction API like webnn is precisely making it so that the caller doesn't need intimate knowledge of how the API is implemented in order to call it correctly.

The point I was making earlier, about how MLOutput could possibly be implemented properly when the layout conversion semantic is baked into its definition, is about all the possible outcomes it must produce, not about which platform API the browser must call.

For example, an MLOutput could be bound to a graph that has one of the various activations, a convolution, a pooling, a batchnorm, or an RNN, etc., as its output layer. So if we define MLOutput's behavior so that it knows how to perform layout conversion after any graph is executed, then it effectively must know how any graph produces its output through any operator at the end of the graph, and how to post-process any such output. I think this will be very hard for implementers to get right on any platform. As a data point, DirectML, for instance, has no concept of layout post-processing after graph execution.

@huningxin (Contributor, Author):

> The problem is that the caller now needs to know how webnn is implemented in order to choose the right calling pattern for the compute method.

It's based on whether the caller pre-allocates the output buffers. If the caller pre-allocates them, it calls compute(inputs, outputs). If it doesn't, it calls outputs = compute(inputs). In the current spec, compute already has an optional outputs argument covering these two usages; the only difference here is the newly allocated outputs returned by compute.
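
A minimal sketch of the two calling patterns, assuming the compute() overloads proposed in this PR (the MLOutput dictionary shape is illustrative):

    // Pattern 1: the caller pre-allocates the output buffers.
    const bufferC = new Float32Array(sizeOfShape([3, 3]));
    graph.compute(inputs, {c: {data: bufferC}});    // results land in bufferC

    // Pattern 2: the implementation allocates the outputs.
    const outputs = graph.compute(inputs);          // returns MLTensors synchronously
    const resultC = await outputs.c.data();         // async download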

> The point I was making earlier, about how MLOutput could possibly be implemented properly when the layout conversion semantic is baked into its definition, is about all the possible outcomes it must produce, not about which platform API the browser must call.

As I mentioned, the layout conversion should be an implementation choice. At the API surface, the results returned by MLTensor.data() are always in the standard layout defined in the spec. An implementation need not care about layout conversion if the native API it uses has no such concept, such as DirectML as you mentioned; the webnn-native DirectML backend actually implements it this way. However, this change would allow another implementation to optimize if the native API exposes a platform-optimized layout, such as the oneDNN case I mentioned.

@wchao1115 (Collaborator):

I think sync vs. async is a good discussion point. Supporting both may complicate the implementation in the browser, as they carry different guarantees and require different setups, with different implications for the caller. I personally still prefer that we stick with just one model for consistency.

@wchao1115 (Collaborator):

@RafaelCintron has a great suggestion w.r.t. sync/async API in the CG call this morning.

> RafaelCintron: if we must have an async version, perhaps we can have it be strongly typed to only take ArrayBuffers and return them

I think it's a good idea, but I'm not sure how to translate it into an API proposal. @huningxin, do you want to try that?
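
One possible reading of that suggestion, as an illustrative sketch only; computeAsync and its signature are hypothetical, not part of this PR:

    // Hypothetical: an async compute that is strongly typed to plain
    // ArrayBufferViews, with no MLTensor or MLOutput wrappers involved.
    const bufferC = new Float32Array(sizeOfShape([3, 3]));
    await graph.computeAsync(inputs, {c: bufferC});  // resolves once bufferC is filled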

@huningxin (Contributor, Author):

Agreed, it's a good idea. Thanks @RafaelCintron.

However, since the original intent of this PR was layout conversion and it has touched so many other important topics, I'd suggest we handle them in separate issues. What I would like to do is:

  1. Open an issue about native format support for a future version of the spec and close this PR.
  2. Open an issue for a compute API that only takes pre-allocated output buffers.
  3. Open an issue for a sync API for Wasm library usage ("Support CPU - WebAssembly scenario of the op level execution use case", #156 (comment)).

WDYT? @wchao1115

@wchao1115 (Collaborator):

Are you suggesting that we close this PR with the 3 issues tracked separately? I'm fine with that too. I can probably tackle the second issue in my next PR.

@huningxin (Contributor, Author):

> Are you suggesting that we close this PR with the 3 issues tracked separately? I'm fine with that too.

Yes. Opened #173.

> I can probably tackle the second issue in my next PR.

Thanks. I happened to open #174 for the second and third issues. Please take a look.

@huningxin changed the title from "Change MLOutput to download data asynchronously" to "Support download data asynchronously" on May 11, 2021.
@anssiko (Member) commented on May 13, 2021:

(We discussed this PR on today's call, https://www.w3.org/2021/05/13-webmachinelearning-minutes.html#t11, and agreed to continue and consolidate the related discussion here.)

@huningxin (Contributor, Author):

Following up with #173 and #174. Closing this PR. Thanks again for the valuable input.

@huningxin closed this on May 31, 2021.