
Support download data asynchronously #166

Closed · wants to merge 1 commit

Conversation

@huningxin (Contributor) commented on May 7, 2021:

Follow-up on the issue discussed in #162 for the operation-level scenario in frameworks that support asynchronous data download. The changes include:

  • Introduce an MLTensor interface that supports downloading data asynchronously.
  • Add an overloaded version of MLGraph.compute that returns MLTensors synchronously.
  • Add a new example that computes multiple graphs without downloading the intermediate results.

Fix #156.
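
A minimal sketch of the intended usage, per the third bullet above (the graph and output names are illustrative, not taken from the diff; sizeOfShape is the helper used in the spec's examples):

    // Chain two graphs; intermediate results never leave the device.
    const intermediates = graph1.compute(inputsA);          // returns MLTensors synchronously
    const outputs = graph2.compute({x: intermediates.y});   // bind an MLTensor as an input
    const result = await outputs.z.data();                  // download only the final result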

cc @RafaelCintron

@anssiko (Member) left a comment:

Thanks! I made a few comments about error handling.

<!-- Validate inputs and outputs -->
1. If any of the following requirements are unmet, then [=reject=] |promise| with a {{TypeError}} and stop.
1. If any of the following requirements are unmet, then throw a {{TypeError}} and stop.
@anssiko (Member):

Would it be helpful to use different exception types to signal the different ways compute() can fail? Do we expect a web developer to write error-handling control flow that could recover if it knew more about the reason for the failure? If that's a valid use case, then:

We could either use multiple simple exceptions (these match ECMAScript error objects):
https://heycam.github.io/webidl/#dfn-simple-exception

Or DOMExceptions with error names:
https://heycam.github.io/webidl/#idl-DOMException-error-names

For example, we could throw a new "DataError" DOMException to signal value.data errors.
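
For illustration, the kind of recovery control flow a web developer might write if compute() signaled failures this way; a sketch only, using the error names mentioned in this thread:

    try {
      const outputs = graph.compute(inputs);
    } catch (e) {
      if (e instanceof TypeError) {
        // Invalid inputs or outputs: fix the arguments and retry.
      } else if (e instanceof DOMException && e.name === 'DataError') {
        // A value.data error: re-bind or re-shape the data.
      } else if (e instanceof DOMException && e.name === 'OperationError') {
        // The underlying implementation failed to execute the graph.
      }
    }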

index.bs Outdated
1. If |remainingOutputNames| is empty, then resolve |promise| with |results| and stop.
</div>
1. If there is an error returned by |this|.{{MLGraph/[[implementation]]}}, then:
1. Throw an {{OperationError}} and stop.
@anssiko (Member):

Given "OperationError" is a DOMException error name, we should say:

Throw an "OperationError" DOMException and stop.

(For simple exceptions, we can simply say e.g. "Throw a TypeError".)

@huningxin (Contributor, Author):

Thanks @anssiko. I'll fix that once the API shape discussion is settled.

index.bs Outdated
Comment on lines 1882 to 1884
const bufferC = new Float32Array(sizeOfShape([3, 3]));
const outputs = graph.compute(inputs);
await outputs.c.data(bufferC);
@wchao1115 (Collaborator):

This pattern implies that for every compute call the browser must allocate an intermediate resource for the execution result, since the only pre-allocated output buffer allowed is the one used as the download destination. I'm thinking of the case of a GPU implementation, where the execution result needs to be staged in a GPU buffer that, per this change, can't be pre-allocated.

@huningxin (Contributor, Author) commented on May 11, 2021:

Thanks @wchao1115. I think you are right: the pre-allocated output buffers should be provided at the compute call. I've reverted the MLOutput change and kept its usage for pre-allocated output buffers. Following the discussion in #162, I've added the MLTensor interface for the newly allocated output buffers returned by the compute call. Please take another look.

index.bs Outdated
Comment on lines 1712 to 1713
Promise<MLData> data();
Promise<undefined> data(MLData data);
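
As a usage sketch, the two overloads correspond to downloading into a newly allocated buffer versus a caller-provided destination (tensor here stands for an MLTensor returned by compute; sizeOfShape is the spec example helper):

    // Overload 1: the implementation allocates and returns the buffer.
    const result = await tensor.data();

    // Overload 2: the caller provides a pre-allocated destination.
    const bufferC = new Float32Array(sizeOfShape([3, 3]));
    await tensor.data(bufferC);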
@wchao1115 (Collaborator) commented on May 8, 2021:

I think this complicates the GPU case and makes it less efficient, because it essentially requires two buffers to get a GPU execution result: one hidden from the caller and another as the destination of an async download call. This may be acceptable when the scenario calls for the execution result to end up in CPU memory, like object detection, but there are also many scenarios where the output should stay in a GPU buffer, such as style transfer or GANs.

@wchao1115 (Collaborator):

On the semantic front, if the data call is going to mean more than a simple data download, say, including layout conversion in some implementations, then I have a hard time seeing how the browser could implement this call behind the scenes. How could an implementation of a mere MLOutput struct know how to layout-convert every possible output of every defined operation that this MLOutput struct may be bound to at execution time? It would also mean that any operation can produce an output operand in any layout format, since the real output data can only be obtained from the eventual data call after the graph is executed.

@huningxin (Contributor, Author):

> I think this complicates the GPU case and makes it less efficient, because it essentially requires two buffers to get a GPU execution result: one hidden from the caller and another as the destination of an async download call.

For the GPU case, I agree this would add one buffer of overhead if the user code pre-allocates the output buffer. So I've reverted the MLOutput change and kept its usage for pre-allocated output buffers. With that, the user code can bind a pre-allocated GPU buffer via MLOutput as the execution output, without requiring a hidden one. If the user code doesn't pre-allocate a GPU buffer, it can get a newly allocated output GPU buffer via MLTensor.data().

> How could an implementation of a mere MLOutput struct know how to layout-convert every possible output of every defined operation that this MLOutput struct may be bound to at execution time?

An implementation could create an MLTensor (in my new commit) that wraps the output object of the native execution API and return it to user code. For example, a oneDNN-based MLTensor implementation could wrap the dnnl_memory used as the destination memory of dnnl_primitive_execute. A dnnl_memory may use a blocked layout for performance. When the user code calls MLTensor.data(), the implementation could use the reorder primitive to convert the blocked layout of the dnnl_memory to a plain layout.
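
From the caller's side, that contract might look like the following sketch; the internal blocked layout and the reorder are invisible to user code:

    const outputs = graph.compute(inputs);  // output may live in a blocked layout internally
    // data() resolves with the result converted to the spec's standard
    // (plain) layout, e.g. via a oneDNN reorder primitive underneath.
    const plain = await outputs.c.data();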

@wchao1115 (Collaborator):

The new change introduces an additional MLTensor into the mix, with another compute overload that has different semantics. The problem is that the caller now needs to know how webnn is implemented in order to choose the right calling pattern for the compute method. Generally speaking, the hard part of designing an abstraction API like webnn is precisely making it so that the caller doesn't need intimate knowledge of how the API is implemented in order to call it correctly.

The point I was making earlier, about how MLOutput could possibly be implemented properly when the layout conversion semantic is baked into its definition, is about all the possible outcomes it must produce, not about which platform API the browser must call.

For example, an MLOutput could be bound to a graph that has one of the various activations, a convolution, a pooling, a batchnorm, or an RNN, etc., as its output layer. So if we define MLOutput's behavior so that it knows how to perform layout conversion after any graph is executed, then it effectively must know how any graph produces its output through any operator at the end of the graph, and how to post-process any such output. I think this will be very hard for implementers to get right on any platform. As a data point, DirectML, for instance, has no concept of layout post-processing after graph execution.

@huningxin (Contributor, Author):

> The problem is that the caller now needs to know how webnn is implemented in order to choose the right calling pattern for the compute method.

It's based on whether the caller pre-allocates the output buffers. If the caller pre-allocates them, it calls compute(inputs, outputs). If it doesn't, it calls outputs = compute(inputs). In the current spec, compute already has an optional outputs argument covering these two usages; the only difference here is the newly allocated outputs returned by compute.
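
A minimal sketch of the two calling patterns, assuming the compute() overloads proposed in this PR (the MLOutput dictionary shape is illustrative):

    // Pattern 1: the caller pre-allocates the output buffers.
    const bufferC = new Float32Array(sizeOfShape([3, 3]));
    graph.compute(inputs, {c: {data: bufferC}});    // results land in bufferC

    // Pattern 2: the implementation allocates the outputs.
    const outputs = graph.compute(inputs);          // returns MLTensors synchronously
    const resultC = await outputs.c.data();         // async download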

> The point I was making earlier, about how MLOutput could possibly be implemented properly when the layout conversion semantic is baked into its definition, is about all the possible outcomes it must produce, not about which platform API the browser must call.

As I mentioned, the layout conversion should be an implementation choice. At the API surface, the results returned by MLTensor.data() are always in the standard layout defined in the spec. An implementation need not care about layout conversion if the native API it uses has no such concept, such as DirectML as you mentioned; the webnn-native DirectML backend actually implements it this way. However, this change would allow another implementation to optimize if the native API exposes a platform-optimized layout, such as the oneDNN case I mentioned.

@wchao1115 (Collaborator):

I think sync vs. async is a good discussion point. Supporting both may complicate the implementation in the browser, as they carry different guarantees and require different setups, with different implications for the caller. I personally still prefer that we stick with just one model for consistency.

@wchao1115 (Collaborator):

@RafaelCintron has a great suggestion w.r.t. sync/async API in the CG call this morning.

> RafaelCintron: if we must have an async version, perhaps we can have it be strongly typed to only take ArrayBuffers and return them

I think it's a good idea, but I'm not sure how to translate it into an API proposal. @huningxin, do you want to try that?
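
One possible reading of that suggestion, as an illustrative sketch only; computeAsync and its signature are hypothetical, not part of this PR:

    // Hypothetical: an async compute that is strongly typed to plain
    // ArrayBufferViews, with no MLTensor or MLOutput wrappers involved.
    const bufferC = new Float32Array(sizeOfShape([3, 3]));
    await graph.computeAsync(inputs, {c: bufferC});  // resolves once bufferC is filled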

@huningxin (Contributor, Author):

Agreed, it's a good idea. Thanks @RafaelCintron.

However, since the original intent of this PR was layout conversion and it has touched so many other important topics, I'd suggest we handle them in separate issues. What I would like to do is:

  1. Open an issue about native format support for a future version of the spec and close this PR.
  2. Open an issue for a compute API that only takes pre-allocated output buffers.
  3. Open an issue for a sync API for Wasm library usage ("Support CPU - WebAssembly scenario of the op level execution use case", #156 (comment)).

WDYT? @wchao1115

@wchao1115 (Collaborator):

Are you suggesting that we close this PR with the 3 issues tracked separately? I'm fine with that too. I can probably tackle the second issue in my next PR.

@huningxin (Contributor, Author):

> Are you suggesting that we close this PR with the 3 issues tracked separately? I'm fine with that too.

Yes. Opened #173.

> I can probably tackle the second issue in my next PR.

Thanks. I happened to open #174 for the second and third issues. Please take a look.

@huningxin changed the title from "Change MLOutput to download data asynchronously" to "Support download data asynchronously" on May 11, 2021.
@anssiko (Member) commented on May 13, 2021:

(We discussed this PR on today's call, https://www.w3.org/2021/05/13-webmachinelearning-minutes.html#t11, and agreed to continue and consolidate the related discussion here.)

@huningxin (Contributor, Author):

Following up with #173 and #174. Closing this PR. Thanks again for the valuable input.

@huningxin closed this on May 31, 2021.