Add support for device selection #162

Merged 7 commits into master from wchao/interop_efficiency on May 6, 2021

Conversation

@wchao1115 (Collaborator) commented Apr 13, 2021

  • Add a device preference to the context options, in addition to the existing power preference. This change gives the app more control over what type of execution device should be used, and enables effective interop with Wasm (Support CPU - WebAssembly scenario of the op level execution use case #156). A rough sketch of the option follows below.
  • Follow up on the issue discussed in PR #149 around supporting the operation-level scenario in frameworks that support asynchronous data downloads.
  • Further clarify the level of support for GPU resources from WebGPU and WebGL in relation to the underlying execution device the MLContext is based on.
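A rough sketch of how such a context option might be passed. The option and value names here are placeholders (the exact enum values are the subject of the review discussion below), and the ML.createContext entry point is the one referenced later in this thread:

// Illustrative only; 'devicePreference' and its values are placeholders under discussion.
const context = navigator.ml.createContext({
  devicePreference: 'cpu',       // e.g. lets a Wasm library avoid cross-device copies
  powerPreference: 'low-power'   // the pre-existing power preference option
});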


…nstraints. Introduce MLTensor that allows asynchronous readback.
index.bs Outdated
// Prefer a compute-specific processor such as a neural engine or NPU.
"compute",
// Prefer a traditional software-based device such as a CPU.
"software"
@huningxin (Contributor) commented Apr 14, 2021

I would prefer naming it cpu. It would be a clear device name that Wasm libraries could select to avoid unnecessary data copying across devices, as the Support CPU - WebAssembly scenario of the op-level execution use case describes.

@wchao1115 (Collaborator, Author)

Other than the API name prefix, most API naming conventions generally discourage the use of abbreviations as identifier names. If we were to call it "cpu" we would also need a "gpu" value to be consistent, and then what about AI accelerators? "npu", "vpu", "tpu"? I think that would start a naming discussion with no end.

Contributor

I suppose the "cpu" and "gpu" abbreviations would be well known to developers because they are already widely used by native ML APIs; e.g., NNAPI DeviceTypeCode defines ANEURALNETWORKS_DEVICE_CPU and ANEURALNETWORKS_DEVICE_GPU, ML Compute MLCDeviceType defines gpu and cpu, and oneDNN dnnl_engine_kind_t also defines dnnl_cpu and dnnl_gpu.

For AI accelerators, we could probably just call them "accelerator" (in the context of WebNN). This is also used by native APIs; for example, NNAPI DeviceTypeCode defines ANEURALNETWORKS_DEVICE_ACCELERATOR for a "Dedicated accelerator for Machine Learning workloads." Another example is OpenCL cl_device_type, which defines CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU and CL_DEVICE_TYPE_ACCELERATOR for heterogeneous hardware.

Actually, today's modern CPUs already have ML-specific instructions, such as the Vector Neural Network Instructions (VNNI) on x86 and the Scalable Vector Extension 2 (SVE2) on Arm, and are adding even more advanced instructions, e.g. for matrix multiplication. My main concern is that "software" does not reflect these new capabilities that accelerate ML workloads for web apps.

@wchao1115 (Collaborator, Author) commented Apr 14, 2021

I'm open to naming suggestions if others have input on this. In my line of work designing the Windows API at Microsoft, we generally stay away from abbreviations because their meaning and perceived freshness tend to change over time. They are also harder for people from different backgrounds to understand. (The Windows API is well known for longevity.)

Contributor

My 2 cents worth: "compute" and "software" sound too generic, and I wouldn't be able to guess what they meant. On the other hand, I do think most people can guess what "cpu" or "gpu" mean. I agree custom neural accelerators don't have as obvious a choice. "npu" or "accelerator" or "custom" sound fine to me.

Collaborator

+1 on cpu and gpu

@wchao1115 (Collaborator, Author)

I couldn't find any reference or recommendations regarding the usage of abbreviations in Web API definitions. Does anyone know where to look? @anssiko. If abbreviations are allowed, should they be used in lowercase or uppercase, etc.?

The DirectX API generally avoids abbreviations, and terms like "graphics" or even "software" have been used for quite some time, e.g.:

https://docs.microsoft.com/en-us/windows/win32/direct3d11/atoc-dx-graphics-direct3d-11
https://docs.microsoft.com/en-us/windows/win32/api/d3dcommon/ne-d3dcommon-d3d_driver_type

Member

@wchao1115 Web Platform Design Principles > Naming principles https://w3ctag.github.io/design-principles/#naming-is-hard is probably the best resource. Naming is always a topic of active discussion in API design, and there's often no absolute right or wrong.

Given there's a Web API called WebGPU, I suspect "gpu" would pass if you prefer it that way. Similarly, we can probably agree that "cpu" is familiar to web developers of today.

That said, I'd challenge the group to think about this from the perspective of future-proofing. How do possible future architectures fit into the device buckets we choose? We could make this extensible with new buckets in the future, or we could make the initial ones abstract enough to accommodate future devices. I think there's always a trade-off whatever we choose. What I think we want to achieve is to be able to provide web developers with knobs they can use to give hints to the implementation so their workloads can be executed in a manner that yields the best user experience.

@wchao1115 (Collaborator, Author)

Thanks @anssiko. I'll take this advice from the design principles: Please consult widely on names in your APIs.

The hardest is probably the accelerator type. Let's discuss this naming in our next CG call.

index.bs Outdated
console.log(`values: ${outputs.c.data}`);
const outputs = {'c': new MLTensor(new Float32Array(sizeOfShape([3, 3])))};
graph.compute(inputs, outputs);
const data = await outputs.c.data();
@huningxin (Contributor) commented Apr 14, 2021

I am trying to figure out the right behavior of MLTensor for the pre-allocated outputs scenario. Will outputs.c.data() allocate and return a new ArrayBufferView instead of downloading the result into the pre-allocated ArrayBufferView? And if user code captures the returned MLTensor, what's the relationship between the returned MLTensor and the given MLTensor? e.g.,

const buffer = new Float32Array(sizeOfShape([3, 3]));
const outputs = {'c': new MLTensor(buffer)};
let returns = graph.compute(inputs, outputs);
const buffer1 = await returns.c.data();
const buffer2 = await outputs.c.data();
// is buffer === buffer1 === buffer2?

Contributor

As the outputs are given, I would suggest that compute not return the results. Calling data() on the given output MLTensor should download the results into the pre-allocated ArrayBufferView and return it (the same object), e.g.,

const preallocatedBuffer = new Float32Array(sizeOfShape([3, 3]));
const outputs = {'c': new MLTensor(preallocatedBuffer)};
graph.compute(inputs, outputs); // compute does not return
const returnedBuffer = await outputs.c.data();
// returnedBuffer === preallocatedBuffer 

index.bs Outdated
console.log(`outputs include ${Object.keys(outputs)}`);

// Compute d.
outputs = await graph.compute(inputs, {d});
outputs = graph.compute(inputs, {d});
Contributor

For the optional output scenario, user code only needs to specify the required output names. The previous MLOutput had all optional fields, so it could be an empty dictionary not bound to any resources. However, with the new MLTensor, it looks like user code has to create one from a resource, which is not necessary for this usage.

@wchao1115 (Collaborator, Author)

Yes, in this case {d} is a dictionary with a single key associated with a null (MLTensor) value. Am I missing something here?

Contributor

I may have made a mistake for the record<DOMString, MLOutput> case; it should be {'d': {}}. However, for record<DOMString, MLTensor>, I am afraid {'d': null} is invalid.

Contributor

Since the outputs will be returned by compute, we could probably optimize this usage by allowing the user to specify only the output names in an array, e.g.,

outputs = graph.compute(inputs, ['d']);

@wchao1115 (Collaborator, Author) commented Apr 26, 2021

Syntax-wise, are you saying that the following is also invalid?

const d = builder.matmul(a, b);
const e = builder.add(d, c);
--> (?) const graph = await builder.build({d, e});

Can you tell me exactly how record<K,V> should be used and how we should write the line in question above? I've never found a usage sample for it in the WebIDL spec.

Is it?
await builder.build({'d': d, 'e': e});

@huningxin (Contributor) commented Apr 26, 2021

--> (?) const graph = await builder.build({d, e});

This code is valid. It is equivalent to await builder.build({'d': d, 'e': e});

They are two styles of object property definition. The former is the so-called "shorthand property names" syntax from ES2015.

I said outputs = graph.compute(inputs, {d}); is invalid because the object d is of type Operand in that example. If d were of type MLOutput, it would be valid.
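A tiny illustration of the shorthand, unrelated to any WebNN specifics:

const d = 1, e = 2;
const longhand  = {'d': d, 'e': e};  // explicit key/value pairs
const shorthand = {d, e};            // ES2015 shorthand property names
// Both produce the same object: {d: 1, e: 2}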

index.bs Outdated
MLResource resource();

// Asynchronously download the result of a computation onto a typed array.
Promise<ArrayBufferView> data();
@huningxin (Contributor) commented Apr 15, 2021

The MLTensor that downloads the result to an ArrayBufferView probably needs to be separated from the one for GPU resources (say, MLGPUTensor), because GPU resources have their own data download methods, e.g.,

interface MLTensor {
  constructor(ArrayBufferView buffer, optional sequence<long> dimensions);
  sequence<long> dimensions();
  Promise<ArrayBufferView> data();
};

typedef (MLResourceBufferView or WebGLTexture or GPUTexture) MLGPUResource;

interface MLGPUTensor {
  constructor(MLGPUResource resource, optional sequence<long> dimensions);
  sequence<long> dimensions();
  MLGPUResource resource();
};

According to the table of device types and resource types:

  • An MLContext created with any device type can accept and return MLTensor,
  • An MLContext created with a GPU device can accept and return MLTensor, as well as MLGPUTensor for GPU interop.

@wchao1115 (Collaborator, Author) commented Apr 25, 2021

probably needs to be separated from ones for GPU resources (say MLGPUTensor). It is because that the GPU resources have their own data download methods.

If the tensor has a backing GPU resource, the execution output will be on that backing resource. The await data() call is only when the user wants to download the data from the GPU resource to a CPU buffer.

If the user wants to make a copy from a GPU resource to another GPU resource, they should do it themselves outside of the WebNN call.

I would love not to fragment MLTensor with a GPU-specific version. It's much simpler to think of an MLTensor as an envelope for a device-specific resource needed at execution time.
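A rough sketch of that envelope idea, assuming (per this PR) that the MLTensor constructor accepts any MLResource, whether an ArrayBufferView or a GPU resource created on the context's device:

// Illustrative only.
const cpuTensor = new MLTensor(new Float32Array(sizeOfShape([3, 3])));
const gpuTensor = new MLTensor(gpuBuffer); // gpuBuffer: a GPUBuffer from the same device as the MLContext
graph.compute(inputs, {'c': gpuTensor});
// The result lives in the backing gpuBuffer; call data() only when a CPU copy is needed.
const cpuCopy = await gpuTensor.data();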

Contributor

If the tensor has a backing GPU resource, the execution output will be on that backing resource. The await data() call is only when the user wants to download the data from the GPU resource to a CPU buffer.

We probably need to coordinate with other Web APIs; e.g., if the resource is a GPUBuffer, WebGPU defines mapAsync and getMappedRange for data download.

Another question: if the tensor is created with an ArrayBufferView, what would MLResource resource() return? If it returns the ArrayBufferView, what's the semantic difference from Promise<ArrayBufferView> data()?

@wchao1115 (Collaborator, Author) commented Apr 26, 2021

Another question: if the tensor is created with an ArrayBufferView, what would MLResource resource() return? If it returns the ArrayBufferView, what's the semantic difference from Promise<ArrayBufferView> data()?

resource() returns the backing resource of the MLTensor, while await data() asynchronously downloads the content and converts it to a standard layout in a separate buffer. I believe you proposed this semantic for the Wasm backend when an operation is executed standalone.
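A small sketch of that distinction as described above (illustrative, not normative):

const backing = new Float32Array(sizeOfShape([3, 3]));
const tensor = new MLTensor(backing);
graph.compute(inputs, {'c': tensor});
const raw = tensor.resource();   // the backing resource itself, content possibly still in an internal layout
const std = await tensor.data(); // asynchronous download plus conversion to the standard layout, into a separate buffer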

@wchao1115 (Collaborator, Author)

We probably need to coordinate with other Web APIs; e.g., if the resource is a GPUBuffer, WebGPU defines mapAsync and getMappedRange for data download.

We define await data() as a contract to asynchronously download the content to a CPU buffer. How exactly that download is implemented from a GPU resource to a CPU buffer is up to the implementor of this contract.

If the caller of WebNN wants to make a copy from one GPU resource to the other, then that is entirely outside the scope of WebNN.

@wchao1115 (Collaborator, Author) commented Apr 28, 2021

Let's step back a bit and really think about this requirement for a sec:

The framework may compute multiple ops without accessing the intermediate results. So if we allow passing the output of a compute to the next compute without making the data available, it would help the underlying implementation hide the additional copies and conversions. The idea is illustrated by the following pseudo framework integration code:

// assume conv2d_op, add_op, relu_op are compiled single-op graphs
// input = tf.tensor(values);
// conv2d = tf.conv2d(input, filter);
conv2d.mlOutput = conv2d_op.compute({'input': {resource: input.arrayBuffer}}).output; // webnn backend
// add = tf.add(conv2d, bias);
add.mlOutput = add_op.compute({'a': conv2d.mlOutput}).output; // webnn backend
// relu = tf.relu(add);
relu.mlOutput = relu_op.compute({'input': add.mlOutput}).output;  // webnn backend
// await relu.data();
await relu.mlOutput.readDataAsync({resource: relu.arrayBuffer});  // webnn backend

i.e. if a framework graph [input]->conv1->conv2->[output] has each of the two conv operations compiled separately on the backend, we want the result out of conv1 to be an opaque handle with implementation-specific layout format that can later be "fixed" in a final async call (e.g. await relu.data() in the pseudo code above).

On second thought, this requirement can't be allowed. No matter where a convolution operation is defined, whether at the framework level or at the WebNN backend, the output of a public operation must be in a known layout format.

While I understand the motive of potentially reducing copies at the backend, from the API standpoint, it simply cannot be allowed or we'll end up with a spec that says that the layout of an output of an operation is in an unknown hardware-specific format.

This means that for a framework graph [input]->conv1->conv2->[output], if a user (of the framework) executes the entire graph in one atomic action (load-and-run), the framework backend must compile the two convolution ops together in order to achieve the optimization that allows the intermediate result between the two ops to stay in a native format, and only the final output after conv2 is converted back to a known layout format. This is precisely the value of graph compilation -- to optimize by hiding the intermediate results within the graph that aren't exposed to the user.

For the case where the framework graph is actually exposed to the user and that the user chooses to execute each op in the graph individually (eager execution), the execution result from each op must be produced and converted into a known layout format every step of the way. Yes, it won't be as efficient but that's exactly the trade-off the user is making when using eager execution -- to favor inspection of the output of each operator in the graph over end-to-end graph execution performance.
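To make the trade-off concrete, here is a rough sketch using the MLGraphBuilder API from the spec; the identifiers (input, filter1, mid, and so on) are illustrative only:

// Load-and-run: compile conv1 and conv2 into one graph, so the intermediate
// result between them can stay in an internal, backend-specific format.
const c1 = builder.conv2d(input, filter1);
const c2 = builder.conv2d(c1, filter2);
const graph = await builder.build({'out': c2});
graph.compute(inputs, {'out': outputTensor});

// Eager: each conv compiled as its own single-op graph. The output of the
// first must be materialized in the standard layout before it can be fed to
// the second ('mid' is a graph input bound to g1's output at compute time);
// that is the cost of per-op inspection.
const mid = builder.input('mid', middleDesc);
const g1 = await builder.build({'c1': builder.conv2d(input, filter1)});
const g2 = await builder.build({'c2': builder.conv2d(mid, filter2)});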

Contributor

@wchao1115

whether at the framework level or at webnn backend, an output of a public operation must be in a known layout format.

Agree. So the result of MLTensor.data() is always the standard layout format.

we'll end up with a spec that says that the layout of an output of an operation is in an unknown hardware-specific format.

The internal format is up to the implementation, the spec won't define that.

For the case where the framework graph is actually exposed to the user and that the user chooses to execute each op in the graph individually (eager execution), the execution result from each op must be produced and converted into a known layout format every step of the way.

The conversion would only be needed when user code calls MLTensor.data(). Otherwise, it can be kept in the internal format and be consumed by the next op compute as an input.

In summary, if user code wants to access the output data, it calls MLTensor.data(). The result is always in the standard layout format.

@wchao1115 (Collaborator, Author)

Agree. So the result of MLTensor.data() is always the standard layout format.

Not only that. The output of MLGraphBuilder.conv2d must also be in the standard format. See the details in the conv2d API spec: both the size and the shape of the output operand are clearly defined, because this operation might be the last in the graph, in which case its output is the graph's output, which must be well defined.

Contributor

The output of MLGraphBuilder.conv2d must also be in standard format. See detail in the conv2d API spec, both the size and the shape of the output operand are clearly defined, because this operation might be the last in the graph, and so its output in that case is the graph's output, which must be defined.

Now with MLTensor as the graph output, the user code can only access the results by calling MLTensor.data(). So if MLTensor.data() always returns the standard layout format, it would not break the spec definition.

Did I miss anything?

@wchao1115 (Collaborator, Author)

The output of the conv2d will be in the output buffer that is bound to the graph output in the compute call. Before this PR, there was the MLOutput dictionary whose data field held the output buffer; in this PR, that role is played by the resource() call. This is true whether the output buffer is a GPU resource or an ArrayBufferView. I now think it's probably best to leave the async download step up to the caller rather than tying it to the WebNN API, since the caller knows how to deal with data downloads from a resource type they specify. As for the operation itself, the output buffer always needs to be in a standard format to keep the spec well defined.

index.bs Outdated

[SecureContext, Exposed=Window]
interface MLGraph {
Promise<MLNamedOutputs> compute(MLNamedInputs inputs,
optional MLNamedOutputs outputs = {});
MLNamedTensors compute(MLNamedTensors inputs, optional MLNamedTensors outputs = {});
Contributor

The compute method could probably be overloaded for different device types and different output allocations (newly allocated or pre-allocated), e.g.,

typedef record<DOMString, MLTensor> MLNamedTensors;
typedef record<DOMString, MLGPUTensor> MLNamedGPUTensors;

interface MLGraph {
  // MLContext with any device types, allocate output `MLTensor`s for listed output names.
  MLNamedTensors compute(MLNamedTensors inputs, optional sequence<DOMString> outputNames);

  // MLContext with any device type, use pre-allocated output MLTensors.
  void compute(MLNamedTensors inputs, MLNamedTensors outputs);

  // MLContext with GPUDevice or WebGLRenderingContext device type, allocate output `MLGPUTensor`s for listed output names.
  MLNamedGPUTensors compute(MLNamedGPUTensors inputs, optional sequence<DOMString> outputNames);

  // MLContext with GPUDevice or WebGLRenderingContext device type, use pre-allocated output MLGPUTensors.
  void compute(MLNamedGPUTensors inputs, MLNamedGPUTensors outputs);
};

Contributor

For the CPU - WebAssembly scenario of the operation-specific API, the framework could create MLTensors with pre-allocated Wasm memory, e.g.,

const inputs = {'x': new MLTensor(input_tensor_info.f32())};
const outputs = {'y': new MLTensor(output_tensor_info.f32_write())};
conv2d.compute(inputs, outputs);
await outputs.y.data()
// The results have been downloaded into the memory pointed by output_tensor_info.f32_write().

For the GPU scenario, the framework could create MLGPUTensors with WebGLTextures, e.g.,

const inputs = {'x': new MLGPUTensor(inputTextureData.texture)};
const outputs = {'y': new MLGPUTensor(outputTextureData.texture)};
conv2d.compute(inputs, outputs);
// Post-processing with outputTextureData.texture

@wchao1115 @pyu10055 , does it look good?

@pyu10055 (Collaborator)

@huningxin In WebGL there is no way to share textures across contexts. Are the input textures limited to a GPU context created during compilation, or can users specify a context during graph creation?

@wchao1115 (Collaborator, Author)

@pyu10055 the assumption is that the app must create an MLContext with the WebGL context that creates the texture, which is later passed back as an input to the graph created by that MLContext. It is generally known that a resource should not be shared across different GPU contexts. In DirectX you can achieve cross-adapter sharing of resources, but you need to create a shared resource handle and manage the lifetime of the shared resource explicitly. At the Web API level, though, I agree that it is best to keep this simpler and simply not allow cross-adapter sharing.

Contributor

@pyu10055

users can specify a context during graph creation?

Yes, the user can create an MLContext by specifying a WebGLRenderingContext, via ML.createContext:

MLContext createContext(WebGLRenderingContext glContext);
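A usage sketch of that entry point, following the MLGPUTensor shape proposed earlier in this thread (illustrative only; 'ml' stands for however the ML interface is obtained):

const gl = canvas.getContext('webgl');
const context = ml.createContext(gl);                 // MLContext tied to this WebGL context
// Textures used as graph inputs must come from the same gl context:
const inputs = {'x': new MLGPUTensor(inputTexture)};  // per the MLGPUTensor proposal above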

@wchao1115 (Collaborator, Author) commented Apr 25, 2021

@huningxin I suggest having only one method, which requires the caller to pre-allocate the output buffers. It's much simpler that way, i.e.

typedef record<DOMString, MLTensor> MLNamedTensors;

interface MLGraph {
  // MLContext with any device type, use pre-allocated output MLTensors.
  void compute(MLNamedTensors inputs, MLNamedTensors outputs);
};

I wonder how practical the optional output case (Example 5) really is. For a multi-output topology, not requiring all of the outputs to be produced at execution is uncommon. The optimization gained from not having to specify an optional output is going to be minimal relative to the cost of executing the entire topology anyway. We should probably remove this use case and simplify the API.

Also, there is no need to special-case a GPU-resource tensor, as MLTensor already wraps those types nicely. See the 'Device Selection' section for more detail on when each type of resource is allowed.

@huningxin (Contributor) commented Apr 26, 2021

I wonder how practical the optional output case (Example 5) really is.

The use case came from #105 @pyu10055 .

With the new MLTensor interface, user code can only download the data of a specific output. This would help avoid the overhead of downloading other outputs that user code doesn't need.

@wchao1115 (Collaborator, Author)

Sure. And this would still work, right?

const d = builder.matmul(a, b);
const e = builder.add(d, c);
const graph = await builder.build({d, e});
const output = {'d': new MLTensor(new Float32Array(sizeOfShape([3, 3])))};
graph.compute(inputs, output);
let data = await output.d.data();

Contributor

right.

index.bs Outdated
// Prefer a compute-specific processor such as a neural engine or NPU.
"compute",
// Prefer a traditional software-based device such as a CPU.
"software"
Collaborator

+1 on cpu and gpu

index.bs Outdated
constructor(MLResource resource, optional sequence<long> dimensions);

// The tensor dimensions
sequence<long> dimensions();
Collaborator

Should the tensor also have a dtype method?

Contributor

+1 to adding a data type; probably reuse MLOperandType?

@wchao1115 (Collaborator, Author)

Actually that wouldn't be needed. Remember that MLTensor is just a resource holder type. The data type of the resource is already specified when an operand is created either explicitly through the MLOperandDescriptor or implicitly through the execution of the relevant operation in the graph.

The fact that there is an optional dimensions param in its ctor may have confused the matter, but it's only there to support graphs with free-dimension inputs, where the input shape isn't known until graph execution time. In a normal use case it won't be needed, hence optional.

This conversation makes me realize that calling the type MLTensor is itself misleading because when you call something a tensor, you would naturally think that it has both a shape and a data type association. A more appropriate name is probably MLResource or MLGraphResource.
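A sketch of the free-dimension case described above; the descriptor shape and the use of a non-positive value to mark a free dimension are assumptions based on the validation steps quoted later in this thread:

// The operand's data type and symbolic shape are declared up front.
const x = builder.input('x', {type: 'float32', dimensions: [-1, 3, 224, 224]}); // -1: free dimension (assumed notation)
// At execution time the concrete shape is supplied on the tensor instead:
const inputs = {'x': new MLTensor(xBuffer, [1, 3, 224, 224])};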

explainer.md Outdated
const outputs = await graph.compute(inputs);
const inputs = {'A': new MLTensor(bufferA), 'B': new MLTensor(bufferB)};
const outputs = graph.compute(inputs);
const data = await outputs.C.data();
Collaborator

The data() API is for downloading to the CPU, correct?

Contributor

I understand it is; the returned promise resolves with an ArrayBufferView.

index.bs Outdated
<!-- Validate inputs and outputs -->
1. If any of the following requirements are unmet, then [=reject=] |promise| with a {{TypeError}} and stop.
1. If any of the following requirements are unmet, then stop.
Member

Do we want to signal to the web developer that the execution has stopped?

There's a leftover promise reference on L1801 that should be either removed or repurposed for signaling this, this one:

reject promise with an OperationError and stop.

index.bs Outdated
1. Remove |outputName| from |remainingOutputNames|.
1. If |remainingOutputNames| is empty, then resolve |promise| with |results| and stop.
1. If |remainingOutputNames| is empty, then stop.
Member

I think L1828 should be updated to either:

  1. Return |results|.

Or remove this step if the preferred design is to use data() for data download per #162 (comment)

index.bs Outdated
1. Let |i| be 0.
1. While true:
1. Let |dimension| be |value|.{{MLInput/dimensions}}[|i|].
1. Let |dimension| be |value|.{{MLTensor/dimensions}}[|i|].
1. |dimension| must be greater than 0.
1. If |inputOperand|.{{MLOperandDescriptor/dimensions}}[|i|] is greater than 0, then |dimension| must be equal to |inputOperand|.{{MLOperandDescriptor/dimensions}}[|i|].
1. Set |i| to |i| + 1.
Member

As a drive-by comment, many web specs tend to use a more prose-like form:

Increment |i| by 1.

And in many cases, if it makes the algorithm more readable, instead of "i" they may say "foobar counter". Not sure if that applies here; just noting that, in general, web spec algorithms are recommended to be self-documenting rather than concise. That said, "i" is pretty widely understood :-)

Contributor

Good suggestion, thanks! Acknowledged; will update the compute algorithm once this PR is merged.

@anssiko (Member) commented Apr 29, 2021

From today's call:

RESOLUTION: Open new issues for 1) data UL/DL 2) layout format conversion discussions, and finish PR #162 addressing device selection

@wchao1115 (Collaborator, Author)

Based on the resolution in the CG call, I've pulled back on the asynchronous data download call and left only the change adding the device selection note.

Also, based on our discussion, we will leave the accelerator device preference option out of the spec for now, since the resource models around different kinds of accelerators are still not well defined. We'll deal with these issues separately from this PR.

Please have a look.

wchao1115 changed the title from "Add support for device selection and asynchronous data downloads for GPU readbacks" to "Add support for device selection" on May 1, 2021
@huningxin (Contributor) left a comment

Thanks @wchao1115 !

index.bs Outdated

An {{MLContext}} interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. An {{MLContext}} could be created from a specific GPU device such as {{GPUDevice}} or {{WebGLRenderingContext}} that is already in use by the application, in which case the corresponding {{GPUBuffer}} or {{WebGLBuffer}} resources used as graph constants, as well as the {{GPUTexture}} and {{WebGLTexture}} as graph inputs must also be created from the same device. In a multi-adapter configuration, the device used for {{MLContext}} must be created from the same adapter as the device used to allocate the resources referenced in the graph.

In a situation when a GPU context executes a graph with constants or inputs given as {{ArrayBufferView}}, the content of such constant or input is automatically uploaded from the system memory to the GPU memory. Likewise, the content of the execution result is downloaded to the system memory as {{ArrayBufferView}} upon requested.
Contributor

Should we note the two aspects of a CPU context when executing a graph with an ArrayBufferView:

  1. no data moves across devices
  2. there may still be layout format conversions

@wchao1115 (Collaborator, Author)

Updated. Please have a look.

anssiko self-requested a review on May 3, 2021 at 09:41
@huningxin (Contributor) left a comment

Looks good. Thanks much @wchao1115 !

wchao1115 merged commit 0c847fb into master on May 6, 2021
wchao1115 deleted the wchao/interop_efficiency branch on May 6, 2021 at 00:24