
Support CPU - WebAssembly scenario of the op level execution use case #156

Closed

huningxin opened this issue Mar 22, 2021 · 5 comments

Opening this issue to follow up on the operation-specific APIs discussion from the 3/18 WebML CG call. @pyu10055 @wchao1115 @anssiko @jbingham, please take a look.

Use case

This is one scenario of the framework op-level execution use case (more details can be found in the operation-specific API proposal). A JavaScript ML framework executes ops on the CPU device with WebAssembly. For compute-intensive ops, such as conv2d or matmul, the framework also wants to use the WebNN API to execute the op (as a single-op MLGraph) with ML-specific instructions, such as Vector Neural Network Instructions (VNNI), on the same CPU device.
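
For illustration, here is a minimal sketch of such a single-op MLGraph. It assumes the MLContextOptions device selection and the compute() entry point from earlier WebNN spec drafts; the exact method names and descriptor fields have changed across revisions, so treat this as a sketch of the flow rather than the current API:

// Sketch only: names follow earlier WebNN drafts and may not match the current spec.
const context = await navigator.ml.createContext({ deviceType: 'cpu' });
const builder = new MLGraphBuilder(context);

// A single-op graph wrapping one conv2d.
const input = builder.input('input', { dataType: 'float32', dimensions: [1, 1, 5, 5] });
const filter = builder.constant(
    { dataType: 'float32', dimensions: [1, 1, 3, 3] },
    new Float32Array(9).fill(1));
const output = builder.conv2d(input, filter);
const graph = await builder.build({ output });

// The framework keeps executing its other ops in WebAssembly and only calls
// into WebNN for the compute-intensive op.
const inputBuffer = new Float32Array(25).fill(1);
const outputBuffer = new Float32Array(9);
const results = await context.compute(graph, { input: inputBuffer }, { output: outputBuffer });
// results.outputs.output holds the conv2d result.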

Requirements

WebNN should allow frameworks to create an MLContext for the CPU device. This would avoid unnecessary data copying across devices when frameworks use WebAssembly on the CPU to execute the other ops.

WebNN should allow frameworks to control when the output data is available for access. This would avoid unnecessary tensor layout conversions between the native ML API and WebNN. Some background:

  • Some native ML APIs use hardware-dependent memory layouts for acceleration; for example, oneDNN uses different blocked memory layouts for better vectorization and cache reuse on different platforms.
  • The memory layout conversions are expensive.
  • Frameworks may use the WebNN API to execute multiple ops (via multiple single-op MLGraphs) without accessing the intermediate results between them.

For example, a user of TensorFlow.js may execute three conv2d ops but only access the output of the last one:

c = tf.conv2d(a, b);
e = tf.conv2d(c, d);
h = tf.conv2d(f, g);
output = await h.data();

A potential WebNN implementation would only need to do the memory layout conversion and put the data into an ArrayBufferView when h.data() is invoked.
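
Here is a rough sketch of that deferred-readback behavior. The class and method names are hypothetical (readBackToStandardLayout is not a WebNN or TF.js API, just a stand-in for the layout conversion and copy-out step):

class WebNNTensorHandle {
  constructor(nativeOutput) {
    this.nativeOutput = nativeOutput; // stays in the backend's native (possibly blocked) layout
    this.hostBuffer = null;           // standard-layout copy, created lazily
  }
  async data() {
    if (this.hostBuffer === null) {
      // Only here would the implementation convert the native layout into a
      // standard-layout ArrayBufferView and copy the data out.
      this.hostBuffer = await this.nativeOutput.readBackToStandardLayout();
    }
    return this.hostBuffer;
  }
}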

anssiko commented Mar 30, 2021

(Cross-linking @wchao1115’s comment on WebAssembly.Memory object #149 (comment).)

huningxin commented May 24, 2021

To better understand this use case, I recently experimented with implementing conv2d of the TF.js Wasm backend with the WebNN API. The implementation is in conv2d_impl.cc and the WebNN calls are guarded by USE_WEBNN_OP. With the prototype, I observed a good performance speedup (3X to 5X) in a tf.conv2d benchmark when offloading the compute to a native library (such as XNNPACK or oneDNN) via WebNN running on the CPU.

From the prototype, there are a few findings:

  1. The TF.js Wasm backend expects the input and output data of an op execution to be in the standard layout.
  2. The TF.js Wasm backend pre-allocates input and output buffers for an op execution.
  3. The TF.js Wasm backend executes an op synchronously.
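
A sketch of what these three findings imply for a WebNN-backed op in the Wasm backend. computeSync() is assumed here (a synchronous variant discussed in earlier WebNN drafts and in webnn-native), not necessarily the shipped API, and the buffer sizes are illustrative:

// Pre-allocated, standard-layout buffers owned by the backend (findings 1 and 2).
const inputBuffer = new Float32Array(1 * 1 * 5 * 5);
const outputBuffer = new Float32Array(1 * 1 * 3 * 3);

function conv2dViaWebNN(context, graph) {
  // The op must complete before returning (finding 3), hence a synchronous
  // compute entry point is assumed.
  context.computeSync(graph, { input: inputBuffer }, { output: outputBuffer });
  return outputBuffer;
}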

@pyu10055

pyu10055 commented

@huningxin That is correct. The TFJS Wasm backend is synchronous; the computationally heavy ops are executed with web workers for multi-threading. And TFJS can be run in a web worker to achieve asynchronous execution.
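
For completeness, a minimal sketch of that pattern: hosting TF.js with the Wasm backend in a dedicated worker so its synchronous op execution does not block the main thread. The worker file name and message shape are illustrative only:

// main thread
const worker = new Worker('tfjs-wasm-worker.js'); // illustrative file name
worker.postMessage({
  op: 'conv2d',
  input: new Float32Array(25).fill(1),
  filter: new Float32Array(9).fill(1),
});
worker.onmessage = (event) => {
  // The result arrives asynchronously even though the worker computed it synchronously.
  console.log('conv2d output', event.data);
};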

huningxin commented

@pyu10055, thanks for the clarification.

BTW, the webnn-native code to reproduce the conv2d performance results of #156 (comment) is up for review in webmachinelearning/webnn-native#10. Feel free to check it out.

anssiko commented Mar 3, 2023

While doing issue gardening, I noticed this issue had been fixed by #174.
