Add support for device selection #162

Merged 7 commits into master from wchao/interop_efficiency on May 6, 2021

Conversation

@wchao1115 (Collaborator) commented Apr 13, 2021

  • Add a device preference to the context options, in addition to the existing power preference. This change gives the app more control over what type of execution device should be used, and enables effective interop with Wasm (Support CPU - WebAssembly scenario of the op level execution use case #156). A rough sketch of the option follows below.
  • Follow up on the issue discussed in PR #149 around supporting the operation-level scenario in frameworks that support asynchronous data downloads.
  • Further clarify the level of support for GPU resources from WebGPU and WebGL in relation to the underlying execution device the MLContext is based on.
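A rough sketch of how such a context option might be passed. The option and value names here are placeholders (the exact enum values are the subject of the review discussion below), and the ML.createContext entry point is the one referenced later in this thread:

// Illustrative only; 'devicePreference' and its values are placeholders under discussion.
const context = navigator.ml.createContext({
  devicePreference: 'cpu',       // e.g. lets a Wasm library avoid cross-device copies
  powerPreference: 'low-power'   // the pre-existing power preference option
});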


…nstraints. Introduce MLTensor that allows asynchronous readback.
index.bs Outdated
// Prefer a compute-specific processor such as a neural engine or NPU.
"compute",
// Prefer a traditional software-based device such as a CPU.
"software"
@huningxin (Contributor) commented Apr 14, 2021

I would prefer naming it cpu. It would be a clear device name that Wasm libraries could select to avoid unnecessary data copying across devices, as the Support CPU - WebAssembly scenario of the op-level execution use case describes.

@wchao1115 (Collaborator, Author)

Other than the API name prefix, most API naming conventions generally discourage the use of abbreviations as identifier names. If we were to call it "cpu" we would also need a "gpu" value to be consistent, and then what about AI accelerators? "npu", "vpu", "tpu"? I think that would start a naming discussion with no end.

Contributor

I suppose the "cpu" and "gpu" abbreviations would be well known to developers because they are already widely used by native ML APIs; e.g., NNAPI DeviceTypeCode defines ANEURALNETWORKS_DEVICE_CPU and ANEURALNETWORKS_DEVICE_GPU, ML Compute MLCDeviceType defines gpu and cpu, and oneDNN dnnl_engine_kind_t also defines dnnl_cpu and dnnl_gpu.

For AI accelerators, we could probably just call them "accelerator" (in the context of WebNN). This is also used by native APIs; for example, NNAPI DeviceTypeCode defines ANEURALNETWORKS_DEVICE_ACCELERATOR for a "Dedicated accelerator for Machine Learning workloads." Another example is OpenCL cl_device_type, which defines CL_DEVICE_TYPE_CPU, CL_DEVICE_TYPE_GPU and CL_DEVICE_TYPE_ACCELERATOR for heterogeneous hardware.

Actually, today's modern CPUs already have ML-specific instructions, such as the Vector Neural Network Instructions (VNNI) on x86 and the Scalable Vector Extension 2 (SVE2) on Arm, and are adding even more advanced instructions, e.g. for matrix multiplication. My main concern is that "software" does not reflect these new capabilities that accelerate ML workloads for web apps.

@wchao1115 (Collaborator, Author) commented Apr 14, 2021

I'm open to naming suggestions if others have input on this. In my line of work designing the Windows API at Microsoft, we generally stay away from abbreviations because their meaning and perceived freshness tend to change over time. They are also harder for people from different backgrounds to understand. (The Windows API is well known for longevity.)

Contributor

My 2 cents worth: "compute" and "software" sound too generic, and I wouldn't be able to guess what they meant. On the other hand, I do think most people can guess what "cpu" or "gpu" mean. I agree custom neural accelerators don't have as obvious a choice. "npu" or "accelerator" or "custom" sound fine to me.

Collaborator

+1 on cpu and gpu

@wchao1115 (Collaborator, Author)

I couldn't find any reference or recommendations regarding the usage of abbreviations in Web API definitions. Does anyone know where to look? @anssiko. If abbreviations are allowed, should they be used in lowercase or uppercase, etc.?

The DirectX API generally avoids abbreviations, and terms like "graphics" or even "software" have been used for quite some time, e.g.:

https://docs.microsoft.com/en-us/windows/win32/direct3d11/atoc-dx-graphics-direct3d-11
https://docs.microsoft.com/en-us/windows/win32/api/d3dcommon/ne-d3dcommon-d3d_driver_type

Member

@wchao1115 Web Platform Design Principles > Naming principles https://w3ctag.github.io/design-principles/#naming-is-hard is probably the best resource. Naming is always a topic of active discussion in API design, and there's often no absolute right or wrong.

Given there's a Web API called WebGPU, I suspect "gpu" would pass if you prefer it that way. Similarly, we can probably agree that "cpu" is familiar to web developers of today.

That said, I'd challenge the group to think about this from the perspective of future-proofing. How do possible future architectures fit into the device buckets we choose? We could make this extensible with new buckets in the future, or we could make the initial ones abstract enough to accommodate future devices. I think there's always a trade-off whatever we choose. What I think we want to achieve is to be able to provide web developers with knobs they can use to give hints to the implementation so their workloads can be executed in a manner that yields the best user experience.

@wchao1115 (Collaborator, Author)

Thanks @anssiko. I'll take this advice from the design principles: Please consult widely on names in your APIs.

The hardest is probably the accelerator type. Let's discuss this naming in our next CG call.

index.bs Outdated
console.log(`values: ${outputs.c.data}`);
const outputs = {'c': new MLTensor(new Float32Array(sizeOfShape([3, 3])))};
graph.compute(inputs, outputs);
const data = await outputs.c.data();
@huningxin (Contributor) commented Apr 14, 2021

I am trying to figure out the right behavior of MLTensor for the pre-allocated outputs scenario. Will outputs.c.data() allocate and return a new ArrayBufferView instead of downloading the result into the pre-allocated ArrayBufferView? And if user code captures the returned MLTensor, what's the relationship between the returned MLTensor and the given MLTensor? e.g.,

const buffer = new Float32Array(sizeOfShape([3, 3]));
const outputs = {'c': new MLTensor(buffer)};
let returns = graph.compute(inputs, outputs);
const buffer1 = await returns.c.data();
const buffer2 = await outputs.c.data();
// is buffer === buffer1 === buffer2?

Contributor

As the outputs are given, I would suggest that compute not return the results. Calling data() on the given output MLTensor should download the results into the pre-allocated ArrayBufferView and return it (the same object), e.g.,

const preallocatedBuffer = new Float32Array(sizeOfShape([3, 3]));
const outputs = {'c': new MLTensor(preallocatedBuffer)};
graph.compute(inputs, outputs); // compute does not return
const returnedBuffer = await outputs.c.data();
// returnedBuffer === preallocatedBuffer 

index.bs Outdated
console.log(`outputs include ${Object.keys(outputs)}`);

// Compute d.
outputs = await graph.compute(inputs, {d});
outputs = graph.compute(inputs, {d});
Contributor

For the optional output scenario, user code only needs to specify the required output names. The previous MLOutput had all optional fields, so it could be an empty dictionary not bound to any resources. However, with the new MLTensor, it looks like user code has to create one from a resource, which is not necessary for this usage.

@wchao1115 (Collaborator, Author)

Yes, in this case {d} is a dictionary with a single key associated with a null (MLTensor) value. Am I missing something here?

Contributor

I may have made a mistake for the record<DOMString, MLOutput> case; it should be {'d': {}}. However, for record<DOMString, MLTensor>, I am afraid {'d': null} is invalid.

Contributor

Since the outputs will be returned by compute, we could probably optimize this usage by allowing the user to specify only the output names in an array, e.g.,

outputs = graph.compute(inputs, ['d']);

@wchao1115 (Collaborator, Author) commented Apr 26, 2021

Syntax-wise, are you saying that the following is also invalid?

const d = builder.matmul(a, b);
const e = builder.add(d, c);
--> (?) const graph = await builder.build({d, e});

Can you tell me exactly how record<K,V> should be used and how we should write the line in question above? I've never found a usage sample for it in the WebIDL spec.

Is it?
await builder.build({'d': d, 'e': e});

@huningxin (Contributor) commented Apr 26, 2021

--> (?) const graph = await builder.build({d, e});

This code is valid. It is equivalent to await builder.build({'d': d, 'e': e});

They are two styles of object property definition. The former is the so-called "shorthand property names" syntax from ES2015.

I said outputs = graph.compute(inputs, {d}); is invalid because the object d is of type Operand in that example. If d were of type MLOutput, it would be valid.
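A tiny illustration of the shorthand, unrelated to any WebNN specifics:

const d = 1, e = 2;
const longhand  = {'d': d, 'e': e};  // explicit key/value pairs
const shorthand = {d, e};            // ES2015 shorthand property names
// Both produce the same object: {d: 1, e: 2}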

index.bs Outdated
MLResource resource();

// Asynchronously download the result of a computation onto a typed array.
Promise<ArrayBufferView> data();
@huningxin (Contributor) commented Apr 15, 2021

The MLTensor that downloads the result to an ArrayBufferView probably needs to be separated from the one for GPU resources (say, MLGPUTensor), because GPU resources have their own data download methods, e.g.,

interface MLTensor {
  constructor(ArrayBufferView buffer, optional sequence<long> dimensions);
  sequence<long> dimensions();
  Promise<ArrayBufferView> data();
};

typedef (MLResourceBufferView or WebGLTexture or GPUTexture) MLGPUResource;

interface MLGPUTensor {
  constructor(MLGPUResource resource, optional sequence<long> dimensions);
  sequence<long> dimensions();
  MLGPUResource resource();
};

According to the table of device types and resource types:

  • An MLContext created with any device type can accept and return MLTensor,
  • An MLContext created with a GPU device can accept and return MLTensor, as well as MLGPUTensor for GPU interop.

@wchao1115 (Collaborator, Author) commented Apr 25, 2021

probably needs to be separated from ones for GPU resources (say MLGPUTensor). It is because that the GPU resources have their own data download methods.

If the tensor has a backing GPU resource, the execution output will be on that backing resource. The await data() call is only when the user wants to download the data from the GPU resource to a CPU buffer.

If the user wants to make a copy from a GPU resource to another GPU resource, they should do it themselves outside of the WebNN call.

I would love not to fragment MLTensor with a GPU-specific version. It's much simpler to think of an MLTensor as an envelope for a device-specific resource needed at execution time.
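A rough sketch of that envelope idea, assuming (per this PR) that the MLTensor constructor accepts any MLResource, whether an ArrayBufferView or a GPU resource created on the context's device:

// Illustrative only.
const cpuTensor = new MLTensor(new Float32Array(sizeOfShape([3, 3])));
const gpuTensor = new MLTensor(gpuBuffer); // gpuBuffer: a GPUBuffer from the same device as the MLContext
graph.compute(inputs, {'c': gpuTensor});
// The result lives in the backing gpuBuffer; call data() only when a CPU copy is needed.
const cpuCopy = await gpuTensor.data();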

Contributor

If the tensor has a backing GPU resource, the execution output will be on that backing resource. The await data() call is only when the user wants to download the data from the GPU resource to a CPU buffer.

We probably need to coordinate with other Web APIs; e.g., if the resource is a GPUBuffer, WebGPU defines mapAsync and getMappedRange for data download.

Another question: if the tensor is created with an ArrayBufferView, what would MLResource resource() return? If it returns the ArrayBufferView, what's the semantic difference from Promise<ArrayBufferView> data()?

@wchao1115 (Collaborator, Author) commented Apr 26, 2021

Another question: if the tensor is created with an ArrayBufferView, what would MLResource resource() return? If it returns the ArrayBufferView, what's the semantic difference from Promise<ArrayBufferView> data()?

resource() returns the backing resource of the MLTensor, while await data() asynchronously downloads the content and converts it to a standard layout in a separate buffer. I believe you proposed this semantic for the Wasm backend when an operation is executed standalone.
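A small sketch of that distinction as described above (illustrative, not normative):

const backing = new Float32Array(sizeOfShape([3, 3]));
const tensor = new MLTensor(backing);
graph.compute(inputs, {'c': tensor});
const raw = tensor.resource();   // the backing resource itself, content possibly still in an internal layout
const std = await tensor.data(); // asynchronous download plus conversion to the standard layout, into a separate buffer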

@wchao1115 (Collaborator, Author)

We probably need to coordinate with other Web APIs; e.g., if the resource is a GPUBuffer, WebGPU defines mapAsync and getMappedRange for data download.

We define await data() as a contract to asynchronously download the content to a CPU buffer. How exactly that download is implemented from a GPU resource to a CPU buffer is up to the implementor of this contract.

If the caller of WebNN wants to make a copy from one GPU resource to the other, then that is entirely outside the scope of WebNN.

@wchao1115 (Collaborator, Author) commented Apr 28, 2021

Let's step back a bit and really think about this requirement for a sec:

The framework may compute multiple ops without accessing the intermediate results. So if we allow passing the output of a compute to the next compute without making the data available, it would help the underlying implementation hide the additional copies and conversions. The idea is illustrated by the following pseudo framework integration code:

// assume conv2d_op, add_op, relu_op are compiled single-op graphs
// input = tf.tensor(values);
// conv2d = tf.conv2d(input, filter);
conv2d.mlOutput = conv2d_op.compute({'input': {resource: input.arrayBuffer}}).output; // webnn backend
// add = tf.add(conv2d, bias);
add.mlOutput = add_op.compute({'a': conv2d.mlOutput}).output; // webnn backend
// relu = tf.relu(add);
relu.mlOutput = relu_op.compute({'input': add.mlOutput}).output;  // webnn backend
// await relu.data();
await relu.mlOutput.readDataAsync({resource: relu.arrayBuffer});  // webnn backend

i.e. if a framework graph [input]->conv1->conv2->[output] has each of the two conv operations compiled separately on the backend, we want the result out of conv1 to be an opaque handle with implementation-specific layout format that can later be "fixed" in a final async call (e.g. await relu.data() in the pseudo code above).

On second thought, this requirement can't be allowed. No matter where a convolution operation is defined, whether at the framework level or at the WebNN backend, the output of a public operation must be in a known layout format.

While I understand the motive of potentially reducing copies at the backend, from the API standpoint, it simply cannot be allowed or we'll end up with a spec that says that the layout of an output of an operation is in an unknown hardware-specific format.

This means that for a framework graph [input]->conv1->conv2->[output], if a user (of the framework) executes the entire graph in one atomic action (load-and-run), the framework backend must compile the two convolution ops together in order to achieve the optimization that allows the intermediate result between the two ops to stay in a native format, and only the final output after conv2 is converted back to a known layout format. This is precisely the value of graph compilation -- to optimize by hiding the intermediate results within the graph that aren't exposed to the user.

For the case where the framework graph is actually exposed to the user and that the user chooses to execute each op in the graph individually (eager execution), the execution result from each op must be produced and converted into a known layout format every step of the way. Yes, it won't be as efficient but that's exactly the trade-off the user is making when using eager execution -- to favor inspection of the output of each operator in the graph over end-to-end graph execution performance.
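To make the trade-off concrete, here is a rough sketch using the MLGraphBuilder API from the spec; the identifiers (input, filter1, mid, and so on) are illustrative only:

// Load-and-run: compile conv1 and conv2 into one graph, so the intermediate
// result between them can stay in an internal, backend-specific format.
const c1 = builder.conv2d(input, filter1);
const c2 = builder.conv2d(c1, filter2);
const graph = await builder.build({'out': c2});
graph.compute(inputs, {'out': outputTensor});

// Eager: each conv compiled as its own single-op graph. The output of the
// first must be materialized in the standard layout before it can be fed to
// the second ('mid' is a graph input bound to g1's output at compute time);
// that is the cost of per-op inspection.
const mid = builder.input('mid', middleDesc);
const g1 = await builder.build({'c1': builder.conv2d(input, filter1)});
const g2 = await builder.build({'c2': builder.conv2d(mid, filter2)});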

Contributor

@wchao1115

whether at the framework level or at webnn backend, an output of a public operation must be in a known layout format.

Agree. So the result of MLTensor.data() is always the standard layout format.

we'll end up with a spec that says that the layout of an output of an operation is in an unknown hardware-specific format.

The internal format is up to the implementation, the spec won't define that.

For the case where the framework graph is actually exposed to the user and that the user chooses to execute each op in the graph individually (eager execution), the execution result from each op must be produced and converted into a known layout format every step of the way.

The conversion would only be needed when user code calls MLTensor.data(). Otherwise, it can be kept in the internal format and be consumed by the next op compute as an input.

In summary, if user code wants to access the output data, it calls MLTensor.data(). The result is always in the standard layout format.

@wchao1115 (Collaborator, Author)

Agree. So the result of MLTensor.data() is always the standard layout format.

Not only that. The output of MLGraphBuilder.conv2d must also be in the standard format. See the details in the conv2d API spec: both the size and the shape of the output operand are clearly defined, because this operation might be the last in the graph, in which case its output is the graph's output, which must be well defined.

Contributor

The output of MLGraphBuilder.conv2d must also be in standard format. See detail in the conv2d API spec, both the size and the shape of the output operand are clearly defined, because this operation might be the last in the graph, and so its output in that case is the graph's output, which must be defined.

Now with MLTensor as the graph output, the user code can only access the results by calling MLTensor.data(). So if MLTensor.data() always returns the standard layout format, it would not break the spec definition.

Did I miss anything?

@wchao1115 (Collaborator, Author)

The output of the conv2d will be in the output buffer that is bound to the graph output in the compute call. Before this PR, there was the MLOutput dictionary whose data field held the output buffer; in this PR, that role is played by the resource() call. This is true whether the output buffer is a GPU resource or an ArrayBufferView. I now think it's probably best to leave the async download step up to the caller rather than tying it to the WebNN API, since the caller knows how to deal with data downloads from a resource type they specify. As for the operation itself, the output buffer always needs to be in a standard format to keep the spec well defined.

index.bs Outdated

[SecureContext, Exposed=Window]
interface MLGraph {
Promise<MLNamedOutputs> compute(MLNamedInputs inputs,
optional MLNamedOutputs outputs = {});
MLNamedTensors compute(MLNamedTensors inputs, optional MLNamedTensors outputs = {});
Contributor

The compute method could probably be overloaded for different device types and different output allocations (newly allocated or pre-allocated), e.g.,

typedef record<DOMString, MLTensor> MLNamedTensors;
typedef record<DOMString, MLGPUTensor> MLNamedGPUTensors;

interface MLGraph {
  // MLContext with any device types, allocate output `MLTensor`s for listed output names.
  MLNamedTensors compute(MLNamedTensors inputs, optional sequence<DOMString> outputNames);

  // MLContext with any device type, use pre-allocated output MLTensors.
  void compute(MLNamedTensors inputs, MLNamedTensors outputs);

  // MLContext with GPUDevice or WebGLRenderingContext device type, allocate output `MLGPUTensor`s for listed output names.
  MLNamedGPUTensors compute(MLNamedGPUTensors inputs, optional sequence<DOMString> outputNames);

  // MLContext with GPUDevice or WebGLRenderingContext device type, use pre-allocated output MLGPUTensors.
  void compute(MLNamedGPUTensors inputs, MLNamedGPUTensors outputs);
};

Contributor

For the CPU - WebAssembly scenario of the operation-specific API, the framework could create MLTensors with pre-allocated Wasm memory, e.g.,

const inputs = {'x': new MLTensor(input_tensor_info.f32())};
const outputs = {'y': new MLTensor(output_tensor_info.f32_write())};
conv2d.compute(inputs, outputs);
await outputs.y.data()
// The results have been downloaded into the memory pointed by output_tensor_info.f32_write().

For the GPU scenario, the framework could create MLGPUTensors with WebGLTextures, e.g.,

const inputs = {'x': new MLGPUTensor(inputTextureData.texture)};
const outputs = {'y': new MLGPUTensor(outputTextureData.texture)};
conv2d.compute(inputs, outputs);
// Post-processing with outputTextureData.texture

@wchao1115 @pyu10055 , does it look good?

@pyu10055 (Collaborator)

@huningxin In WebGL there is no way to share textures across contexts. Are the input textures limited to a GPU context created during compilation, or can users specify a context during graph creation?

@wchao1115 (Collaborator, Author)

@pyu10055 the assumption is that the app must create an MLContext with the WebGL context that creates the texture, which is later passed back as an input to the graph created by that MLContext. It is generally known that a resource should not be shared across different GPU contexts. In DirectX you can achieve cross-adapter sharing of resources, but you need to create a shared resource handle and manage the lifetime of the shared resource explicitly. At the Web API level, though, I agree that it is best to keep this simpler and simply not allow cross-adapter sharing.

Contributor

@pyu10055

users can specify a context during graph creation?

Yes, the user can create an MLContext by specifying a WebGLRenderingContext, via ML.createContext:

MLContext createContext(WebGLRenderingContext glContext);
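A usage sketch of that entry point, following the MLGPUTensor shape proposed earlier in this thread (illustrative only; 'ml' stands for however the ML interface is obtained):

const gl = canvas.getContext('webgl');
const context = ml.createContext(gl);                 // MLContext tied to this WebGL context
// Textures used as graph inputs must come from the same gl context:
const inputs = {'x': new MLGPUTensor(inputTexture)};  // per the MLGPUTensor proposal above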

@wchao1115 (Collaborator, Author) commented Apr 25, 2021

@huningxin I suggest having only one method, which requires the caller to pre-allocate the output buffers. It's much simpler that way, i.e.

typedef record<DOMString, MLTensor> MLNamedTensors;

interface MLGraph {
  // MLContext with any device type, use pre-allocated output MLTensors.
  void compute(MLNamedTensors inputs, MLNamedTensors outputs);
};

I wonder how practical the optional output case (Example 5) really is. For a multi-output topology, not requiring all of the outputs to be produced at execution is uncommon. The optimization gained from not having to specify an optional output is going to be minimal relative to the cost of executing the entire topology anyway. We should probably remove this use case and simplify the API.

Also, there is no need to special-case a GPU-resource tensor, as MLTensor already wraps those types nicely. See the 'Device Selection' section for more detail on when each type of resource is allowed.

@huningxin (Contributor) commented Apr 26, 2021

I wonder how practical the optional output case (Example 5) really is.

The use case came from #105 @pyu10055 .

With the new MLTensor interface, user code can only download the data of a specific output. This would help avoid the overhead of downloading other outputs that user code doesn't need.

@wchao1115 (Collaborator, Author)

Sure. And this would still work, right?

const d = builder.matmul(a, b);
const e = builder.add(d, c);
const graph = await builder.build({d, e});
const output = {'d': new MLTensor(new Float32Array(sizeOfShape([3, 3])))};
graph.compute(inputs, output);
let data = await output.d.data();

Contributor

right.

index.bs Outdated
// Prefer a compute-specific processor such as a neural engine or NPU.
"compute",
// Prefer a traditional software-based device such as a CPU.
"software"
Collaborator

+1 on cpu and gpu

index.bs Outdated
constructor(MLResource resource, optional sequence<long> dimensions);

// The tensor dimensions
sequence<long> dimensions();
Collaborator

Should the tensor also have a dtype method?

Contributor

+1 to adding a data type; probably reuse MLOperandType?

@wchao1115 (Collaborator, Author)

Actually that wouldn't be needed. Remember that MLTensor is just a resource holder type. The data type of the resource is already specified when an operand is created either explicitly through the MLOperandDescriptor or implicitly through the execution of the relevant operation in the graph.

The fact that there is an optional dimensions param in its ctor may have confused the matter, but it's only there to support graphs with free-dimension inputs, where the input shape isn't known until graph execution time. In a normal use case it won't be needed, hence optional.

This conversation makes me realize that calling the type MLTensor is itself misleading because when you call something a tensor, you would naturally think that it has both a shape and a data type association. A more appropriate name is probably MLResource or MLGraphResource.
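A sketch of the free-dimension case described above; the descriptor shape and the use of a non-positive value to mark a free dimension are assumptions based on the validation steps quoted later in this thread:

// The operand's data type and symbolic shape are declared up front.
const x = builder.input('x', {type: 'float32', dimensions: [-1, 3, 224, 224]}); // -1: free dimension (assumed notation)
// At execution time the concrete shape is supplied on the tensor instead:
const inputs = {'x': new MLTensor(xBuffer, [1, 3, 224, 224])};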

explainer.md Outdated
const outputs = await graph.compute(inputs);
const inputs = {'A': new MLTensor(bufferA), 'B': new MLTensor(bufferB)};
const outputs = graph.compute(inputs);
const data = await outputs.C.data();
Collaborator

The data() API is for downloading to the CPU, correct?

Contributor

I understand it is; the returned promise resolves with an ArrayBufferView.

index.bs Outdated
<!-- Validate inputs and outputs -->
1. If any of the following requirements are unmet, then [=reject=] |promise| with a {{TypeError}} and stop.
1. If any of the following requirements are unmet, then stop.
Member

Do we want to signal to the web developer that the execution has stopped?

There's a leftover promise reference on L1801 that should be either removed or repurposed for signaling this, this one:

reject promise with an OperationError and stop.

index.bs Outdated
1. Remove |outputName| from |remainingOutputNames|.
1. If |remainingOutputNames| is empty, then resolve |promise| with |results| and stop.
1. If |remainingOutputNames| is empty, then stop.
Member

I think L1828 should be updated to either:

  1. Return |results|.

Or remove this step if the preferred design is to use data() for data download per #162 (comment)

index.bs Outdated
1. Let |i| be 0.
1. While true:
1. Let |dimension| be |value|.{{MLInput/dimensions}}[|i|].
1. Let |dimension| be |value|.{{MLTensor/dimensions}}[|i|].
1. |dimension| must be greater than 0.
1. If |inputOperand|.{{MLOperandDescriptor/dimensions}}[|i|] is greater than 0, then |dimension| must be equal to |inputOperand|.{{MLOperandDescriptor/dimensions}}[|i|].
1. Set |i| to |i| + 1.
Member

As a drive-by comment, many web specs tend to use a more prose-like form:

Increment |i| by 1.

And in many cases, if it makes the algorithm more readable, instead of "i" they may say "foobar counter". Not sure if that applies here; just noting that, in general, web spec algorithms are recommended to be self-documenting rather than concise. That said, "i" is pretty widely understood :-)

Contributor

Good suggestion, thanks! Acknowledged; will update the compute algorithm once this PR is merged.

@anssiko (Member) commented Apr 29, 2021

From today's call:

RESOLUTION: Open new issues for 1) data UL/DL 2) layout format conversion discussions, and finish PR #162 addressing device selection

@wchao1115 (Collaborator, Author)

Based on the resolution in the CG call, I've pulled back on the asynchronous data download call and left only the change adding the device selection note.

Also, based on our discussion, we will leave the accelerator device preference option out of the spec for now, since the resource models around different kinds of accelerators are still not well defined. We'll deal with these issues separately from this PR.

Please have a look.

wchao1115 changed the title from "Add support for device selection and asynchronous data downloads for GPU readbacks" to "Add support for device selection" on May 1, 2021
@huningxin (Contributor) left a comment

Thanks @wchao1115 !

index.bs Outdated

An {{MLContext}} interface represents a global state of neural network execution. One of the important context states is the underlying execution device that manages the resources and facilitates the compilation and the eventual execution of the neural network graph. An {{MLContext}} could be created from a specific GPU device such as {{GPUDevice}} or {{WebGLRenderingContext}} that is already in use by the application, in which case the corresponding {{GPUBuffer}} or {{WebGLBuffer}} resources used as graph constants, as well as the {{GPUTexture}} and {{WebGLTexture}} as graph inputs must also be created from the same device. In a multi-adapter configuration, the device used for {{MLContext}} must be created from the same adapter as the device used to allocate the resources referenced in the graph.

In a situation when a GPU context executes a graph with constants or inputs given as {{ArrayBufferView}}, the content of such constant or input is automatically uploaded from the system memory to the GPU memory. Likewise, the content of the execution result is downloaded to the system memory as {{ArrayBufferView}} upon requested.
Contributor

Should we note the two aspects of a CPU context when executing a graph with an ArrayBufferView:

  1. no data moves across devices
  2. there may still be layout format conversions

@wchao1115 (Collaborator, Author)

Updated. Please have a look.

anssiko self-requested a review on May 3, 2021 at 09:41
@huningxin (Contributor) left a comment

Looks good. Thanks much @wchao1115 !

wchao1115 merged commit 0c847fb into master on May 6, 2021
wchao1115 deleted the wchao/interop_efficiency branch on May 6, 2021 at 00:24