Add docs on Module extension. #3798

2 changes: 1 addition & 1 deletion docs/source/build-run-coreml.md
@@ -143,7 +143,7 @@ libsqlite3.tbd
```
5. Add the exported program to the [Copy Bundle Phase](https://developer.apple.com/documentation/xcode/customizing-the-build-phases-of-a-target#Copy-files-to-the-finished-product) of your Xcode target.

-6. Please follow the [running a model](./running-a-model-cpp-tutorial.md) tutorial to integrate the code for loading an ExecuTorch program.
+6. Please follow the [Runtime APIs Tutorial](extension-module.md) to integrate the code for loading an ExecuTorch program.

7. Update the code to load the program from the Application's bundle.
``` objective-c
…
```
2 changes: 1 addition & 1 deletion docs/source/executorch-runtime-api-reference.rst
@@ -4,7 +4,7 @@ ExecuTorch Runtime API Reference
The ExecuTorch C++ API provides an on-device execution framework for exported PyTorch models.

For a tutorial style introduction to the runtime API, check out the
-`runtime api tutorial <running-a-model-cpp-tutorial.html>`__.
+`runtime tutorial <running-a-model-cpp-tutorial.html>`__ and its `simplified <extension-module.html>`__ version.

Model Loading and Execution
---------------------------
155 changes: 155 additions & 0 deletions docs/source/extension-module.md
@@ -0,0 +1,155 @@
# Running an ExecuTorch Model Using the Module Extension in C++

**Author:** [Anthony Shoumikhin](https://github.com/shoumikhin)

In the [Running an ExecuTorch Model in C++ Tutorial](running-a-model-cpp-tutorial.md), we explored the lower-level ExecuTorch APIs for running an exported model. While these APIs offer zero overhead, great flexibility, and control, they can be verbose and complex for regular use. To simplify this and resemble PyTorch's eager mode in Python, we introduce the `Module` facade APIs over the regular ExecuTorch runtime APIs. The `Module` APIs provide the same flexibility but default to commonly used components like `DataLoader` and `MemoryAllocator`, hiding most of the intricate details.

## Example

Let's see how we can run the `SimpleConv` model generated from the [Exporting to ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial) using the `Module` APIs:

```cpp
#include <executorch/extension/module/module.h>

using namespace ::torch::executor;

// Create a Module.
Module module("/path/to/model.pte");

// Wrap the input data with a Tensor.
float input[1 * 3 * 256 * 256];
Tensor::SizesType sizes[] = {1, 3, 256, 256};
TensorImpl tensor(ScalarType::Float, std::size(sizes), sizes, input);

// Perform an inference.
const auto result = module.forward({EValue(Tensor(&tensor))});

// Check for success or failure.
if (result.ok()) {
  // Retrieve the output data.
  const auto output = result->at(0).toTensor().const_data_ptr<float>();
}
```

The code now boils down to creating a `Module` and calling `forward()` on it, with no additional setup. Let's take a closer look at these and other `Module` APIs to better understand the internal workings.

## APIs

### Creating a Module

Creating a `Module` object is an extremely fast operation that does not involve significant processing time or memory allocation. The actual loading of a `Program` and a `Method` happens lazily on the first inference unless explicitly requested with a dedicated API.

```cpp
Module module("/path/to/model.pte");
```

### Force-Loading a Method

To force-load the `Module` (and thus the underlying ExecuTorch `Program`) at any time, use the `load()` function:

```cpp
const auto error = module.load();

assert(module.is_loaded());
```

To force-load a particular `Method`, call the `load_method()` function:

```cpp
const auto error = module.load_method("forward");

assert(module.is_method_loaded("forward"));
```
Note: the `Program` is loaded automatically before any `Method` is loaded. Subsequent attempts to load them have no effect if a previous attempt was successful.
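
For example, explicit loads can safely be repeated or combined (a minimal sketch based on the APIs above):

```cpp
Module module("/path/to/model.pte");

// Loading a Method automatically loads the underlying Program first.
const auto error = module.load_method("forward");

// Both are now loaded; further load calls have no effect.
assert(module.is_loaded());
assert(module.is_method_loaded("forward"));
const auto again = module.load(); // No-op: the Program is already loaded.
```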

### Querying for Metadata

Get the set of method names that a `Module` contains using the `method_names()` function:

```cpp
const auto method_names = module.method_names();

if (method_names.ok()) {
  assert(method_names.count("forward"));
}
```

Note: `method_names()` will try to force-load the `Program` when called for the first time.

Introspect miscellaneous metadata about a particular method via the `MethodMeta` struct returned by the `method_meta()` function:

```cpp
const auto method_meta = module.method_meta("forward");

if (method_meta.ok()) {
  assert(method_meta->name() == "forward");
  assert(method_meta->num_inputs() > 1);

  const auto input_meta = method_meta->input_tensor_meta(0);

  if (input_meta.ok()) {
    assert(input_meta->scalar_type() == ScalarType::Float);
  }

  const auto output_meta = method_meta->output_tensor_meta(0);

  if (output_meta.ok()) {
    assert(output_meta->sizes().size() == 1);
  }
}
```

Note: `method_meta()` will try to force-load the `Method` when called for the first time.

### Performing an Inference

Assuming that the `Program`'s method names and their input formats are known ahead of time, we rarely need to query for them and can run the methods directly by name using the `execute()` function:

```cpp
const auto result = module.execute("forward", {EValue(Tensor(&tensor))});
```

For the standard `forward()` method name, this can be simplified to:

```cpp
const auto result = module.forward({EValue(Tensor(&tensor))});
```

Note: `execute()` and `forward()` will try to force-load the `Program` and the `Method` when called for the first time. Therefore, the first inference will take longer than subsequent ones, since the model is loaded lazily and prepared for execution, unless the `Program` or `Method` was loaded explicitly earlier using the corresponding functions.
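
For example, to keep the first call to `forward()` fast, the loading cost can be paid up front at startup (a minimal sketch, reusing the input `tensor` from the example above):

```cpp
Module module("/path/to/model.pte");

// Pay the loading cost up front so the first inference runs as fast as the rest.
const auto error = module.load_method("forward");

if (error == Error::Ok) {
  // No lazy loading happens here: the Program and Method are already in place.
  const auto result = module.forward({EValue(Tensor(&tensor))});
}
```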

### Result and Error Types

Most of the ExecuTorch APIs, including those described above, return either `Result` or `Error` types. Let's understand what those are:

* [`Error`](https://github.com/pytorch/executorch/blob/main/runtime/core/error.h) is a C++ enum containing a collection of valid error codes, where the default is `Error::Ok`, denoting success.

* [`Result`](https://github.com/pytorch/executorch/blob/main/runtime/core/result.h) can hold either an `Error` if the operation failed, or a payload if it succeeded, i.e., the actual result of the operation, such as an `EValue` wrapping a `Tensor` or any other standard C++ data type. To check whether a `Result` has a valid value, call the `ok()` function. To get the `Error`, use the `error()` function; to get the actual data, use the overloaded `get()` function or the dereference operators `*` and `->`.
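
Putting the two together, a typical call site checks `ok()` before touching the payload (a minimal sketch, reusing the `module` and input `tensor` from the examples above):

```cpp
const auto result = module.forward({EValue(Tensor(&tensor))});

if (result.ok()) {
  // Success: dereference the Result to reach the outputs.
  const auto output = result->at(0).toTensor().const_data_ptr<float>();
} else {
  // Failure: error() returns the specific Error code.
  const auto error = result.error();
}
```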

### Profiling the Module

Use [ExecuTorch Dump](sdk-etdump.md) to trace model execution. Create an instance of the `ETDumpGen` class and pass it to the `Module` constructor. After executing a method, save the `ETDump` to a file for further analysis. You can capture multiple executions in a single trace if desired.

```cpp
#include <fstream>
#include <memory>
#include <executorch/extension/module/module.h>
#include <executorch/sdk/etdump/etdump_flatcc.h>

using namespace ::torch::executor;

Module module("/path/to/model.pte", Module::MlockConfig::UseMlock, std::make_unique<ETDumpGen>());

// Execute a method, e.g. module.forward(...); or module.execute("my_method", ...);

if (auto* etdump = dynamic_cast<ETDumpGen*>(module.event_tracer())) {
  const auto trace = etdump->get_etdump_data();

  if (trace.buf && trace.size > 0) {
    std::unique_ptr<void, decltype(&free)> guard(trace.buf, free);
    std::ofstream file("/path/to/trace.etdump", std::ios::binary);

    if (file) {
      file.write(static_cast<const char*>(trace.buf), trace.size);
    }
  }
}
```
2 changes: 1 addition & 1 deletion docs/source/getting-started-setup.md
@@ -183,7 +183,7 @@ Output 0: tensor(sizes=[1], [2.])
```
:::

-To learn how to build a similar program, visit the [ExecuTorch in C++ Tutorial](running-a-model-cpp-tutorial.md).
+To learn how to build a similar program, visit the [Runtime APIs Tutorial](extension-module.md).

### [Optional] Setting Up Buck2
**Buck2** is an open-source build system that some of our examples currently utilize for building and running.
8 changes: 8 additions & 0 deletions docs/source/index.rst
@@ -93,6 +93,7 @@ Topics in this section will help you get started with ExecuTorch.

tutorials/export-to-executorch-tutorial
running-a-model-cpp-tutorial
extension-module
tutorials/sdk-integration-tutorial
demo-apps-ios
demo-apps-android
@@ -225,6 +226,13 @@ ExecuTorch tutorials.
:link: running-a-model-cpp-tutorial.html
:tags:

.. customcarditem::
:header: Simplified Runtime APIs Tutorial
:card_description: A simplified tutorial for executing the model on device.
:image: _static/img/generic-pytorch-logo.png
:link: extension-module.html
:tags:

.. customcarditem::
:header: Using the ExecuTorch SDK to Profile a Model
:card_description: A tutorial for using the ExecuTorch SDK to profile and analyze a model with linkage back to source code.
3 changes: 1 addition & 2 deletions docs/source/llm/getting-started.md
@@ -344,8 +344,7 @@ curl -O https://raw.githubusercontent.com/pytorch/executorch/main/examples/llm_m
curl -O https://raw.githubusercontent.com/pytorch/executorch/main/examples/llm_manual/managed_tensor.h
```

-To learn more, see [Running an ExecuTorch Model in C++](../running-a-model-cpp-tutorial.md)
-and the [ExecuTorch Runtime API Reference](../executorch-runtime-api-reference.md).
+To learn more, see the [Runtime APIs Tutorial](../extension-module.md).

### Building and Running

1 change: 1 addition & 0 deletions docs/source/running-a-model-cpp-tutorial.md
@@ -143,3 +143,4 @@ assert(output.isTensor());
## Conclusion

In this tutorial, we went over the APIs and steps required to load and perform an inference with an ExecuTorch model in C++.
Also, check out the [Simplified Runtime APIs Tutorial](extension-module.md).
3 changes: 2 additions & 1 deletion docs/source/runtime-overview.md
@@ -156,7 +156,8 @@ However, please note:

For more details about the ExecuTorch runtime, please see:

-* [Runtime API Tutorial](running-a-model-cpp-tutorial.md)
+* [Detailed Runtime APIs Tutorial](running-a-model-cpp-tutorial.md)
+* [Simplified Runtime APIs Tutorial](extension-module.md)
* [Runtime Build and Cross Compilation](runtime-build-and-cross-compilation.md)
* [Runtime Platform Abstraction Layer](runtime-platform-abstraction-layer.md)
* [Runtime Profiling](sdk-profiling.md)