From a85e6e34113308dde75cf1929f37ebd8b33d65dd Mon Sep 17 00:00:00 2001 From: pytorchbot Date: Mon, 3 Jun 2024 13:26:33 -0700 Subject: [PATCH] Add docs on Module extension. (#3798) (#3807) Summary: Pull Request resolved: https://github.com/pytorch/executorch/pull/3798 overriding_review_checks_triggers_an_audit_and_retroactive_review Oncall Short Name: executorch Differential Revision: D58065736 fbshipit-source-id: 2d61bbaa7ad6a18f7a4a81d62246b14cbb8f8d02 (cherry picked from commit 13ba3a7cd0416e1b92d5015fe71cb7de97e72131) Co-authored-by: Anthony Shoumikhin --- docs/source/build-run-coreml.md | 2 +- .../executorch-runtime-api-reference.rst | 2 +- docs/source/extension-module.md | 155 ++++++++++++++++++ docs/source/getting-started-setup.md | 2 +- docs/source/index.rst | 8 + docs/source/llm/getting-started.md | 3 +- docs/source/running-a-model-cpp-tutorial.md | 1 + docs/source/runtime-overview.md | 3 +- 8 files changed, 170 insertions(+), 6 deletions(-) create mode 100644 docs/source/extension-module.md diff --git a/docs/source/build-run-coreml.md b/docs/source/build-run-coreml.md index da830e542c..39794ac06c 100644 --- a/docs/source/build-run-coreml.md +++ b/docs/source/build-run-coreml.md @@ -143,7 +143,7 @@ libsqlite3.tbd ``` 5. Add the exported program to the [Copy Bundle Phase](https://developer.apple.com/documentation/xcode/customizing-the-build-phases-of-a-target#Copy-files-to-the-finished-product) of your Xcode target. -6. Please follow the [running a model](./running-a-model-cpp-tutorial.md) tutorial to integrate the code for loading an ExecuTorch program. +6. Please follow the [Runtime APIs Tutorial](extension-module.md) to integrate the code for loading an ExecuTorch program. 7. Update the code to load the program from the Application's bundle. 
``` objective-c
diff --git a/docs/source/executorch-runtime-api-reference.rst b/docs/source/executorch-runtime-api-reference.rst
index e78bcb3a7d..20dbc631f2 100644
--- a/docs/source/executorch-runtime-api-reference.rst
+++ b/docs/source/executorch-runtime-api-reference.rst
@@ -4,7 +4,7 @@ ExecuTorch Runtime API Reference

 The ExecuTorch C++ API provides an on-device execution framework for exported PyTorch models.
 For a tutorial style introduction to the runtime API, check out the
-`runtime api tutorial <running-a-model-cpp-tutorial.html>`__.
+`runtime tutorial <running-a-model-cpp-tutorial.html>`__ and its `simplified <extension-module.html>`__ version.

 Model Loading and Execution
 ---------------------------
diff --git a/docs/source/extension-module.md b/docs/source/extension-module.md
new file mode 100644
index 0000000000..cb328e10d5
--- /dev/null
+++ b/docs/source/extension-module.md
@@ -0,0 +1,155 @@
+# Running an ExecuTorch Model Using the Module Extension in C++
+
+**Author:** [Anthony Shoumikhin](https://github.com/shoumikhin)
+
+In the [Running an ExecuTorch Model in C++ Tutorial](running-a-model-cpp-tutorial.md), we explored the lower-level ExecuTorch APIs for running an exported model. While these APIs offer zero overhead, great flexibility, and control, they can be verbose and complex for regular use. To simplify this and resemble PyTorch's eager mode in Python, we introduce the `Module` facade APIs over the regular ExecuTorch runtime APIs. The `Module` APIs provide the same flexibility but default to commonly used components like `DataLoader` and `MemoryAllocator`, hiding most intricate details.
+
+## Example
+
+Let's see how we can run the `SimpleConv` model generated from the [Exporting to ExecuTorch tutorial](./tutorials/export-to-executorch-tutorial) using the `Module` APIs:
+
+```cpp
+#include <executorch/extension/module/module.h>
+
+using namespace ::torch::executor;
+
+// Create a Module.
+Module module("/path/to/model.pte");
+
+// Wrap the input data with a Tensor.
+float input[1 * 3 * 256 * 256];
+Tensor::SizesType sizes[] = {1, 3, 256, 256};
+TensorImpl tensor(ScalarType::Float, std::size(sizes), sizes, input);
+
+// Perform an inference.
+const auto result = module.forward({EValue(Tensor(&tensor))});
+
+// Check for success or failure.
+if (result.ok()) {
+  // Retrieve the output data.
+  const auto output = result->at(0).toTensor().const_data_ptr<float>();
+}
+```
+
+The code now boils down to creating a `Module` and calling `forward()` on it, with no additional setup. Let's take a closer look at these and other `Module` APIs to better understand the internal workings.
+
+## APIs
+
+### Creating a Module
+
+Creating a `Module` object is an extremely fast operation that does not involve significant processing time or memory allocation. The actual loading of a `Program` and a `Method` happens lazily on the first inference, unless it is explicitly requested with a dedicated API.
+
+```cpp
+Module module("/path/to/model.pte");
+```
+
+### Force-Loading a Method
+
+To force-load the `Module` (and thus the underlying ExecuTorch `Program`) at any time, use the `load()` function:
+
+```cpp
+const auto error = module.load();
+
+assert(module.is_loaded());
+```
+
+To force-load a particular `Method`, call the `load_method()` function:
+
+```cpp
+const auto error = module.load_method("forward");
+
+assert(module.is_method_loaded("forward"));
+```
+Note: the `Program` is loaded automatically before any `Method` is loaded. Subsequent attempts to load them have no effect if one of the previous attempts was successful.
+
+### Querying for Metadata
+
+Get the set of method names that a `Module` contains using the `method_names()` function:
+
+```cpp
+const auto method_names = module.method_names();
+
+if (method_names.ok()) {
+  assert(method_names->count("forward"));
+}
+```
+
+Note: `method_names()` will try to force-load the `Program` when called for the first time.
+
+Introspect miscellaneous metadata about a particular method via the `MethodMeta` struct returned by the `method_meta()` function:
+
+```cpp
+const auto method_meta = module.method_meta("forward");
+
+if (method_meta.ok()) {
+  assert(method_meta->name() == "forward");
+  assert(method_meta->num_inputs() > 0);
+
+  const auto input_meta = method_meta->input_tensor_meta(0);
+
+  if (input_meta.ok()) {
+    assert(input_meta->scalar_type() == ScalarType::Float);
+  }
+  const auto output_meta = method_meta->output_tensor_meta(0);
+
+  if (output_meta.ok()) {
+    assert(output_meta->sizes().size() == 1);
+  }
+}
+```
+
+Note: `method_meta()` will try to force-load the `Method` when called for the first time.
+
+### Performing an Inference
+
+Assuming that the `Program`'s method names and their input format are known ahead of time, we rarely need to query for those and can run the methods directly by name using the `execute()` function:
+
+```cpp
+const auto result = module.execute("forward", {EValue(Tensor(&tensor))});
+```
+
+For the standard `forward()` method name, this can be simplified to:
+
+```cpp
+const auto result = module.forward({EValue(Tensor(&tensor))});
+```
+
+Note: `execute()` or `forward()` will try to force-load the `Program` and the `Method` when called for the first time. Therefore, the first inference will take more time than subsequent ones, as the model is loaded lazily and prepared for execution, unless the `Program` or `Method` was loaded explicitly earlier using the corresponding functions.
+
+### Result and Error Types
+
+Most of the ExecuTorch APIs, including those described above, return either the `Result` or `Error` type. Let's understand what those are:
+
+* [`Error`](https://github.com/pytorch/executorch/blob/main/runtime/core/error.h) is a C++ enum containing a collection of valid error codes, where the default is `Error::Ok`, denoting success.
+
+* [`Result`](https://github.com/pytorch/executorch/blob/main/runtime/core/result.h) can hold either an `Error` if the operation failed, or a payload, i.e. the actual result of the operation, such as an `EValue` wrapping a `Tensor` or any other standard C++ data type, if the operation succeeded. To check whether a `Result` has a valid value, call the `ok()` function. To get the `Error`, use the `error()` function, and to get the actual data, use the overloaded `get()` function or the dereference operators `*` and `->`.
+
+### Profiling the Module
+
+Use [ExecuTorch Dump](sdk-etdump.md) to trace model execution. Create an instance of the `ETDumpGen` class and pass it to the `Module` constructor. After executing a method, save the `ETDump` to a file for further analysis. You can capture multiple executions in a single trace if desired.
+
+```cpp
+#include <fstream>
+#include <memory>
+
+#include <executorch/extension/module/module.h>
+#include <executorch/sdk/etdump/etdump_flatcc.h>
+
+using namespace ::torch::executor;
+
+Module module("/path/to/model.pte", Module::MlockConfig::UseMlock, std::make_unique<ETDumpGen>());
+
+// Execute a method, e.g. module.forward(...); or module.execute("my_method", ...);
+
+if (auto* etdump = dynamic_cast<ETDumpGen*>(module.event_tracer())) {
+  const auto trace = etdump->get_etdump_data();
+
+  if (trace.buf && trace.size > 0) {
+    std::unique_ptr<void, decltype(&free)> guard(trace.buf, free);
+    std::ofstream file("/path/to/trace.etdump", std::ios::binary);
+
+    if (file) {
+      file.write(static_cast<const char*>(trace.buf), trace.size);
+    }
+  }
+}
+```
diff --git a/docs/source/getting-started-setup.md b/docs/source/getting-started-setup.md
index 0c6d6d86cc..0efde4fe6b 100644
--- a/docs/source/getting-started-setup.md
+++ b/docs/source/getting-started-setup.md
@@ -183,7 +183,7 @@ Output 0: tensor(sizes=[1], [2.])
 ```
 :::

-To learn how to build a similar program, visit the [ExecuTorch in C++ Tutorial](running-a-model-cpp-tutorial.md).
+To learn how to build a similar program, visit the [Runtime APIs Tutorial](extension-module.md).
### [Optional] Setting Up Buck2 **Buck2** is an open-source build system that some of our examples currently utilize for building and running. diff --git a/docs/source/index.rst b/docs/source/index.rst index 52934b4ea6..8c4ad7eaed 100644 --- a/docs/source/index.rst +++ b/docs/source/index.rst @@ -93,6 +93,7 @@ Topics in this section will help you get started with ExecuTorch. tutorials/export-to-executorch-tutorial running-a-model-cpp-tutorial + extension-module tutorials/sdk-integration-tutorial demo-apps-ios demo-apps-android @@ -225,6 +226,13 @@ ExecuTorch tutorials. :link: running-a-model-cpp-tutorial.html :tags: +.. customcarditem:: + :header: Simplified Runtime APIs Tutorial + :card_description: A simplified tutorial for executing the model on device. + :image: _static/img/generic-pytorch-logo.png + :link: extension-module.html + :tags: + .. customcarditem:: :header: Using the ExecuTorch SDK to Profile a Model :card_description: A tutorial for using the ExecuTorch SDK to profile and analyze a model with linkage back to source code. diff --git a/docs/source/llm/getting-started.md b/docs/source/llm/getting-started.md index ae743e8e6d..a863dfc174 100644 --- a/docs/source/llm/getting-started.md +++ b/docs/source/llm/getting-started.md @@ -344,8 +344,7 @@ curl -O https://raw.githubusercontent.com/pytorch/executorch/main/examples/llm_m curl -O https://raw.githubusercontent.com/pytorch/executorch/main/examples/llm_manual/managed_tensor.h ``` -To learn more, see [Running an ExecuTorch Model in C++](../running-a-model-cpp-tutorial.md) -and the [ExecuTorch Runtime API Reference](../executorch-runtime-api-reference.md). +To learn more, see the [Runtime APIs Tutorial](../extension-module.md). 
### Building and Running diff --git a/docs/source/running-a-model-cpp-tutorial.md b/docs/source/running-a-model-cpp-tutorial.md index e06bbed775..675f1c23f3 100644 --- a/docs/source/running-a-model-cpp-tutorial.md +++ b/docs/source/running-a-model-cpp-tutorial.md @@ -143,3 +143,4 @@ assert(output.isTensor()); ## Conclusion In this tutorial, we went over the APIs and steps required to load and perform an inference with an ExecuTorch model in C++. +Also, check out the [Simplified Runtime APIs Tutorial](extension-module.md). diff --git a/docs/source/runtime-overview.md b/docs/source/runtime-overview.md index 55c07902b7..2cc9f70f33 100644 --- a/docs/source/runtime-overview.md +++ b/docs/source/runtime-overview.md @@ -156,7 +156,8 @@ However, please note: For more details about the ExecuTorch runtime, please see: -* [Runtime API Tutorial](running-a-model-cpp-tutorial.md) +* [Detailed Runtime APIs Tutorial](running-a-model-cpp-tutorial.md) +* [Simplified Runtime APIs Tutorial](extension-module.md) * [Runtime Build and Cross Compilation](runtime-build-and-cross-compilation.md) * [Runtime Platform Abstraction Layer](runtime-platform-abstraction-layer.md) * [Runtime Profiling](sdk-profiling.md)