Added docs for ONNX 1.17 covering logging, tracing, and QNN EP Profiling (#19428)

### Description
Added docs for ONNX 1.17 covering logging, tracing, and QNN EP Profiling

### Motivation and Context
- ONNX Runtime logging has not been documented
- ONNX Runtime tracing on Windows has barely been documented
- ONNX Runtime 1.17 adds new tracing and QNN EP profiling

PRs: #16259,  #18201, #18882, #19397
ivberg authored Feb 7, 2024
1 parent 519fa38 commit 7e64928
Showing 10 changed files with 137 additions and 11 deletions.
8 changes: 4 additions & 4 deletions docs/build/custom.md
@@ -165,7 +165,7 @@ _[This section is coming soon]_

### iOS

To produce pods for an iOS build, use the [build_and_assemble_ios_pods.py](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/apple/build_and_assemble_ios_pods.py) script from the ONNX Runtime repo.
To produce pods for an iOS build, use the [build_and_assemble_apple_pods.py](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/apple/build_and_assemble_apple_pods.py) script from the ONNX Runtime repo.

1. Check out the version of ONNX Runtime you want to use.

@@ -174,7 +174,7 @@ To produce pods for an iOS build, use the [build_and_assemble_ios_pods.py](https
For example:

```bash
python3 tools/ci_build/github/apple/build_and_assemble_ios_pods.py \
python3 tools/ci_build/github/apple/build_and_assemble_apple_pods.py \
--staging-dir /path/to/staging/dir \
--include-ops-by-config /path/to/ops.config \
--build-settings-file /path/to/build_settings.json
@@ -186,14 +186,14 @@ To produce pods for an iOS build, use the [build_and_assemble_ios_pods.py](https

The reduced set of ops in the custom build is specified with the file provided to the `--include_ops_by_config` option. See the current op config used by the pre-built mobile package at [tools/ci_build/github/android/mobile_package.required_operators.config](https://github.com/microsoft/onnxruntime/blob/main/tools/ci_build/github/android/mobile_package.required_operators.config) (Android and iOS pre-built mobile packages share the same config file). You can use this file directly.

The default package does not include the training APIs. To create a training package, add `--enable_training_apis` in the build options file provided to `--build-settings-file` and add the `--variant Training` option when calling `build_and_assemble_ios_pods.py`.
The default package does not include the training APIs. To create a training package, add `--enable_training_apis` in the build options file provided to `--build-settings-file` and add the `--variant Training` option when calling `build_and_assemble_apple_pods.py`.

For example:

```bash
# /path/to/build_settings.json is a file that includes the `--enable_training_apis` option
python3 tools/ci_build/github/apple/build_and_assemble_ios_pods.py \
python3 tools/ci_build/github/apple/build_and_assemble_apple_pods.py \
--staging-dir /path/to/staging/dir \
--include-ops-by-config /path/to/ops.config \
--build-settings-file /path/to/build_settings.json \
2 changes: 1 addition & 1 deletion docs/build/eps.md
@@ -104,7 +104,7 @@ See more information on the TensorRT Execution Provider [here](../execution-prov
* The path to the CUDA installation must be provided via the CUDA_PATH environment variable, or the `--cuda_home` parameter. The CUDA path should contain `bin`, `include` and `lib` directories.
* The path to the CUDA `bin` directory must be added to the PATH environment variable so that `nvcc` is found.
* The path to the cuDNN installation (path to cudnn bin/include/lib) must be provided via the cuDNN_PATH environment variable, or `--cudnn_home` parameter.
* On Windows, cuDNN requires [zlibwapi.dll](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-zlib-windows). Feel free to place this dll under `path_to_cudnn/bin`
* On Windows, cuDNN requires [zlibwapi.dll](https://docs.nvidia.com/deeplearning/cudnn/installation/windows.html). Feel free to place this dll under `path_to_cudnn/bin`
* Follow [instructions for installing TensorRT](https://docs.nvidia.com/deeplearning/tensorrt/install-guide/index.html)
* The TensorRT execution provider for ONNX Runtime is built and tested with TensorRT 8.6.
* The path to TensorRT installation must be provided via the `--tensorrt_home` parameter.
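
For instance, a Windows build invocation with both EPs enabled might look like the sketch below (all install paths are assumptions for illustration):

```dos
.\build.bat --config Release --parallel --use_cuda ^
  --cuda_home "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8" ^
  --cudnn_home "C:\tools\cudnn" ^
  --use_tensorrt --tensorrt_home "C:\tools\TensorRT-8.6"
```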
3 changes: 3 additions & 0 deletions docs/execution-providers/QNN-ExecutionProvider.md
@@ -55,6 +55,9 @@ The QNN Execution Provider supports a number of configuration options. These pro
|'basic'||
|'detailed'||

See [profiling-tools](../performance/tune-performance/profiling-tools.md) for more info on profiling.
As an alternative to setting profiling_level at compile time, profiling can be enabled dynamically with ETW (Windows). See [tracing](../performance/tune-performance/logging_tracing.md) for more details.
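
For example, a minimal Python sketch that sets `profiling_level` as a QNN EP provider option (the model path and the `QnnHtp.dll` backend are assumptions for illustration):

```python
import onnxruntime as ort

# Assumed backend/model paths for illustration; 'profiling_level'
# accepts 'off', 'basic', or 'detailed' per the table above
qnn_options = {
    "backend_path": "QnnHtp.dll",
    "profiling_level": "basic",
}
sess = ort.InferenceSession(
    "model.onnx",
    providers=[("QNNExecutionProvider", qnn_options)],
)
```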

|`"rpc_control_latency"`|Description|
|---|---|
|microseconds (string)|allows client to set up RPC control latency in microseconds|
2 changes: 1 addition & 1 deletion docs/performance/tune-performance/iobinding.md
@@ -2,7 +2,7 @@
title: I/O Binding
grand_parent: Performance
parent: Tune performance
nav_order: 4
nav_order: 5
---

# I/O Binding
95 changes: 95 additions & 0 deletions docs/performance/tune-performance/logging_tracing.md
@@ -0,0 +1,95 @@
---
title: Logging & Tracing
grand_parent: Performance
parent: Tune performance
nav_order: 2
---

# Logging & Tracing

## Contents
{: .no_toc }

* TOC placeholder
{:toc}


## Developer Logging

ONNX Runtime has built-in cross-platform internal [printf-style logging LOGS()](https://github.com/microsoft/onnxruntime/blob/main/include/onnxruntime/core/common/logging/macros.h). This logging can be configured in *production builds* by a developer **using the API**.

Expect a performance penalty when the default sink (stdout) is used with more verbose severity settings.

### log_severity_level
[Python](https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.log_severity_level) (below) - [C/C++ CreateEnv](https://onnxruntime.ai/docs/api/c/struct_ort_api.html#a22085f699a2d1adb52f809383f475ed1) / [OrtLoggingLevel](https://onnxruntime.ai/docs/api/c/group___global.html#ga1c0fbcf614dbd0e2c272ae1cc04c629c) - [.NET/C#](https://onnxruntime.ai/docs/api/csharp/api/Microsoft.ML.OnnxRuntime.SessionOptions.html#Microsoft_ML_OnnxRuntime_SessionOptions_LogSeverityLevel)
```python
import onnxruntime as ort

sess_opt = ort.SessionOptions()
sess_opt.log_severity_level = 0  # 0 = Verbose
sess = ort.InferenceSession('model.onnx', sess_opt)
```

### Note
Note that [log_verbosity_level](https://onnxruntime.ai/docs/api/python/api_summary.html#onnxruntime.SessionOptions.log_verbosity_level) is a separate setting and only available in DEBUG custom builds.
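
For example, a minimal sketch (the verbosity setting has no effect outside DEBUG builds):

```python
import onnxruntime as ort

sess_opt = ort.SessionOptions()
sess_opt.log_severity_level = 0   # emit Verbose and above
sess_opt.log_verbosity_level = 2  # VLOG level; only takes effect in DEBUG builds
```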

## About Tracing

Tracing is a super-set of logging, in that tracing:
- Includes the logging described above
- Adds tracing events that are more structured than printf-style logging
- Can be integrated with the OS's larger tracing eco-system, so that
  - traces from ONNX Runtime, the OS system level, and user-mode software that uses ONNX Runtime can be combined
  - timestamps are high resolution and consistent with other traced components
- Can log at high performance, with a high number of events per second
  - events are not logged via stdout, but usually via a high-performance in-memory sink
- Can be enabled dynamically at run-time to investigate issues, including on production systems

Currently, only TraceLogging combined with Windows ETW is supported, although [TraceLogging](https://github.com/microsoft/tracelogging) is cross-platform, and support for other OSes' instrumentation systems could be added.

## Tracing - Windows

There are two main ONNX Runtime TraceLogging providers that can be enabled at run-time and captured with Windows [ETW](https://learn.microsoft.com/en-us/windows-hardware/test/weg/instrumenting-your-code-with-etw).

### Quickstart Tracing with WPR

On Windows, you can use Windows Performance Recorder ([WPR](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/wpr-command-line-options)) to capture a trace. The two providers covered below are already configured in these WPR profiles.

- Download [ort.wprp](https://github.com/microsoft/onnxruntime/blob/main/ort.wprp) and [etw_provider.wprp](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/platform/windows/logging/etw_provider.wprp) (these could also be combined later)

```dos
wpr -start ort.wprp -start etw_provider.wprp
echo Repro the issue allowing ONNX to run
wpr -stop onnx.etl -compress
```

### ONNXRuntimeTraceLoggingProvider
Beginning in ONNX Runtime 1.17, the [ONNXRuntimeTraceLoggingProvider](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/platform/windows/logging/HowToValidateEtwSinkOutput.md) can also be enabled.

This dynamically traces, with high performance, the printf-style LOGS() macro output described above, which was previously controlled only by log_severity_level. The log severity is set dynamically from the ETW level the user or developer provides at run-time.

- Provider Name: ONNXRuntimeTraceLoggingProvider
- Provider GUID: 929DD115-1ECB-4CB5-B060-EBD4983C421D
- Keyword: Logs (0x2) per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L83)
- Level: 1 (CRITICAL) through 5 (VERBOSE) per [TraceLoggingLevel](https://learn.microsoft.com/en-us/windows/win32/api/traceloggingprovider/nf-traceloggingprovider-tracelogginglevel#remarks)
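
As a minimal sketch, this provider can also be captured on its own with `logman` (the session name `ortlogs` and output path are arbitrary):

```dos
logman start ortlogs -p {929DD115-1ECB-4CB5-B060-EBD4983C421D} 0x2 0x5 -o ortlogs.etl -ets
echo Repro the issue allowing ONNX Runtime to run
logman stop ortlogs -ets
```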

### Microsoft.ML.ONNXRuntime

The [Microsoft.ML.ONNXRuntime](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/platform/windows/telemetry.cc#L47) provider provides structured logging.

- Provider Name: Microsoft.ML.ONNXRuntime
- Provider GUID: 3a26b1ff-7484-7484-7484-15261f42614d
- Keywords: Multiple, per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L81)
- Level: 1 (CRITICAL) through 5 (VERBOSE) per [TraceLoggingLevel](https://learn.microsoft.com/en-us/windows/win32/api/traceloggingprovider/nf-traceloggingprovider-tracelogginglevel#remarks)
- Note: This provider supports ETW [CaptureState](https://learn.microsoft.com/en-us/windows-hardware/test/wpt/capturestateonsave) (Rundown), which logs state, for example when a trace is saved

ORT 1.17 adds new events that log session options and EP provider options.

#### Profiling

Microsoft.ML.ONNXRuntime can also output profiling events; this is covered in [profiling](profiling-tools.md).

### WinML

Windows ML has its own tracing providers that can be enabled in addition to the providers above:

- Microsoft.Windows.WinML - d766d9ff-112c-4dac-9247-241cf99d123f
- Microsoft.Windows.AI.MachineLearning - BCAD6AEE-C08D-4F66-828C-4C43461A033D
2 changes: 1 addition & 1 deletion docs/performance/tune-performance/memory.md
@@ -2,7 +2,7 @@
title: Memory consumption
grand_parent: Performance
parent: Tune performance
nav_order: 2
nav_order: 3
---

# Reduce memory consumption
30 changes: 29 additions & 1 deletion docs/performance/tune-performance/profiling-tools.md
@@ -38,6 +38,34 @@ In both cases, you will get a JSON file which contains the detailed performance
* Type chrome://tracing in the address bar
* Load the generated JSON file
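
For reference, a minimal Python sketch (assuming a local `model.onnx`) that produces such a JSON file:

```python
import onnxruntime as ort

sess_opt = ort.SessionOptions()
sess_opt.enable_profiling = True  # emit a Chrome-trace JSON file for this session
sess = ort.InferenceSession("model.onnx", sess_opt)
# ... run inference here ...
print(sess.end_profiling())       # prints the name of the generated JSON file
```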

## Execution Provider (EP) Profiling

Starting with ONNX Runtime 1.17, support has been added for profiling EPs or Neural Processing Units (NPUs), when the EP supports profiling in its SDK.

## Qualcomm QNN EP

As mentioned in the [QNN EP doc](../../execution-providers/QNN-ExecutionProvider.md), profiling is supported.

### Cross-Platform CSV Tracing

The Qualcomm AI Engine Direct SDK (QNN SDK) supports profiling. When the QNN SDK is used directly, outside ONNX Runtime, QNN outputs profiling data as CSV text. ONNX Runtime mimics this support and outputs the same CSV format.

If profiling_level is provided, ONNX Runtime appends profiling data to a qnn-profiling-data.csv [file](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc#L911) in the current working directory.

### TraceLogging ETW (Windows) Profiling

As covered in [logging](logging_tracing.md), ONNX Runtime supports dynamically enabling ETW tracing providers, specifically with the following settings. If the TraceLogging provider is enabled while profiling_level is set, CSV output is automatically disabled.

- Provider Name: Microsoft.ML.ONNXRuntime
- Provider GUID: 3a26b1ff-7484-7484-7484-15261f42614d
- Keywords: Profiling = 0x100 per [logging.h](https://github.com/ivberg/onnxruntime/blob/user/ivberg/ETWRundown/include/onnxruntime/core/common/logging/logging.h#L81)
- Level:
- 5 (VERBOSE) = profiling_level=basic (good details without perf loss)
- greater than 5 = profiling_level=detailed (individual ops are logged with inference perf hit)
- Event: [QNNProfilingEvent](https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/core/providers/qnn/builder/qnn_backend_manager.cc#L1083)
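
As a sketch, these settings map to a capture such as the following (the session name and output path are arbitrary):

```dos
logman start ortprof -p {3a26b1ff-7484-7484-7484-15261f42614d} 0x100 0x5 -o ortprof.etl -ets
echo Run the QNN EP workload while profiling events are collected
logman stop ortprof -ets
```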

## GPU Profiling

To profile CUDA kernels, please add the cupti library to your PATH and use the onnxruntime binary built from source with `--enable_cuda_profiling`.
To profile ROCm kernels, please add the roctracer library to your PATH and use the onnxruntime binary built from source with `--enable_rocm_profiling`.
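
For example, a Linux build sketch with CUDA kernel profiling enabled (paths and flags other than `--enable_cuda_profiling` are assumptions for illustration):

```bash
# cupti ships with the CUDA toolkit; make sure its library directory is on PATH at run-time
./build.sh --config Release --parallel --use_cuda --enable_cuda_profiling \
  --cuda_home /usr/local/cuda --cudnn_home /usr/local/cuda
```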

@@ -55,4 +55,4 @@ If an operator called multiple kernels during execution, the performance numbers
{"cat":"Node", "name":<name of the node>, ...}
{"cat":"Kernel", "name":<name of the kernel called first>, ...}
{"cat":"Kernel", "name":<name of the kernel called next>, ...}
```
```
2 changes: 1 addition & 1 deletion docs/performance/tune-performance/threading.md
Expand Up @@ -2,7 +2,7 @@
title: Thread management
grand_parent: Performance
parent: Tune performance
nav_order: 3
nav_order: 4
---

# Thread management
2 changes: 1 addition & 1 deletion docs/performance/tune-performance/troubleshooting.md
@@ -2,7 +2,7 @@
title: Troubleshooting
grand_parent: Performance
parent: Tune performance
nav_order: 5
nav_order: 6
---

# Troubleshooting performance issues
2 changes: 1 addition & 1 deletion docs/tutorials/csharp/csharp-gpu.md
@@ -31,7 +31,7 @@ See this table for supported versions:
NOTE: Full table can be found [here](https://onnxruntime.ai/docs/execution-providers/CUDA-ExecutionProvider.html#requirements)


- Follow section [2. Installing cuDNN on Windows](https://docs.nvidia.com/deeplearning/cudnn/install-guide/index.html#install-windows). NOTE: Skip step 5 in section 2.3 on updating Visual Studio settings, this is only for C++ projects.
- Follow section [2. Installing cuDNN on Windows](https://docs.nvidia.com/deeplearning/cudnn/installation/windows.html). NOTE: Skip step 5 in section 2.3 on updating Visual Studio settings, this is only for C++ projects.

- Restart your computer and verify the installation by running the following command or in python with PyTorch:

