[QNN EP] Add session option to disable fallback to default CPU EP #16016
Conversation
This almost looks like a violation of the contract promised by the ORT framework of falling back to less preferred EPs. It should be the framework that decides whether to fall back or not (based on user config) and fail session creation accordingly; GetCapability should only report which nodes an EP can run. Looks like we could have a session-level framework config, something like 'disable_ep_fallback', which is false by default. The partitioner can then choose whether to consult the other EPs in the preferred list.
Thanks Pranav. I'll look into an alternative solution with a session config option, perhaps using a check similar to what CUDA is doing here:
Yes, it seems like there should be a framework-level option to disable default CPU fallback.
…s not throw an error for a model that is fully supported by QNN EP.
Looks good. Some minor changes.
Co-authored-by: Pranav Sharma <prs@microsoft.com>
…but user adds CPU EP to session explicitly. Also, adds a test for this case.
1e477cd
Description
Adds the session config option disable_cpu_ep_fallback to allow the user to prevent the CPU EP from handling nodes that are not supported by the other execution providers registered with the session.
Example use
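A minimal sketch of setting the option from Python, assuming the key is namespaced as "session.disable_cpu_ep_fallback" in line with other ORT session config entries; the model path and provider list are placeholders.

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# "1" disables fallback to the default CPU EP; session creation then fails
# if any node cannot be assigned to an explicitly registered EP.
sess_options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")

session = ort.InferenceSession(
    "model.onnx",                        # placeholder model path
    sess_options=sess_options,
    providers=["QNNExecutionProvider"],  # e.g., target only the QNN EP
)
```

With the option set, loading a model that contains any node unsupported by the listed EPs raises an error at session creation instead of silently running those nodes on the CPU.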
Motivation and Context
Makes it easier for application developers to ensure that the entire model runs on specific EPs. This is critical for Qualcomm scenarios: if the compute cannot be offloaded to the NPU, running on the CPU is not acceptable (it can be the difference between a 90-second inference and a 6-second inference).