
[QNN EP] Add session option to disable fallback to default CPU EP #16016

Merged (15 commits) on May 24, 2023


@adrianlizarraga (Contributor) commented May 18, 2023

Description

Adds the session configuration option `session.disable_cpu_ep_fallback` (the C/C++ key constant is `kOrtSessionOptionsDisableCPUEPFallback`), which lets the user prevent the CPU EP from handling nodes that the other execution providers do not support.

```cpp
// Graph nodes that are not supported by the execution providers (EPs) explicitly added to the session are
// assigned (i.e., "fallback") to the CPU EP by default.
//
// This option allows the user to disable the fallback of unsupported graph nodes to the CPU EP.
// If this option is set to "1", session creation will fail if the execution providers other than the CPU EP cannot
// fully support all of the nodes in the graph.
//
// It is invalid to set this option and explicitly add the CPU EP to the session. In this case, session creation
// will also fail with an error.
//
// Option values:
// - "0": CPU EP fallback is not disabled. [DEFAULT]
// - "1": CPU EP fallback is disabled.
static const char* const kOrtSessionOptionsDisableCPUEPFallback = "session.disable_cpu_ep_fallback";
```
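The two failure rules documented above (unsupported nodes, and an explicitly added CPU EP) can be modeled in isolation. The following is a minimal sketch, not ONNX Runtime's implementation. It assumes the session config is a plain string-to-string map, that the explicitly added EPs and the nodes no explicit EP can run are already known as lists of names, and that the CPU EP is named `"CPUExecutionProvider"`. `ValidateCpuEpFallbackConfig` is a hypothetical helper.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper modeling the documented semantics of the
// "session.disable_cpu_ep_fallback" option. It is NOT ONNX Runtime's implementation.
//   config:            session config entries (key -> value)
//   explicit_eps:      names of the EPs the user explicitly added to the session
//   unsupported_nodes: nodes that none of the explicitly added EPs can run
// Returns an error message, or std::nullopt if session creation may proceed.
std::optional<std::string> ValidateCpuEpFallbackConfig(
    const std::map<std::string, std::string>& config,
    const std::vector<std::string>& explicit_eps,
    const std::vector<std::string>& unsupported_nodes) {
  const auto it = config.find("session.disable_cpu_ep_fallback");
  // A missing entry behaves like the default "0". This sketch treats every value
  // other than "1" as "fallback enabled" (an assumption).
  const bool fallback_disabled = it != config.end() && it->second == "1";
  if (!fallback_disabled) {
    return std::nullopt;  // Unsupported nodes will simply fall back to the CPU EP.
  }
  // Assumed EP name for the CPU EP in this sketch.
  const bool cpu_ep_added_explicitly =
      std::find(explicit_eps.begin(), explicit_eps.end(), "CPUExecutionProvider") != explicit_eps.end();
  if (cpu_ep_added_explicitly) {
    return "Invalid: CPU EP fallback is disabled, but the CPU EP was explicitly added to the session.";
  }
  if (!unsupported_nodes.empty()) {
    return "Session creation failed: node '" + unsupported_nodes.front() +
           "' is not supported by the explicitly added execution providers.";
  }
  return std::nullopt;
}
```

With the option unset or `"0"`, unsupported nodes are accepted (they would run on the CPU EP); with `"1"`, either rule produces an error.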

Example use

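```cpp
#include <iostream>
#include <string>
#include <unordered_map>

#include "core/session/onnxruntime_cxx_api.h"
#include "core/session/onnxruntime_session_options_config_keys.h"

int main() {
    Ort::Env env(ORT_LOGGING_LEVEL_WARNING, "qnn_ep_example");

    Ort::SessionOptions so;
    so.AddConfigEntry(kOrtSessionOptionsDisableCPUEPFallback, "1");  // Disable fallback to the CPU EP.

    // QNN EP provider options: backend_path selects the QNN backend library.
    std::unordered_map<std::string, std::string> options;
#if defined(_WIN32)
    options["backend_path"] = "QnnCpu.dll";
#else
    options["backend_path"] = "libQnnCpu.so";
#endif

    so.AppendExecutionProvider("QNN", options);

    const ORTCHAR_T* ort_model_path = ORT_TSTR("qnn_ep_partial_support.onnx");
    try {
        // Throws if any node would have to fall back to the CPU EP.
        Ort::Session session(env, ort_model_path, so);
        // ...
    } catch (const Ort::Exception& e) {
        std::cerr << "Session creation failed: " << e.what() << std::endl;
        return 1;
    }
    return 0;
}
```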

Motivation and Context

Makes it easier for application developers to ensure that the entire model runs on specific EPs. This is critical for Qualcomm scenarios: if the compute cannot be offloaded to the NPU, running on the CPU is not acceptable (it could be the difference between a 90-second inference and a 6-second inference).

@pranavsharma (Contributor) commented

This almost looks like a violation of the contract promised by the ORT framework of falling back to the less preferred EPs. It should be the framework that decides whether to fall back or not (based on user config) and fail the session creation accordingly. GetCapability should only report what nodes it can run. Looks like we could have a session level framework config, something like 'disable_ep_fallback' which is false by default. The partitioner can then choose to consult other EPs in the preferred list.
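The division of responsibility proposed in this comment can be modeled in a few lines. This toy partitioner assumes each EP is described only by the set of op types it supports and that nodes are identified by op type. EPs are consulted in the user's preference order, each claims only the nodes it can run (the role GetCapability plays), and the framework alone decides whether leftover nodes fall back to the CPU EP. `ExecutionProviderModel` and `PartitionGraph` are hypothetical names; ONNX Runtime's real partitioner assigns subgraphs and is considerably more involved.

```cpp
#include <optional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-in for an EP: a name plus the op types it supports.
struct ExecutionProviderModel {
  std::string name;
  std::set<std::string> supported_ops;
};

// Assigns each node (identified here only by its op type) to the first EP in
// preference order that supports it. Nodes that no EP claims go to the CPU EP
// unless fallback is disabled, in which case partitioning fails (std::nullopt).
std::optional<std::vector<std::string>> PartitionGraph(
    const std::vector<std::string>& node_op_types,
    const std::vector<ExecutionProviderModel>& eps_in_priority_order,
    bool disable_cpu_ep_fallback) {
  std::vector<std::string> assignment;
  assignment.reserve(node_op_types.size());
  for (const std::string& op : node_op_types) {
    std::string assigned_ep;
    for (const ExecutionProviderModel& ep : eps_in_priority_order) {
      if (ep.supported_ops.count(op) != 0) {  // The EP only reports what it can run.
        assigned_ep = ep.name;
        break;
      }
    }
    if (assigned_ep.empty()) {
      if (disable_cpu_ep_fallback) {
        return std::nullopt;  // The framework, not the EP, rejects the session.
      }
      assigned_ep = "CPUExecutionProvider";  // Default fallback (assumed EP name).
    }
    assignment.push_back(std::move(assigned_ep));
  }
  return assignment;
}
```

With a preference list containing only a QNN-like EP that supports `Conv` and `Relu`, a graph with an additional `TopK` node partitions with that node on the CPU EP when fallback is allowed, and fails when `disable_cpu_ep_fallback` is true.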

@adrianlizarraga (Contributor, author) commented May 18, 2023

Thanks Pranav. I'll look into an alternative solution with a session config option, perhaps using a check similar to what CUDA is doing here:

```cpp
} else if (!AreAllNodesInMainGraphAssignedToOneEp(graph, onnxruntime::kCudaExecutionProvider)) {
```
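The CUDA check quoted above can be pictured as a scan over node assignments. The sketch below assumes the assignments are available as a map from node name to EP name. `NodesNotAssignedToEp` is a hypothetical helper, not the signature of `AreAllNodesInMainGraphAssignedToOneEp`; returning the offending node names instead of a bool is a design choice made here so that an error message can name them.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical simplification: given node name -> assigned EP, returns the names
// of the nodes that were NOT assigned to `ep_name`, in node-name order. An empty
// result means the whole main graph runs on that single EP.
std::vector<std::string> NodesNotAssignedToEp(
    const std::map<std::string, std::string>& node_to_ep,
    const std::string& ep_name) {
  std::vector<std::string> offending;
  for (const auto& [node_name, assigned_ep] : node_to_ep) {
    if (assigned_ep != ep_name) {
      offending.push_back(node_name);
    }
  }
  return offending;
}
```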

@jywu-msft (Member) commented

Yes, it seems like there should be a framework-level option to disable the default CPU fallback.
The current contract is that we defer partitioning/priority decisions to the user, so it is reasonable for them to ask us to disable that behavior.
This is critical for Qualcomm scenarios: if the compute cannot be offloaded to the NPU, running on the CPU is not acceptable (it could be the difference between a 90-second inference and a 6-second inference).

@adrianlizarraga changed the title from "[QNN EP] Add session provider option to force entire model to run on QNN EP" to "[QNN EP] Add session option to disable fallback to default CPU EP" on May 19, 2023
…s not throw an error for a model that is fully supported by QNN EP.
@pranavsharma (Contributor) left a comment

Looks good. Some minor changes.

Two review comments on onnxruntime/core/session/inference_session.cc (outdated, resolved)
@adrianlizarraga and others added 2 commits on May 19, 2023, 00:34
Co-authored-by: Pranav Sharma <prs@microsoft.com>
…but user adds CPU EP to session explicitly. Also, adds a test for this case.
@adrianlizarraga marked this pull request as ready for review on May 19, 2023, 08:22
@pranavsharma previously approved these changes on May 19, 2023
@HectorSVC previously approved these changes on May 19, 2023

@HectorSVC (Contributor) left a comment

:shipit:

@jywu-msft previously approved these changes on May 19, 2023
@HectorSVC previously approved these changes on May 19, 2023
@jywu-msft merged commit efc84a4 into main on May 24, 2023
@jywu-msft deleted the adrianl/qnn-option-enforce-entire-graph-runs branch on May 24, 2023, 00:56
4 participants