[QNN EP] Add session option to disable fallback to default CPU EP #16016
Conversation
This almost looks like a violation of the contract promised by the ORT framework of falling back to less preferred EPs. It should be the framework that decides whether to fall back or not (based on user config) and fail session creation accordingly; GetCapability should only report which nodes an EP can run. Looks like we could have a session-level framework config, something like 'disable_ep_fallback', which is false by default. The partitioner can then choose whether to consult the other EPs in the preferred list.
Thanks Pranav. I'll look into an alternative solution with a session config option, perhaps using a check similar to what CUDA is doing here:
Yes, it seems like there should be a framework-level option to disable default CPU fallback.
…s not throw an error for a model that is fully supported by QNN EP.
Looks good. Some minor changes.
Co-authored-by: Pranav Sharma <prs@microsoft.com>
…but user adds CPU EP to session explicitly. Also, adds a test for this case.
1e477cd
Description
Adds the session config option disable_cpu_ep_fallback to allow the user to prevent the CPU EP from handling nodes that are not supported by the other execution providers registered with the session.
Example use
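A minimal sketch of setting the option from Python, assuming the key is namespaced as "session.disable_cpu_ep_fallback" in line with other ORT session config entries; the model path and provider list are placeholders.

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# "1" disables fallback to the default CPU EP; session creation then fails
# if any node cannot be assigned to an explicitly registered EP.
sess_options.add_session_config_entry("session.disable_cpu_ep_fallback", "1")

session = ort.InferenceSession(
    "model.onnx",                        # placeholder model path
    sess_options=sess_options,
    providers=["QNNExecutionProvider"],  # e.g., target only the QNN EP
)
```

With the option set, loading a model that contains any node unsupported by the listed EPs raises an error at session creation instead of silently running those nodes on the CPU.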
Motivation and Context
Makes it easier for application developers to ensure that the entire model runs on specific EPs. This is critical for Qualcomm scenarios: if the compute cannot be offloaded to the NPU, running on the CPU is not acceptable (it can be the difference between a 90-second inference and a 6-second inference).