
Qualcomm AI Engine Direct - add program validation #4297

Closed
wants to merge 3 commits

Conversation

haowhsu-quic (Collaborator)

Summary:

  • update graph signature for get_fake_program to work properly
  • make sure program is valid after capture_program
  • retire capture_pre_autograd_graph
  • fix release build error & make cross-compile flatcc deterministic
  • some minor fixes

pytorch-bot bot commented Jul 18, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/4297

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (2 Unrelated Failures)

As of commit 77e4060 with merge base 4a88318:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the "CLA Signed" label Jul 18, 2024

haowhsu-quic commented Jul 18, 2024

Hi @cccclai, this PR relates to #3860.
We also fixed some build errors on the latest mainline.

Regarding the flatcc cross-compile issue: the original flow let the ExternalProject_Add host build and the main build process run concurrently, which could fail randomly when the finish order was not deterministic.
I think we should move the host build to the CMake configuration stage and leave only the cross build for compile time.
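A minimal sketch of that idea in CMake (illustrative only: the directory layout and the flatcc_cli target name are assumptions, not the exact code in this PR):

```cmake
# Configure and build the host flatcc at CMake configure time, so the
# compile stage only runs the cross build and ordering stays deterministic.
execute_process(
  COMMAND ${CMAKE_COMMAND}
          -S ${PROJECT_SOURCE_DIR}/third-party/flatcc
          -B ${CMAKE_BINARY_DIR}/flatcc-host
  RESULT_VARIABLE _flatcc_configure_status
)
execute_process(
  COMMAND ${CMAKE_COMMAND} --build ${CMAKE_BINARY_DIR}/flatcc-host
          --target flatcc_cli
  RESULT_VARIABLE _flatcc_build_status
)
if(NOT _flatcc_build_status EQUAL 0)
  message(FATAL_ERROR "host flatcc build failed: ${_flatcc_build_status}")
endif()
```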

Thank you.

@dbort dbort requested a review from cccclai July 18, 2024 17:30

dbort commented Jul 18, 2024

Thank you for these changes, @haowhsu-quic. To help focus our reviews, please create separate PRs for:

  • The backend_api.py change
  • The sdk/CMakeLists.txt change
  • The Qualcomm changes

cccclai (Contributor) left a comment


Thank you for putting up the PR!

f"Error in get_fake_program for graph {edge_program.graph_module}, fallback to deepcopy: {e}"
)
fake_edge_program = copy.deepcopy(edge_program)
fake_edge_program = get_fake_program(edge_program)
Contributor:

This change could be separated out - we may need to run broader tests in case some paths still rely on the fallback. Glad to see the QNN path can work with the fake edge program! I believe the RAM usage will go down quite a bit now.
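For context, a sketch of the fallback pattern this change removes (reconstructed from the diff fragment above; the wrapper name and import path are assumptions):

```python
import copy
import logging

# Import path assumed from the ExecuTorch source layout.
from executorch.exir.program._fake_program import get_fake_program

def _fake_or_copied_program(edge_program):
    try:
        # Preferred: swap real tensors for FakeTensors so the copy
        # carries graph structure but no weight data.
        return get_fake_program(edge_program)
    except Exception as e:
        # Old fallback, removed by this PR: a full deepcopy duplicates
        # every weight and drives up RAM usage.
        logging.warning(
            f"Error in get_fake_program for graph {edge_program.graph_module}, "
            f"fallback to deepcopy: {e}"
        )
        return copy.deepcopy(edge_program)
```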

```diff
@@ -84,25 +84,16 @@ option(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT
 )
 
 if(EXECUTORCH_SEPARATE_FLATCC_HOST_PROJECT)
   # Add the host project. We build this separately so that we can generate
```
Contributor:

Also thanks for fixing the sdk build! cc: @Olivia-liu @tarun292

```python
# param nodes will be FakeTensor when doing partition
# fill in random numeric for validation
if isinstance(coeff, torch._subclasses.fake_tensor.FakeTensor):
    coeff = torch.ones(coeff.shape)
```
Contributor:

Is coeff inserted into the graph or is it just an intermediate tensor? If it's inserted, we may need to lift it to the I/O.

Collaborator Author:

It's inserted, but we make it static inside the graph so it can be identified when building the operator. This can reduce extra memory copies, I think.
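As background for the snippet above, a small illustration of the FakeTensor behavior involved (illustrative only, not code from this PR):

```python
import torch
from torch._subclasses.fake_tensor import FakeTensor, FakeTensorMode

# Under FakeTensorMode, factory ops return FakeTensors: shape/dtype
# metadata with no real storage, so they cannot feed numeric validation.
with FakeTensorMode():
    coeff = torch.empty(2, 3)

assert isinstance(coeff, FakeTensor)

# Materialize a same-shaped stand-in so validation sees concrete values.
coeff = torch.ones(coeff.shape)
```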

haowhsu-quic (Collaborator Author):

Thank you for these changes, @haowhsu-quic. To help focus our reviews, please create separate PRs for:

  • The backend_api.py change
  • The sdk/CMakeLists.txt change
  • The Qualcomm changes

Thank you, and sorry for the inconvenience:
backend_api.py change: #4311
sdk/CMakeLists.txt change: #4312


dbort commented Jul 19, 2024

Thank you so much for splitting up the PR! This is very helpful for us.

```diff
@@ -223,7 +224,12 @@ def capture_program(
     core_ep.transform(ConvertBinaryOpsWithScalar())
     edge_ep = core_ep.to_edge(qnn_edge_config())
     _transform(edge_ep.exported_program)
 
+    # Since QDQ nodes are stripped, update graph signature again to validate program
```
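To illustrate why the refresh is needed (a hedged sketch; the op set and helper are illustrative, not this PR's code): once the passes strip the quantize/dequantize nodes, any graph-signature entries that referenced them are stale, and validation would fail until the signature is rebuilt.

```python
import torch
import torch.ao.quantization.fx._decomposed  # registers quantized_decomposed ops

# Q/DQ ops that PT2E quantization inserts and the QNN passes strip out.
Q_DQ_OPS = {
    torch.ops.quantized_decomposed.quantize_per_tensor.default,
    torch.ops.quantized_decomposed.dequantize_per_tensor.default,
}

def qdq_stripped(exported_program) -> bool:
    # True once no call_function node targets a Q/DQ op; from this point the
    # previously computed graph signature no longer matches the graph.
    return all(
        node.target not in Q_DQ_OPS
        for node in exported_program.graph.nodes
        if node.op == "call_function"
    )
```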
Contributor:

Not related to this diff - is it possible to run a mix of fp ops and quantized ops, and is that well supported? I'm asking because we remove all q/dq ops and then insert them back at the I/O, which may limit our ability to do mixed dtypes.

Collaborator Author:

Currently mixed precision is only supported among quantized ops, since the compiler spec for HTP precision (quantized or fp16) has graph-level granularity.
We'll have a multi-graph change in the near future; hopefully mechanisms like weight sharing / fp mixed precision can be well supported then.
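For reference, a hedged sketch of how that graph-level precision choice surfaces when building the compiler spec (the import paths and chipset are assumptions and may differ across ExecuTorch versions):

```python
# Import paths assumed; adjust to your ExecuTorch version.
from executorch.backends.qualcomm.utils.utils import (
    generate_htp_compiler_spec,
    generate_qnn_executorch_compiler_spec,
)
from executorch.backends.qualcomm.serialization.qnn_compile_spec_schema import (
    QcomChipset,
)

# Precision is fixed once for the whole graph: fp16 for a float graph, or
# quantized when the graph was quantized ahead of time - not per op.
backend_options = generate_htp_compiler_spec(use_fp16=True)
compiler_specs = generate_qnn_executorch_compiler_spec(
    soc_model=QcomChipset.SM8550,
    backend_options=backend_options,
)
```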

haowhsu-quic (Collaborator Author) commented Jul 22, 2024:

To do so, it would be great if the framework interface could provide a runtime option (like an argument in method::execute()) for the backend to react to: e.g., changing the performance config, or selecting which graph in the context gets executed.

Contributor:

Yeah, we've been discussing how to pass the runtime option through the interface; ideally it would be passed via the backend context. Question: do you need it at method::init time or method::execute time?

Collaborator Author:

method::execute looks more flexible for QNN. Looking forward to the change!

cccclai (Contributor) left a comment:

Looks great! Thank you for adding the validation and reducing the deep copies.

facebook-github-bot:

@cccclai has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot:

@cccclai merged this pull request in 0e2b205.

facebook-github-bot pushed a commit that referenced this pull request Aug 2, 2024
Summary:
As mentioned in #4297, the original flow made the host build and the cross build happen concurrently. This change moves the host build into the CMake configuration stage and refines the related dependencies.

Pull Request resolved: #4312

Test Plan:
- cross-compile: run `backends/qualcomm/script/build.sh --release` and check that the build finishes successfully.
- native-compile: run the following to check:
```shell
cmake \
    -DCMAKE_BUILD_TYPE=RelWithDebInfo \
    -DQNN_SDK_ROOT=${QNN_SDK_ROOT} \
    -DEXECUTORCH_BUILD_QNN=ON \
    -DEXECUTORCH_BUILD_SDK=ON \
    -DPYTHON_EXECUTABLE=$PYTHON_EXECUTABLE \
    -S $EXECUTORCH_ROOT \
    -B $EXECUTORCH_ROOT/build_x86_64
```

Reviewed By: tarun292

Differential Revision: D60243701

Pulled By: dbort

fbshipit-source-id: ff8d8cb06f0cc296c7ef465596e7e3df367dd059