Add Flux transformer benchmarking #870

Open · wants to merge 5 commits into main from flux-transformer-benchmark

Conversation

sogartar
Contributor

This PR also adds more general functionality for checking benchmark results against baseline results. It requires the Google Benchmark compare.py tool, which is not part of the pip package; that is why I added the repo as a git submodule. The tool does a statistical comparison between benchmarks with a proper p-value calculation, and I don't think we should roll our own.

It also adds a new nightly CI job that should hold nightly tests and benchmarks that do not have their own category, the way Llama does now.
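
For context, a nightly GitHub Actions job is typically driven by a cron trigger. The sketch below only illustrates the shape such a workflow takes; the workflow name, schedule, and runner label are assumptions, not values taken from this PR.

```yaml
# Hypothetical sketch of a nightly workflow trigger; names and schedule
# are illustrative, not copied from this PR.
name: sharktank nightly
on:
  schedule:
    - cron: "0 6 * * *"   # run once per day
  workflow_dispatch: {}    # also allow manual runs

jobs:
  nightly:
    runs-on: mi300-gpu     # assumed self-hosted runner label
    steps:
      - uses: actions/checkout@v4
        with:
          submodules: true # needed for the third_party/benchmark submodule
```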

@sogartar force-pushed the flux-transformer-benchmark branch from 23e4919 to 913e9c0 on January 27, 2025 14:34
@sogartar
Contributor Author

This PR depends on iree-org/iree-turbine#418.

@sogartar marked this pull request as ready for review on January 27, 2025 14:39
@sogartar marked this pull request as draft on January 27, 2025 15:03
@sogartar force-pushed the flux-transformer-benchmark branch from 913e9c0 to 47e648a on January 27, 2025 15:11
@sogartar marked this pull request as ready for review on January 27, 2025 15:25
@sogartar
Contributor Author

I am not sure if we should put the benchmarks in this repo or at https://github.com/nod-ai/SHARK-TestSuite.

Comment on lines 88 to 92

```diff
-pytest -n 4 sharktank/ --durations=10
+pytest \
+  -n 4 \
+  --durations=10 \
+  -m "not expensive" \
+  sharktank/
```
Member

Be careful when adding new tests, particularly if they are expensive. Developers should be able to run pytest and expect a reasonable default set of tests to run. Developers will not remember opt-out marks like this.

The filtering also didn't work here? https://github.com/nod-ai/shark-ai/actions/runs/13079961026/job/36501056330?pr=870 is still running on standard github-hosted runners after 2h+.

Contributor Author

The test failure is due to this required PR iree-org/iree-turbine#418.

Contributor Author

I changed the default marker selection in pyproject.toml to exclude the `expensive` marker.
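
A hedged sketch of what such a default marker selection can look like in pyproject.toml, assuming pytest's `addopts`/`markers` options; the exact lines in this PR may differ:

```toml
# Illustrative only; not copied from the PR.
[tool.pytest.ini_options]
# Deselect tests marked "expensive" unless -m overrides it explicitly.
addopts = "-m 'not expensive'"
markers = [
    "expensive: slow or resource-hungry tests, excluded from the default run",
]
```

Developers who do want the expensive tests can then opt in with `pytest -m expensive`.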

Comment on lines +1 to +3

```ini
[submodule "third_party/benchmark"]
	path = third_party/benchmark
	url = https://github.com/google/benchmark
```
Member

Is this actually used? You have google-benchmark in Python test requirements already.

A source dependency on the C++ library could be added to shortfin as needed, probably via FetchContent here:

```cmake
if(SHORTFIN_BUILD_TESTS)
  if (NOT SHORTFIN_BUNDLE_DEPS AND NOT SHORTFIN_IREE_SOURCE_DIR)
    # For now we use gtest shipped alongside with IREE.
    FetchContent_Declare(
      googletest
      URL https://github.com/google/googletest/archive/03597a01ee50ed33e9dfd640b249b4be3799d395.zip
    )
    # For Windows: Prevent overriding the parent project's compiler/linker settings
    set(gtest_force_shared_crt ON CACHE BOOL "" FORCE)
    FetchContent_MakeAvailable(googletest)
  endif()
  include(GoogleTest)
  enable_testing()
  add_custom_target(shortfin_testdata_deps)
endif()
```

Contributor Author

I had mistakenly left the pip package dependency in; it is removed now. Unfortunately, the script that compares benchmark results is not part of the pip package, and the Python test uses it.
See https://github.com/google/benchmark/blob/main/docs/tools.md#comparepy
That is why the submodule is needed.
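
To make the dependency concrete, here is a hedged sketch (not the PR's actual test code) of how a Python test can shell out to the submodule's compare.py to diff two Google Benchmark JSON result files; the repo-relative path and the helper name are illustrative:

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical layout: compare.py lives in the git submodule added by this PR.
COMPARE_PY = Path("third_party/benchmark/tools/compare.py")


def compare_to_baseline(baseline_json: Path, contender_json: Path) -> str:
    """Run Google Benchmark's compare.py on two JSON result files.

    compare.py prints per-benchmark time deltas, and with repeated runs it
    reports p-values from a statistical test, which is the functionality the
    PR relies on instead of rolling its own comparison.
    """
    result = subprocess.run(
        [sys.executable, str(COMPARE_PY), "benchmarks",
         str(baseline_json), str(contender_json)],
        capture_output=True,
        text=True,
        check=True,  # raise if compare.py itself fails to run
    )
    return result.stdout
```

A test would then assert on this output; how the PR actually decides pass/fail against the baseline is not shown here.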

Member

I'd much prefer we consolidate existing nightly workflows (eval, sglang benchmark, llama large, etc.) before adding a new one. That being said, I like the "sharktank-nightly" name better than model-specific names...

Contributor Author

I think the CI jobs are due for a refactoring, to reorganize them and to reduce code duplication. I want to do that next.

Collaborator

The advantage of keeping them separate is the ease of tracking the various nightlies across the LLM and Flux models and the sharktank/shortfin regressions. If there are tools to do this, or a CI summary in GitHub Pages, that would be great.

```shell
--verbose \
--capture=no \
--iree-hip-target=gfx942 \
--iree-device=hip://6 \
```
Member

Match what other workflows do with hip devices: #725

Contributor Author

Do we actually allocate GPUs to CI jobs?

Contributor Author

I changed it to use hip://0.

Member

Thanks. Generally, CI jobs should be written to run on any compatible runner machine and be easy to reference and run locally. The runners themselves should be responsible for assigning devices.
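
As an illustration of that principle, here is a hedged sketch (not this PR's actual workflow) of a job that always targets a generic device and leaves device assignment to the runner, for example through ROCR_VISIBLE_DEVICES set in the runner's environment; the step name and the environment-based narrowing are assumptions:

```yaml
# Illustrative step; the pytest flags mirror the snippet discussed above.
- name: Run sharktank nightly benchmarks
  run: |
    # The runner can expose a single GPU (e.g. ROCR_VISIBLE_DEVICES=6),
    # so the test itself can always ask for device 0.
    pytest \
      --verbose \
      --capture=no \
      --iree-hip-target=gfx942 \
      --iree-device=hip://0 \
      sharktank/tests
```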

Comment on lines 78 to 86

```yaml
          --html=out/benchmark/index.html \
          sharktank/tests

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@4f9cc6602d3f66b9c108549d475ec49e8ef4d45e # v4.0.0
        with:
          github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
          publish_dir: ./out/benchmark
          destination_dir: ./benchmark
```
Member

"benchmark" isn't a very specific name, and it also doesn't match "sharktank nightly". Consider how you want this to slot in to

Contributor Author

Here is a PR for the GH pages #899.

@sogartar requested a review from ScottTodd on February 3, 2025 16:48
```yaml
        with:
          github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
          publish_dir: ./out/sharktank-nightly
          destination_dir: ./sharktank-nightly
```
Collaborator

I believe publish_dir should be ./out/sharktank_nightly/benchmark, and the same for destination_dir.
Also, we need to update sharktank-nightly to sharktank_nightly.

Contributor Author
@sogartar Feb 5, 2025

I wanted a single step that pushes all sharktank_nightly artifacts. In the future this workflow may gain more jobs that produce artifacts. I would be surprised if the action did not support whole directory trees.
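
A hedged sketch of the single-deploy-step layout being described, assuming future jobs drop their outputs under out/sharktank_nightly/ (for example a benchmark/ subdirectory); this mirrors the action usage quoted above rather than the PR's final contents:

```yaml
# Illustrative: one deploy step publishes the whole artifact tree.
- name: Deploy to GitHub Pages
  uses: peaceiris/actions-gh-pages@4f9cc6602d3f66b9c108549d475ec49e8ef4d45e # v4.0.0
  with:
    github_token: ${{ secrets.SHARK_PLATFORM_GH_TOKEN }}
    publish_dir: ./out/sharktank_nightly   # subtrees such as benchmark/ come along
    destination_dir: ./sharktank_nightly
```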

Collaborator

Sure, were you able to test this nightly on this PR?
