
Refactor matmul test suite. #22

Merged 9 commits on Oct 15, 2024
Conversation

@ScottTodd (Member) commented Sep 5, 2024

Progress on #2. See also the long Discord thread here.

Summaries of changes

Further decoupled test suites from the core CMake project

  • Forked iree_native_test.cmake to iree_test_suites_native_test.cmake
    • Dropped support (temporarily?) for testing on Android, RISC-V, and ARM with SME
  • Forked iree_e2e_generated_runner_test.cmake to iree_test_suites_runner_test.cmake
    • Dropped support (temporarily?) for filtering, within the build system, which tests are defined and which compile .vmfb files
  • Now we can set -DIREE_BUILD_TESTS=OFF and avoid pulling in IREE's other tests
  • Added a new hand-authored linalg_ops/matmul/CMakeLists.txt that runs tests on each backend using default flags

Simplified the test generator

  • Dropped unused functions
  • Folded GPU-specific shapes into generic "small" and "large" shape test suites

Ran the generate_e2e_matmul_tests.py script offline and checked in the generated files

  • Currently 56 files totaling 1.90MB on disk (~27000 lines of code according to GitHub)
  • Now we can inspect the test cases without needing to run the generator locally, and I fixed a few formatting issues
  • I think this makes test suite management easier. Having the generated files in this test suites repository doesn't cost the main repository much (just extra git checkout time), but I could see a case for more tightly coupling the generator with the test runner

What is left to do?

  • I want to iterate some more on the linalg_ops/matmul/CMakeLists.txt file or move to a different test runner somehow. I mainly want to support XFAIL in some way for both compiling and running.
  • We should add back tests using CPU features like AVX512, GPU features like Vulkan float16 extensions, and other non-default flags somehow. Either infer what the compiler can from the host / target, or add test suites explicitly.
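As a rough illustration of the XFAIL idea, a runner wrapper could reclassify raw pass/fail results against an expected-failure list at test time. This is a hedged sketch only; the function and parameter names below are hypothetical and do not mirror the actual `iree_test_suites_runner_test` machinery:

```python
# Hypothetical sketch of XFAIL/XPASS classification for a test runner.
# None of these names come from the real IREE test suites.

def classify_result(test_name: str, passed: bool, xfail_list: set[str]) -> str:
    """Map a raw pass/fail outcome to PASS / FAIL / XFAIL / XPASS."""
    expected_to_fail = test_name in xfail_list
    if passed:
        # A test expected to fail that passes is surprising: flag it as XPASS.
        return "XPASS" if expected_to_fail else "PASS"
    return "XFAIL" if expected_to_fail else "FAIL"
```

The same classification would need to apply to both the compile step and the run step, since a test can be expected to fail at either stage.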

@ScottTodd (Member Author)

Requested reviews from a few people that might have suggestions. Sorry, the code is kind of gross before and after... such is the nature of test suites x_x

@bjacob left a comment:

Thanks for looking after these tests!

Comment on lines +116 to +119
# Shapes involving vectors (i.e. most rectangular cases).
TestShape(m=1, k=1000, n=1000, accumulate=True), # large vector*matrix
TestShape(m=1000, k=1000, n=1, accumulate=True), # large matrix*vector
TestShape(m=1000, k=1000, n=1, accumulate=False), # large matrix*vector
@ScottTodd (Member Author)
I could see these matvec / vecmat test cases getting their own "shapes" group.

Other ideas:

  • small
  • large
  • vectors
  • square
  • aligned (all dimensions multiples of 4/8)
  • unaligned (odd number dimensions)
  • model_unet
  • model_resnet
  • model_llama

What other groupings would make sense to test and label?
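For illustration, the candidate groupings above could be expressed as predicates over a shape. This is a sketch assuming a plain (m, k, n) triple with example thresholds, not the generator's actual TestShape class or its real cutoffs:

```python
# Hypothetical sketch of shape-group predicates. The thresholds (8, 1000)
# and the multiple-of-4 alignment rule are illustrative assumptions, not
# the actual values used by generate_e2e_matmul_tests.py.

def groups_for_shape(m: int, k: int, n: int) -> set[str]:
    """Label a matmul shape with the candidate groups it belongs to."""
    groups = set()
    dims = (m, k, n)
    if all(d <= 8 for d in dims):
        groups.add("small")
    if any(d >= 1000 for d in dims):
        groups.add("large")
    if m == 1 or n == 1:
        groups.add("vectors")  # matvec / vecmat cases
    if m == k == n:
        groups.add("square")
    if all(d % 4 == 0 for d in dims):
        groups.add("aligned")  # all dimensions multiples of 4
    if all(d % 2 == 1 for d in dims):
        groups.add("unaligned")  # all dimensions odd
    return groups
```

Groups deliberately overlap here (a shape can be both "large" and "vectors"), which matches how the suggested labels would apply to the TestShape examples quoted above.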

Reply:
Let's think about why exactly we sometimes want to split test cases into separate "shapes" groups.

I say it's:

  • To get low-latency small-shapes tests that we can always run, regardless of high latencies and other problems with bigger shapes.
  • To get shapes that are deemed "suitable" for a particular target. Originally I had created only "small" and "large", both containing odd (what people call "unaligned") shapes to exercise the most corner cases. Then early GPU codegen folks came in and added GPU tests but said "hold on, we can't actually support odd shapes on GPU". That was the reason why "gpu_aligned" was introduced. I don't suppose that's current anymore?

I don't know a reason to split vectors into their own separate group.

@ScottTodd (Member Author)
I was leaning more towards a single file per test case, like how https://github.com/nod-ai/rocm-gemm-benchmark/tree/main/kernels/mlir is set up. That has "problem sizes" coming from shapes found in production models: https://github.com/nod-ai/rocm-gemm-benchmark/blob/main/gemmbench/problems.py . The focus there is benchmarking though, not correctness testing.

@benvanik pointed out here on Discord that grouping was useful for test suites. A few of his points:

  • that's painful and wasteful when there's a logical grouping
  • the cost of launching the processes, creating thread pools/gpu devices, etc to run a single 4x4 matmul is astronomical
  • small (smoketests that should run frequently) and large (bigger regression tests that run less frequently) and a division by data type is how this stuff naturally falls out
  • most exclusions per target will be cut on data type - cpu doesn't run bf16, etc
  • if anything, I'd want a test runner to batch more - especially with how unreliable the AMD devices are
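To illustrate the per-target data-type exclusion idea from those points, here is a hedged sketch; the target names and exclusion sets below are examples only, not IREE's real configuration:

```python
# Hypothetical sketch of "most exclusions per target will be cut on data
# type". The targets and exclusion lists are illustrative assumptions.

ALL_DTYPES = ["i8_into_i32", "f32_into_f32", "f16_into_f16", "bf16_into_bf16"]

EXCLUSIONS = {
    "cpu": {"bf16_into_bf16"},                  # e.g. cpu doesn't run bf16
    "vulkan": {"i8_into_i32", "f16_into_f16"},  # extension dependent
}

def dtypes_for_target(target: str) -> list[str]:
    """Return the data type suites enabled for a target after exclusions."""
    excluded = EXCLUSIONS.get(target, set())
    return [d for d in ALL_DTYPES if d not in excluded]
```

A table like this would replace the commented-out `list(APPEND _DTYPES ...)` lines with data that a runner can consult per target.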

Comment on lines +92 to +106
set(_DTYPES)
# list(APPEND _DTYPES "i8_into_i32")  # Currently failing.
list(APPEND _DTYPES "f32_into_f32")
# list(APPEND _DTYPES "f16_into_f16")  # Failing to compile.
# list(APPEND _DTYPES "f16_into_f32")  # Failing to compile.
# list(APPEND _DTYPES "bf16_into_bf16")  # Failing to compile.
# list(APPEND _DTYPES "bf16_into_f32")  # Failing to compile.
# list(APPEND _DTYPES "f8E4M3FNUZ_into_f32")  # Unsupported data type.
foreach(_DTYPE IN LISTS _DTYPES)
  foreach(_SIZE IN LISTS _SIZES)
    iree_test_suites_runner_test(
      NAME
        matmul_vulkan_${_DTYPE}_${_SIZE}
      TESTS_SRC
        "generated/${_DTYPE}/matmul_${_DTYPE}_${_SIZE}.mlir"
@ScottTodd (Member Author)

Could pass list of xpass, xfail, skip here... maybe. Would then want to move compilation from build time to test time.

Vulkan should support f16 and i8, but I think that is target/extension dependent (see the --iree-vulkan-target=valhall flag in the iree-org/iree Vulkan tests and the vulkan_uses_vk_khr_shader_float16_int8 label).

list(APPEND _DTYPES "f16_into_f32")
list(APPEND _DTYPES "bf16_into_bf16")
list(APPEND _DTYPES "bf16_into_f32")
# list(APPEND _DTYPES "f8E4M3FNUZ_into_f32") # Failing to compile.
@ScottTodd (Member Author)

The iree-org/iree tests ran on gfx94x/CDNA3 with the LLVMGPUVectorDistributeMFMA compilation info. We need to bake some target info into the test xpass/xfail lists.

@ScottTodd (Member Author)
Ping @erman-gurses / others that might have opinions?

@erman-gurses left a comment:
lgtm!

@ScottTodd (Member Author)
I've discussed this on and off with @erman-gurses. We'd like to merge and then iterate.

Probably worth diffing the changes in this repo with iree-org/iree#18725 to make sure we pick up all the improvements.

@ScottTodd ScottTodd merged commit 3a6820f into iree-org:main Oct 15, 2024
2 checks passed
@ScottTodd ScottTodd deleted the matmul-refactor branch October 15, 2024 20:07
@ScottTodd (Member Author)

Whoops, I should have known better than to merge without syncing and re-running the tests ;P

https://github.com/iree-org/iree-test-suites/actions/runs/11353574142/job/31578892613#step:8:445
<unknown>:0: error: HIP target not set; did you forget to pass '--iree-hip-target'?

This repo is low traffic and this PR touched a good chunk of large files so I'll fix-forward instead of revert.

ScottTodd added a commit that referenced this pull request Oct 15, 2024