Support matmul 4d inputs on pack-peel-4-level-tiling #1098
This PR adds basic support for matmul with 4d inputs and output on pack-peel-4-level-tiling.
In theory, the tiling strategy and pipeline should support different layouts of matmul 4d operations. However, the order of the input dims (both inner and outer) is crucial for correct compilation and correct results.
To ensure correctness and to allow comparison, this PR only adds a test that corresponds to the standard matmul. That is, the order of the input dims for this operation matches the L2 shapes of the matmul op after the first level of packing, i.e.,
C += matmul4d(A,B) where A:MxKxM0xK0, B:NxKxK0xN0, C:NxMxM0xN0
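
The layout above can be illustrated with a small reference sketch in plain Python. It is not code from this PR: the helper names (`pack_lhs`, `pack_rhs`, `matmul4d`, `unpack_out`) are hypothetical, and the packing assumes that each outer index selects a contiguous row-major tile of the 2d operand. Under that assumption, unpacking C gives the same result as a standard matmul.

```python
from itertools import product


def pack_lhs(a2d, M, K, M0, K0):
    """Pack a (M*M0) x (K*K0) matrix into A[m][k][m0][k0] (assumed tiling)."""
    return [[[[a2d[m * M0 + m0][k * K0 + k0] for k0 in range(K0)]
              for m0 in range(M0)] for k in range(K)] for m in range(M)]


def pack_rhs(b2d, N, K, K0, N0):
    """Pack a (K*K0) x (N*N0) matrix into B[n][k][k0][n0] (assumed tiling)."""
    return [[[[b2d[k * K0 + k0][n * N0 + n0] for n0 in range(N0)]
              for k0 in range(K0)] for k in range(K)] for n in range(N)]


def matmul4d(a, b, c):
    """C[n][m][m0][n0] += sum over k, k0 of A[m][k][m0][k0] * B[n][k][k0][n0]."""
    M, K, M0, K0 = len(a), len(a[0]), len(a[0][0]), len(a[0][0][0])
    N, N0 = len(b), len(b[0][0][0])
    for n, m, m0, n0 in product(range(N), range(M), range(M0), range(N0)):
        acc = c[n][m][m0][n0]
        for k, k0 in product(range(K), range(K0)):
            acc += a[m][k][m0][k0] * b[n][k][k0][n0]
        c[n][m][m0][n0] = acc
    return c


def unpack_out(c, N, M, M0, N0):
    """Unpack C[n][m][m0][n0] back into a (M*M0) x (N*N0) matrix."""
    return [[c[n][m][m0][n0] for n in range(N) for n0 in range(N0)]
            for m in range(M) for m0 in range(M0)]


def zeros4d(d0, d1, d2, d3):
    """Allocate a zero-initialized 4d nested list of shape d0 x d1 x d2 x d3."""
    return [[[[0] * d3 for _ in range(d2)] for _ in range(d1)] for _ in range(d0)]
```

Note that the outer dims of C are ordered N then M, while the inner dims are M0 then N0; swapping either pair changes which element of the 2d result each entry maps to.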
The test class and instance added in run.py are preliminary and experimental. Generalization of the test class will be addressed in follow-ups.
Runtime comparison on Phoenix CI:
matmul_512_4096_512_bf16_f32: 1141 us vs. matmul4d_16_128_8_bf16_f32: 998 us