Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SM68] Initial proposal for Wave Matrix #61

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

llvm-beanz
Copy link
Collaborator

This proposal adds new WaveMatrix data types that support lane cooperative matrix multiplication.

A preview implementation of this proposal is available as part of the DXC 1.8.2306 preview release.

This proposal adds new WaveMatrix data types that support lane
cooperative matrix multiplication.

A preview implementation of this proposal is available as part of the
DXC 1.8.2306 preview release.
Copy link
Member

@pow2clk pow2clk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the slew of comments 😕


These higher throughput matrix operations are required for optimal performance
of many machine learning and image processing workloads. Adding native support
to HLSL will enable high-performance matrix operations across all supported
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's minor, but I fear the notion of "supported" is vague here. It may be interpreted as all hardware that supports SM 6.8, which is probably not the case. My feeble attempt to recraft the sentence:

Adding support to HLSL will enable the high-performance of native matrix operations across all hardware with such support through Shader Model 6.8 drivers.

WaveMatrix introduces new matrix templates to facilitate wave cooperative
operations:

```c++
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't seem to make a lick of difference in this case, but ```hlsl is an option in github:

WaveMatrixLeft  <TYPE_IN, M, N> ;             // M x K
WaveMatrixLeft  <TYPE_IN, M, N> ;             // M x K

Maybe someday we'll get proper syntax highlighting?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI if you check the 3rd party grammars supported by Github's Linguist integration here, HLSL is listed as being supported by Tim's textmate grammar (so hlsl is a valid fenced code block and we just need to update that grammar to update it as the language changes). I would actually suggest having an "official" Microsoft-maintained grammar file that you can update as the compiler updates if it isn't too much trouble. It'd be super helpful for HLSL tool maintainers to have a more "official" syntax highlighter.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this code block the HLSL highlighting is probably okay, but GitHub’s HLSL syntax highlighting doesn’t handle HLSL 2021 particularly well. The C++ mode syntax highlighting does much better IMO.

As an example here’s HLSL highlighted:

namespace detail {
template <typename ElTy, int NRows, int NCols>
class WaveMatrixBase {
};
} // namespace detail

And the same code C++ highlighted:

namespace detail {
template <typename ElTy, int NRows, int NCols>
class WaveMatrixBase {
};
} // namespace detail

I’m fine changing the simpler code samples to be HLSL highlighted, but I’d greatly prefer to keep the ones that have templates C++-mode. I had used C++ everywhere for consistency, but that isn’t strictly necessary.

Copy link

@jeremyong jeremyong Aug 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think the thing to do is to fork the C++ tmLanguage file as a base, adding HLSL specific constructs/keywords, and then swap it as the official highlighting grammar for github (instructions for doing that are here. It'd be a shame for this to be fixed later and then have a bunch of HLSL snippets throughout the github ecosystem be tagged as C++ snippets into perpetuity I think!

All WaveMatrix objects have a `Fill` method of the form `void Fill(ElTy Value)`
where `ElTy` is the element type.

The `Fill` method fills the matrix or matrix fragment with the provided value.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a more technical description would be "Assigns the given Value to every element in the matrix or matrix fragment".

All wave threads must provide the same value or the result is undefined. All
WaveMatrix objects have the same `Fill` method with the same behavior.

### WaveMatrix Matrix Objects
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the explanation of Fill above would be a bit more logical after this section

### WaveMatrix Matrix Objects

The code below approximately declares the base interface that WaveMatrix matrix
objects implement.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add "the following sections will explain in detail what these methods do and what the parameters represent."

The `WaveMatrixAccumulator::MultiplyAccumulate` method performs multiplication
of the left and right arguments and adds the result back into the
`WaveMatrixAccumulator`. This is a wave-level operation and cannot be used
inside divergent control flow.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as above, what if I do?

WaveMatrix intrinsics are defined to support quantization calculations.
Including calculating a sum for the rows of the left matrix and a sum of the
columns of the right matrix. The `WaveMatrixRightRowAcc` and
`WaveMatrixLeftColAcc` fragment accumulators perform this operation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should they have "Fragment" in the name? I was initially confused why they took full matrices as parameters, but I see through the inheritance that they are fragment accumulators

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love the names of those types, but I think if we add "Fragment" to the name we're going to be getting dangerously close to a type name that can't fit on one line of code with reasonable line wrapping rules.

#### Zero Point

The following is the equation for matrix multiplication with zero point
adjustment included:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This section feels out of place in between the definition of WaveMatrix Fragment Acuumulators and the description of their SumAccumulate method. I recognize that it is referenced in the section just after it, but if this is defining the multiply operations, perhaps it can go after the multiply method description and the next section can link back to it?


$Z_*$ are constant zero points values

#### Wave Matrix SumAccumulate
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe "WaveMatrix Fragment SumAccumulate" to be consistent with the title of that section?

#### Wave Matrix SumAccumulate

The `SumAccumulate` methods accumulate the values of the argument matrix into
the WaveMatrix fragment accumulator. The fragment WaveMatrix must have the same
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reading this and the description above the class, it's not clear to me what actually gets placed into the accumulator. Is it the sum of all elements in each row of a row-major matrix and the sum of all elements in each column of a column-major matrix?

@pow2clk pow2clk removed this from the Shader Model 6.8 milestone Jan 3, 2024
@Degerz
Copy link

Degerz commented Aug 7, 2024

microsoft/DirectXShaderCompiler#6807

I assume that this proposal has been denied ?

@llvm-beanz
Copy link
Collaborator Author

microsoft/DirectXShaderCompiler#6807

I assume that this proposal has been denied ?

Denied is probably the wrong phrasing... In need of significant reworking.

This is definitely a feature we want, but the design proposed here has some pretty big limitations that we haven't had time to address. That resulted in us pausing development on it and not shipping it with the Shader Model 6.8 release.

We hope to refine this feature (in public) and include an updated version of it in the future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: In Review
Status: In review
Development

Successfully merging this pull request may close these issues.

5 participants