Pdll #976 (Draft)

wants to merge 6 commits into main
Conversation

KavithaTipturMadhu (Contributor)
No description provided.


int main(int argc, char **argv) {
mlir::registerAllPasses();
mlir::tpp::registerTppCompilerPasses();
mlir::tpp::registerConvertVectorToXsmmPass();
Collaborator:

I'd expect the pass registration to be already included in registerTppCompilerPasses.
Is a separate one really needed?
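To illustrate the concern, here is a minimal MLIR-free sketch (hypothetical bodies, not the actual tpp-mlir implementation): if the umbrella `registerTppCompilerPasses()` already registers the individual pass, a second explicit call in `main()` is redundant.

```cpp
#include <cassert>
#include <set>
#include <string>

// Stand-in registry; in MLIR this is handled by pass registration machinery.
static std::set<std::string> registeredPasses;

void registerConvertVectorToXsmmPass() {
  registeredPasses.insert("convert-vector-to-xsmm");
}

void registerTppCompilerPasses() {
  // Umbrella registration already covers the vector-to-xsmm pass,
  // making a separate call in main() unnecessary.
  registerConvertVectorToXsmmPass();
}
```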

// to the callee to specify the expected rank in the VNNI layout as the rank
// depends on the operations we are dealing with.
bool isInVnniLayout(VnniOperandRank expectedRank, VectorType vector) {
return isInVnniLayout((int64_t)expectedRank, vector);
Collaborator:

nit: use static_cast
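For reference, a scoped enum converts to its underlying integer type via `static_cast` rather than a C-style cast; a standalone sketch (the `VnniOperandRank` values here are illustrative, not the real ones):

```cpp
#include <cassert>
#include <cstdint>

// Illustrative stand-in for VnniOperandRank; the enumerator values are
// assumptions for this example.
enum class VnniOperandRank : int64_t { GEMM = 3, BRGEMM_INS = 4 };

int64_t toRank(VnniOperandRank rank) {
  // static_cast makes the enum-to-integer conversion explicit and checked,
  // unlike the C-style (int64_t)expectedRank cast.
  return static_cast<int64_t>(rank);
}
```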

@@ -122,17 +213,17 @@ struct IntelAMXTileConfigInsertionPass
: public impl::IntelAMXTileConfigInsertionPassBase<
IntelAMXTileConfigInsertionPass> {
void populateCombinePatterns(RewritePatternSet &patterns) {
patterns.add<IntelAMXTileConfig<xsmm::BrgemmOp, xsmm::BrgemmDispatchOp>>(
Collaborator:

Are we ready to already retire the old lowering?

return xsmm::DataTypeAttr::get(rewriter.getContext(), xsmm::DataType::BF16);
return xsmm::DataTypeAttr::get(rewriter.getContext(), xsmm::DataType::F32);
// Callable object to verify if `operand` has static shape.
struct HasStaticShape {
Collaborator:

Why not just include StructuredOpMatcher.h for these?
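For context, the callable in question can be sketched without MLIR: a shape is static when no dimension is dynamic (here `-1` stands in for `ShapedType::kDynamic`). This is an illustrative sketch, not the header's actual matcher.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Callable object to verify that a shape has no dynamic dimensions.
struct HasStaticShape {
  bool operator()(const std::vector<long> &shape) const {
    // A negative extent marks a dynamic dimension in this sketch.
    return std::none_of(shape.begin(), shape.end(),
                        [](long d) { return d < 0; });
  }
};
```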

}

static std::pair<Operation *, Operation *>
buildOpImpl(PatternRewriter &rewriter, Operation *contractOp, Operation *input0,
Collaborator:

Since you have access to the rewriter, can you erase the other ops here as well?
I wonder if we could have a single buildOp that tries to fuse consumers on its own, to avoid a combinatorial explosion of patterns.

rewrite root with{
let replacement = BuildOp(root, input0, input1, input2);
replace root with (replacement.dispatch, replacement.invoke);
let user = GetUser(replacement.dispatch);
Collaborator:

Not a PDLL expert, but AFAIK there's no validation or matching on the user here.
I don't think we can just blindly erase it when, for example:

%0 = vector.contract
%1 = arith.subf %0, ...
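The guard being asked for can be sketched in plain C++ (`Op` is a stand-in for MLIR's `Operation`, not the real API): erase the user only when it is the result's single user and has the expected kind, e.g. a `vector.transfer_write`; an `arith.subf` consumer would be rejected.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Minimal stand-in for an operation with a list of users of its result.
struct Op {
  std::string name;
  std::vector<Op *> users;
};

// Only erase the consumer when it is the sole user and the expected kind.
bool safeToEraseUser(const Op &producer, const std::string &expectedKind) {
  return producer.users.size() == 1 &&
         producer.users.front()->name == expectedKind;
}
```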

Collaborator:

I think you already match a transfer_write as a consumer in some other patterns, so it's probably just missing here.

FailureOr<vector::ContractionOp>
makeMinorDimensionsInnerMost(RewriterBase &rewriter,
vector::ContractionOp contractOp, unsigned m,
unsigned n, unsigned k, xsmm::DataTypeAttr type);
Collaborator:

xsmm::DataTypeAttr needs to be removed, as it still couples this interface to the Xsmm dialect.

Collaborator:

Actually, I see it still relies on XsmmEnum in general.
I guess we can keep that for now, as it's easy to generate them, and refactor later.

@adam-smnk (Collaborator)

I'm getting quite different results between the linalg and the vector paths to xsmm:

Linalg test case:

tpp-opt ../test.mlir -convert-linalg-to-xsmm -convert-xsmm-to-func | tpp-run -e entry --entry-point-result=void -seed 123 -print

func.func @entry(%arg0: memref<32x32xf32>, %arg1: memref<32x32xf32>, %arg2: memref<32x32xf32>) 
  -> memref<32x32xf32> {
  linalg.matmul ins(%arg0, %arg1: memref<32x32xf32>, memref<32x32xf32>)
    outs(%arg2: memref<32x32xf32>)
  return %arg2 : memref<32x32xf32>
}

Vector test case:

tpp-opt ../test.mlir -convert-vector-to-xsmm-pass | tpp-run -e entry --entry-point-result=void -seed 123 -print

#map = affine_map<(d0, d1, d2) -> (d0, d2)>
#map1 = affine_map<(d0, d1, d2) -> (d2, d1)>
#map2 = affine_map<(d0, d1, d2) -> (d0, d1)>
module {
  func.func @entry(%arg0: memref<32x32xf32>, %arg1: memref<32x32xf32>, %arg2: memref<32x32xf32>)
  -> memref<32x32xf32> {
    %c0 = arith.constant 0 : index
    %cst = arith.constant 0.000000e+00 : f32
    %0 = vector.transfer_read %arg0[%c0, %c0],
      %cst {in_bounds = [true, true]} : memref<32x32xf32>, vector<32x32xf32>
    %1 = vector.transfer_read %arg1[%c0, %c0],
      %cst {in_bounds = [true, true]} : memref<32x32xf32>, vector<32x32xf32>
    %2 = vector.transfer_read %arg2[%c0, %c0],
      %cst {in_bounds = [true, true]} : memref<32x32xf32>, vector<32x32xf32>
    %3 = vector.contract {indexing_maps = [#map, #map1, #map2],
      iterator_types = ["parallel", "parallel", "reduction"],
      kind = #vector.kind<add>} %0, %1, %2
      : vector<32x32xf32>, vector<32x32xf32> into vector<32x32xf32>
    vector.transfer_write %3, %arg2[%c0, %c0]
      {in_bounds = [true, true]} : vector<32x32xf32>, memref<32x32xf32>
    return %arg2 : memref<32x32xf32>
  }
}

@adam-smnk (Collaborator)

After a bit of IR diffing, the current difference comes from the dispatch call:

  • created from linalg:
    %0 = call @xsmm_gemm_dispatch(%c1_i64, %c32_i64, %c32_i64, %c32_i64, %c32_i64, %c32_i64, %c32_i64, %c0_i64) : (i64, i64, i64, i64, i64, i64, i64, i64) -> i64
  • created from vector:
    %0 = call @xsmm_gemm_dispatch(%c1_i64, %c32_i64, %c32_i64, %c32_i64, %c32_i64, %c32_i64, %c32_i64, %c1_i64, %c1_i64, %c0_i64) : (i64, i64, i64, i64, i64, i64, i64, i64, i64, i64) -> i64

The new call builder adds two extra arguments (the 8th and 9th: %c1_i64, %c1_i64), which correspond to unit strides.
Now, I'm not really sure how this "just works"™, as we don't have a wrapper for a 10-argument gemm_dispatch. It somehow runs, but produces invalid results.
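One way to surface this kind of mismatch early is an arity check before emitting the runtime call. The sketch below is hypothetical (the table and function names are illustrative, not the tpp-mlir API): a 10-argument xsmm_gemm_dispatch call would be rejected instead of silently miscalling the 8-argument wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>

// Known wrapper arities; xsmm_gemm_dispatch takes 8 i64 arguments on the
// linalg path observed above.
static const std::map<std::string, std::size_t> kKnownWrapperArity = {
    {"xsmm_gemm_dispatch", 8},
};

// Return true only when the callee exists and the argument count matches.
bool arityMatchesWrapper(const std::string &callee, std::size_t numArgs) {
  auto it = kKnownWrapperArity.find(callee);
  return it != kKnownWrapperArity.end() && it->second == numArgs;
}
```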

@KavithaTipturMadhu (Contributor, Author)


Fixed the issue in the last commit @adam-smnk

namespace xegpu {
class XeGPUDialect;
} // namespace xegpu

} // namespace mlir

#include "TPP/Conversion/ConvertVectorToXsmm/ConvertVectorToXsmm.h"
Contributor:

This seems out of place. Why is this include needed here when the others are not?
