SIMD/Loop framework upgrade #2937

Merged

@AlexandreEichenberger AlexandreEichenberger commented Sep 10, 2024

Added support for SIMD loops whose iteration count is not a multiple of the vector length (VL).
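For readers less familiar with the problem, here is a minimal sketch of one common way to handle a trip count that is not a multiple of VL: run the blocked body over the full blocks, then a scalar body over the leftover iterations. `simdIterateSketch` and `addOne` are hypothetical names for illustration only; this is plain C++ rather than the builder API, and the code this PR generates may handle the remainder differently.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical helper (not the onnx-mlir API): calls simdBody once per full
// block of VL iterations in [lb, ub), then scalarBody for each leftover one.
void simdIterateSketch(int64_t lb, int64_t ub, int64_t VL,
    const std::function<void(int64_t)> &simdBody,
    const std::function<void(int64_t)> &scalarBody) {
  assert(VL > 0 && "vector length must be positive");
  int64_t tripCount = ub > lb ? ub - lb : 0;
  // Largest bound such that [lb, simdUb) holds only full blocks of VL.
  int64_t simdUb = lb + (tripCount / VL) * VL;
  for (int64_t i = lb; i < simdUb; i += VL)
    simdBody(i);
  for (int64_t i = simdUb; i < ub; ++i)
    scalarBody(i);
}

// Adds 1 to every element; the inner k loop stands in for one vector op.
std::vector<int> addOne(const std::vector<int> &in, int64_t VL) {
  std::vector<int> out(in.size(), 0);
  simdIterateSketch(0, static_cast<int64_t>(in.size()), VL,
      [&](int64_t i) {
        for (int64_t k = 0; k < VL; ++k)
          out[i + k] = in[i + k] + 1;
      },
      [&](int64_t i) { out[i] = in[i] + 1; });
  return out;
}
```

With 10 elements and VL = 4, the blocked body covers indices 0 to 7 and the scalar body covers indices 8 and 9.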

Because we can now generate SIMD code using Krnl, Affine, or SCF, it was painful to maintain a separate way of generating loops for each. There is now a unified interface that creates loops across all three dialects:

  void forLoopIE(IndexExpr lb, IndexExpr ub, int64_t step, bool useParallel,
      mlir::function_ref<void(SCFBuilder &, mlir::ValueRange)> bodyFn) const;

It takes the lower and upper bounds as IndexExpr values, the loop step, a boolean that selects a sequential or parallel loop, and the function called to generate the loop body.

An example is shown here:

    // Invocation of the (possibly parallel) SIMD loop.
    if constexpr (std::is_same<BUILDER, KrnlBuilder>::value ||
                  std::is_same<BUILDER, AffineBuilder>::value ||
                  std::is_same<BUILDER, SCFBuilder>::value)
      builder.forLoopIE(lb, simdUb, VL, useParallel, simdLoopBody);
    else
      llvm_unreachable("BUILDER type not supported\n");
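The dispatch pattern in the snippet above can be tried outside of MLIR with toy types. The sketch below is only an illustration: the `Toy*` builders, `supportsForLoopIE`, and `collectBlockStarts` are hypothetical stand-ins, the loop is executed directly instead of being emitted as IR, and `useParallel` is ignored. It also swaps the runtime `llvm_unreachable` for a `static_assert`, which rejects unsupported builder types at compile time.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <type_traits>
#include <vector>

// Toy stand-in sharing one forLoopIE-like signature (hypothetical, not MLIR).
struct ToyLoopBuilder {
  void forLoopIE(int64_t lb, int64_t ub, int64_t step, bool useParallel,
      const std::function<void(int64_t)> &bodyFn) const {
    assert(step > 0 && "step must be positive");
    (void)useParallel; // The toy version always runs sequentially.
    for (int64_t i = lb; i < ub; i += step)
      bodyFn(i);
  }
};
struct ToyKrnlBuilder : ToyLoopBuilder {};
struct ToyAffineBuilder : ToyLoopBuilder {};
struct ToySCFBuilder : ToyLoopBuilder {};

// True only for the three builder types that provide the unified loop call.
template <typename BUILDER>
constexpr bool supportsForLoopIE =
    std::is_same<BUILDER, ToyKrnlBuilder>::value ||
    std::is_same<BUILDER, ToyAffineBuilder>::value ||
    std::is_same<BUILDER, ToySCFBuilder>::value;

// Records the first index of each block of VL iterations visited by the loop.
template <typename BUILDER>
std::vector<int64_t> collectBlockStarts(
    const BUILDER &builder, int64_t lb, int64_t simdUb, int64_t VL) {
  static_assert(supportsForLoopIE<BUILDER>, "BUILDER type not supported");
  std::vector<int64_t> starts;
  builder.forLoopIE(lb, simdUb, VL, /*useParallel=*/false,
      [&](int64_t i) { starts.push_back(i); });
  return starts;
}
```

Because all three toy builders expose the same call, the code using them is written once and the builder type is just a template parameter.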

This complements the three SIMD calls: simdIterateIE, simdReduceIE, and simdReduce2DIE. The last two both perform reductions, but simdReduceIE uses horizontal/do-across reductions (e.g., available on z16 for integer add), while simdReduce2DIE uses shuffles to mix VL consecutive reductions.
All SIMD calls now work with an arbitrary number of loop iterations (whether or not it is a multiple of the hardware vector length).
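To make the difference between the two reduction styles concrete, here is a small scalar model, assuming VL = 4 and a sum reduction. `horizontalSum` collapses one accumulator vector into a scalar, while `reduce4` combines four accumulator vectors into a single vector of four results using two rounds of shuffle-and-add. All names and the particular shuffle masks are assumptions for illustration; the shuffles actually emitted by the PR may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t VL = 4; // Assumed vector length for this sketch.
using Vec = std::array<float, VL>;
using Mask = std::array<std::size_t, VL>;

// Style 1: horizontal reduction of one accumulator vector into a scalar.
float horizontalSum(const Vec &acc) {
  float sum = 0.0f;
  for (float lane : acc)
    sum += lane;
  return sum;
}

// Picks lanes from the concatenation [x, y], like a two-input vector shuffle.
Vec shuffle(const Vec &x, const Vec &y, const Mask &mask) {
  Vec out{};
  for (std::size_t i = 0; i < VL; ++i) {
    assert(mask[i] < 2 * VL && "mask index out of range");
    out[i] = mask[i] < VL ? x[mask[i]] : y[mask[i] - VL];
  }
  return out;
}

Vec add(const Vec &a, const Vec &b) {
  Vec out{};
  for (std::size_t i = 0; i < VL; ++i)
    out[i] = a[i] + b[i];
  return out;
}

// Returns [x0+x1, x2+x3, y0+y1, y2+y3].
Vec pairwiseAdd(const Vec &x, const Vec &y) {
  const Mask evenLanes = {0, 2, 4, 6};
  const Mask oddLanes = {1, 3, 5, 7};
  return add(shuffle(x, y, evenLanes), shuffle(x, y, oddLanes));
}

// Style 2: four accumulators in, one vector [sum(a), sum(b), sum(c), sum(d)].
Vec reduce4(const Vec &a, const Vec &b, const Vec &c, const Vec &d) {
  return pairwiseAdd(pairwiseAdd(a, b), pairwiseAdd(c, d));
}
```

The second style keeps every step a full-width vector operation, which is why it suits VL consecutive reductions whose results are stored next to each other.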

To provide the same functionality in both SIMD reduction calls, I now expect one lambda function per output (previously, a single lambda function generated all outputs).
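As a rough picture of the "one lambda per output" convention, the sketch below reduces one input into several outputs, each with its own initial value and its own combining lambda. `ReduceFn` and `reduceAll` are hypothetical names, and the real calls take builder-based lambdas that emit IR rather than compute values, so treat this only as an analogy.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical per-output combiner: (running value, new element) -> new value.
using ReduceFn = std::function<float(float, float)>;

// Applies reduceFns[o] to build output o; outputs are independent of each other.
std::vector<float> reduceAll(const std::vector<float> &input,
    const std::vector<float> &inits, const std::vector<ReduceFn> &reduceFns) {
  assert(inits.size() == reduceFns.size() &&
         "expected one init value and one lambda per output");
  std::vector<float> results = inits;
  for (float x : input)
    for (std::size_t o = 0; o < reduceFns.size(); ++o)
      results[o] = reduceFns[o](results[o], x);
  return results;
}
```

For example, passing a sum lambda and a max lambda yields both the sum and the maximum of the input, and the driver never needs to know what each output computes.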

We also had different calls for memory loads and stores. A common interface is now used for Krnl, Affine, and MemRef, with a nearly identical one for Vector (where the load operation needs the type to determine the VL).

They all use the calls below:

  mlir::Value load(mlir::Value memref, mlir::ValueRange indices = {},
      mlir::ValueRange offsets = {}) const;
  mlir::Value loadIE(mlir::Value memref, mlir::ArrayRef<IndexExpr> indices = {},
      mlir::ValueRange offsets = {}) const;
  void store(mlir::Value val, mlir::Value memref, mlir::ValueRange indices = {},
      mlir::ValueRange offsets = {}) const;
  void storeIE(mlir::Value val, mlir::Value memref,
      mlir::ArrayRef<IndexExpr> indices, mlir::ValueRange offsets = {}) const;
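The description does not spell out how `offsets` combine with `indices`. The sketch below assumes they are added to the least significant (trailing) dimensions, so `offsets` may be shorter than `indices`. `ToyMemRef`, `linearize`, and the toy `load`/`store` are hypothetical and operate on a row-major buffer instead of emitting IR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy row-major buffer standing in for a memref (hypothetical type).
struct ToyMemRef {
  std::vector<int64_t> shape;
  std::vector<float> data;
};

// Assumption: offsets are added to the trailing dims of indices.
int64_t linearize(const ToyMemRef &m, const std::vector<int64_t> &indices,
    const std::vector<int64_t> &offsets) {
  assert(indices.size() == m.shape.size() && "one index per dimension");
  assert(offsets.size() <= indices.size() && "too many offsets");
  std::size_t firstOffsetDim = indices.size() - offsets.size();
  int64_t linear = 0;
  for (std::size_t d = 0; d < indices.size(); ++d) {
    int64_t idx = indices[d];
    if (d >= firstOffsetDim)
      idx += offsets[d - firstOffsetDim];
    assert(idx >= 0 && idx < m.shape[d] && "index out of bounds");
    linear = linear * m.shape[d] + idx;
  }
  return linear;
}

float load(const ToyMemRef &m, const std::vector<int64_t> &indices,
    const std::vector<int64_t> &offsets = {}) {
  return m.data[static_cast<std::size_t>(linearize(m, indices, offsets))];
}

void store(float val, ToyMemRef &m, const std::vector<int64_t> &indices,
    const std::vector<int64_t> &offsets = {}) {
  m.data[static_cast<std::size_t>(linearize(m, indices, offsets))] = val;
}
```

Under that assumption, storing at indices {1, 0} with offsets {2} in a 2x3 buffer touches the same element as indices {1, 2} with no offsets.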

Signed-off-by: Alexandre Eichenberger <alexe@us.ibm.com>

@tungld tungld left a comment

LGTM. Really appreciate your effort of simplifying these interfaces!

@AlexandreEichenberger AlexandreEichenberger merged commit 9dd7c4a into onnx:main Sep 18, 2024
7 checks passed
@jenkins-droid

Jenkins Linux amd64 Build #15654 [push] SIMD/Loop framework upgr... started at 07:04

@jenkins-droid

Jenkins Linux s390x Build #15657 [push] SIMD/Loop framework upgr... started at 08:04

@jenkins-droid

Jenkins Linux ppc64le Build #14685 [push] SIMD/Loop framework upgr... started at 08:16

@jenkins-droid

Jenkins Linux amd64 Build #15654 [push] SIMD/Loop framework upgr... passed after 1 hr 8 min

@jenkins-droid

Jenkins Linux s390x Build #15657 [push] SIMD/Loop framework upgr... passed after 1 hr 25 min

@jenkins-droid

Jenkins Linux ppc64le Build #14685 [push] SIMD/Loop framework upgr... passed after 2 hr 3 min
