[Tracking] Refactor MLIR-AIE dependency #430

makslevental · 2024-06-17T16:07:33Z

TL;DR: This issue tracks ongoing work to refactor the dependency on MLIR-AIE.

Description:

The goal of this sprint of work is to remove the dependency on AIE and AIEX dialects completely (AIEVec will remain). This includes artifact generation. Thus, one final deliverable is a build configuration (of this repo/plugin) that does not need to clone/build Xilinx/mlir-aie at all (for platforms/deployments that don't need AIEVec). The "business" goal is a more stable device configuration and runtime layer/experience for higher level dialects.

Development Plan

The work proceeds in roughly two necessary phases (for the MVP) and planned extension work:

Vendoring/interning the relevant passes/parts of mlir-aie;
- aie-rt (the actual device configuration utility underneath mlir-aie) and bootgen (necessary for artifact construction) submodules;
- XCLBinGen;
- A minimal subset of AIE/AIEX passes necessary for supporting NPU.
  - aie-assign-lock-ids, aie-assign-buffer-descriptor-ids, aie-assign-buffer-addresses-basic, aie-pathfinder, aie-localize-locks, aie-objectstateful-transform, aiex-dma-to-npu;
  - Note: of these 7 passes, only aie-pathfinder and aie-objectstateful-transform perform any analysis;
- A minimal subset of translation utilities;
  - aie-translate-cdo, aie-translate-bcf, aie-translate-ld-script, aie-translate-npu.
Refactor/merge/clean/DCE;
- Move all calls to aie-rt into an iree_aie_runtime library which emulates the various mlir_*_runtime libs upstream (but is still only called at compile time; see planned work below);
- Merge all non-analysis passes into a single pass;
- Simplify aie-pathfinder (remove legacy d_ary_heap.h) and aie-objectstateful-transform;
- Re-design AIETargetModel as AMDAIEDeviceModel and base the latter on aie-rt (i.e., use aie-rt APIs for querying relevant device attributes/characteristics);
- Re-design CDO emission to consume objectfifo directly instead of buffer, dma, switchbox etc;
  - This step eliminates all passes that transform those objects.
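
As a rough illustration of the "merge all non-analysis passes" step, the sketch below partitions the seven vendored passes into the ones that perform analysis and the ones that could be folded into a single pass. The pass names are copied from the list above; the `split_passes` helper and the grouping logic are hypothetical and only illustrate the bookkeeping, not the actual implementation in this repo.

```python
# Pass names copied verbatim from the vendoring list above.
PASSES = [
    "aie-assign-lock-ids",
    "aie-assign-buffer-descriptor-ids",
    "aie-assign-buffer-addresses-basic",
    "aie-pathfinder",
    "aie-localize-locks",
    "aie-objectstateful-transform",
    "aiex-dma-to-npu",
]

# Per the note above, only these two passes perform any analysis.
ANALYSIS_PASSES = {"aie-pathfinder", "aie-objectstateful-transform"}


def split_passes(passes, analysis):
    """Hypothetical helper: return (mergeable, separate), preserving order."""
    mergeable = [p for p in passes if p not in analysis]
    separate = [p for p in passes if p in analysis]
    return mergeable, separate


mergeable, separate = split_passes(PASSES, ANALYSIS_PASSES)
print("candidates for the single merged pass:", mergeable)
print("kept as standalone (analysis) passes:", separate)
```

Under these assumptions, five passes are candidates for the merged pass and two stay separate until they are simplified.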

At this point we will have a completely unified/self-contained path from aie.objectfifo to .xclbin (modulo chess/peano), but we will still depend on the AIE dialect for the aie.objectfifo op. The immediate next step is to connect directly to amdaie.logicalobjectfifo in order to reach the goal of removing the dependency on the AIE/AIEX dialects (i.e., headers, libs, etc.). Because this last step is subject to progress on work involving amdaie.logicalobjectfifo, all of the previously mentioned work will happen in a parallel lowering path through an ephemeral #hal.device.target<"amd-aie-direct", [#hal.executable.target<"amd-aie-direct">]>. Once the direct connection to amdaie.logicalobjectfifo is complete, amd-aie-direct takes over as the only HAL target.

Regarding the AIEVec dialect: the AIEVec dialect (used for emitting vector intrinsics targeting the single cores) does not depend on the AIE/AIEX dialects, and thus we can keep it as a dependency.

Planned extension/further work includes:

Reducing the friction between iree-amd-aie (this repo/plugin) and the single core compilers (i.e., chess and peano) by removing most of the "shell out";
- Currently, translation to LLVM IR (.ll) for chess includes a chesshack step that rewrites present-day LLVM IR to chess's version (15). Alternatively (I have verified this), we can emit llvm dialect (MLIR IR) and translate it to LLVM IR using mlir-translate-15, i.e., the version of mlir-translate built against llvmorg-15.0.7.
- This enables us not only to remove chesshack but also to link directly against MLIRTranslateLib built against the same tag¹. I have verified this as well;
- The same holds for peano, but even more so, because we can directly link the single-core codegen libs (see footnote¹);
- Shell outs to xclbinutil and bootgen can also immediately be eliminated by simply making direct API calls into those libs (see xaiepy as a proof of this concept);
- This reduces the number of shell-outs to just one: chesscc.
Once we are able to fully control emitting device configurations/instructions/code (i.e., what aie-rt actually emits) we are (mostly) free to move to a more conventional model of dispatch;
- Calls to aie-rt can be "inlined" into the mid-level IR itself just as is done for the various GPU dialects, i.e., iree_aie_runtime can become a true runtime library;
  - The extent to which this is feasible (what can be configured/reconfigured outside of a CDO) is determined by both the firmware and the driver but I have verified that there are some objects that can be configured at runtime (shim DMAs). Thus, this work will involve expanding that set of objects.
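
To make the chess translation alternative above concrete, here is a minimal sketch of the interim shell-out it describes: translating llvm dialect MLIR to LLVM IR with mlir-translate-15. The tool name comes from the text above and `--mlir-to-llvmir` is the standard mlir-translate option; the file names and the `translate_command` helper are hypothetical, and the sketch only builds the command line rather than running it.

```python
import shlex


def translate_command(mlir_path, ll_path, tool="mlir-translate-15"):
    """Hypothetical helper: build the argv for an MLIR -> LLVM IR translation.

    `tool` is assumed to be an mlir-translate binary built against
    llvmorg-15.0.7, as described in the text.
    """
    return [tool, "--mlir-to-llvmir", mlir_path, "-o", ll_path]


# File names are placeholders for illustration only.
cmd = translate_command("core.mlir", "core.ll")
print(shlex.join(cmd))
```

Linking directly against MLIRTranslateLib, as planned, would replace this subprocess-style invocation with an in-process API call.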

Testing Plan

Each step will be unit tested using the canonical ground-truth source: mlir-aie/test. I.e., in the intermediate phases/steps, we vendor the relevant tests in addition to the code. In addition, once it becomes feasible (after the completion of vendoring), each step will be tested E2E, i.e., artifact generation and testing for numerical accuracy. Prior to connecting to amdaie.logicalobjectfifo we generate such executables starting from aie.objectfifo (using mlir-aie examples). After connecting to amdaie.logicalobjectfifo we are free to use all of our own E2E tests.

Current progress

Of the initial (MVP) work only the final step remains (redesigning CDO emission to consume aie.objectfifo). Timeline for this final step is ~1 week.

Questions/comments/concerns

How/what/where/when questions are more than welcome here; why questions should be kept for 1-1/team meetings.

cc @stellaraccident @MaheshRavishankar @powderluv @jtuyls @kumardeepakamd @yzhang93 @newling @Abhishek-Varma @nirvedhmeshram @daveliddell

¹ By using a small trick to create "versioned namespaces": -DCMAKE_CXX_FLAGS="-Dmlir=mlir15 -Dllvm=llvm15".

makslevental · 2024-07-14T02:43:35Z

makslevental self-assigned this Jun 17, 2024

makslevental mentioned this issue Jun 17, 2024

Add aie-rt and bootgen as submodules (1/n) #419

Merged

makslevental closed this as completed in #419 Jun 19, 2024

makslevental reopened this Jun 19, 2024

makslevental closed this as completed in #421 Jun 20, 2024

makslevental reopened this Jun 20, 2024

makslevental mentioned this issue Jun 23, 2024

Remove/refactor legacy code (7/n) #451

Merged

makslevental linked a pull request Jun 24, 2024 that will close this issue

Remove/refactor legacy code (7/n) #451

Merged

makslevental removed a link to a pull request Jun 24, 2024

Vendor XCLBinGen (4/n) #422

Closed

makslevental closed this as completed in #420 Jun 24, 2024

makslevental reopened this Jun 24, 2024

makslevental closed this as completed in #424 Jun 24, 2024

makslevental reopened this Jun 24, 2024

makslevental closed this as completed in #451 Jun 25, 2024

makslevental reopened this Jun 25, 2024

This was referenced Jul 23, 2024

[Conv2d] stride 2 test failed to legalize operation 'vector.extract_strided_slice' #581

Open

Move CDO emission to iree_aie_runtime #589

Merged

makslevental mentioned this issue Aug 7, 2024

Delete MLIR-AIE submodule #639

Merged

makslevental closed this as completed in #639 Aug 8, 2024
