RFC - Creating an openxla-nvgpu project #71

stellaraccident · 2023-03-28T00:57:46Z

Now that RFC - Proposal to Build IREE Compiler Plugin Mechanism has been implemented and is minimally working with both in-tree and an early adopter out of tree implementation, we are ready to propose the official creation of a new openxla-nvgpu project to house:

Compiler plugins aimed at vendor specific integrations for NVIDIA GPUs.
Runtime extensions for advanced use cases of NVIDIA parts.
Project scaffolding and development flow tooling.

Since this will be the first plugin project of note, we expect that to a certain extent, it will co-evolve with the mechanisms.

Proposed Directory Structure

openxla-nvgpu will have a similar directory layout to IREE, upon which it depends:

compiler/
  src/
    openxla_nvgpu/
      Dialects/
runtime/
  src/
    openxla_nvgpu/
build_tools/

The build setup will use bazel_to_cmake for consistency and interop with the upstream IREE project (now that iree-org/iree#12765 has taken the steps to allow it to be used out of tree). The corresponding IREE build macros will be extended as needed.

Dependencies

This project depends directly on:

IREE for the core platform.
openxla-pjrt-plugin for implementing an NVIDIA specific PJRT plugin.

It transitively depends on:

LLVM/MLIR via IREE (will be disaggregated as a standalone dep at a future time to be determined).
xla via openxla-pjrt-plugin for plugin and runtime support.
stablehlo via IREE.

Releases

In the very short term, this project will not produce independent releases and will aim to be usable by project developers who are willing to build it from HEAD.

In the longer term, we would like to introduce a new top-level openxla-compiler releaser project which is responsible for packaging the platform (IREE) with stable versions of all supported vendor compiler plugins and making available as a standalone binary release. Such a project would eventually depend on this one and would effectively be the plugin-aggregated release of the present day iree-compiler packages (which will continue to be released as the "vanilla" platform without out-of-kind platform specific dependencies).

Also in the longer term, as the PJRT plugin support evolves, we anticipate releasing openxla-nvgpu-pjrt binary packages that can be used to interface NVIDIA GPUs to supported ML frameworks via pip install.

Versioning and CI

Adding this top-level project pushes us firmly into a "many-repo" layout for OpenXLA projects. This will be further reinforced as IREE's dependencies and build tools are disaggregated over time and the top-level releaser projects are established.

As part of this, we will introduce a side by side workspace layout where dependencies are found relative to each other based on parent directory. Example:

iree/
xla/
openxla-pjrt-plugin/
openxla-nvgpu/
openxla-compiler-releaser/

Such a layout will be called an "OpenXLA workspace" and we will provide a sync script and CI tooling to help manage it. Each project will pin to green release tags in its parent (or other form of stable commit tracking) by maintaining a local metadata file of its openxla dependencies. The sync script will be both a simple way for developers to track known-good sync points for the workspace and for CIs to advance. There will be a CI bot which advances dependent projects to next stable sync points automatically. We expect that for projects that already have a strong release cadence like IREE, this will update pins to new nightly releases, and others will cascade from at-head commits.

This process of versioning will be developed over time with an eye towards being the one way to manage openxla project dependencies. It will likely be somewhat manual to start with. This will be derived in spirit from the PJRT plugin sync script and enhanced to provide better release tracking and version bump capabilities.

Benchmarking

Benchmarks of the NVIDIA toolchain will be largely inherited and leveraged from the IREE project but run independently so as to provide a continuous view of the performance deltas and characteristics of the platform-independent upstream and the vendor-specific downstream.

Next steps

As a very next step, the openxla-nvgpu project will be bootsrapped in the iree-samples repository. It will be relocated, with history, to a new git repository once this RFC has matriculated.

Project Ownership

The project will be set up as a collaboration between Google and NVIDIA, and per OpenXLA governance, will share maintainer responsibility between contributors from both companies with the goal of NVIDIA engineers taking on core maintainer responsibility as the project bootstraps and evolves.

The text was updated successfully, but these errors were encountered:

pjannaty · 2023-03-28T01:01:18Z

Fantastic! Looking forward to the collaboration.

pjannaty · 2023-03-28T01:10:24Z

Will this sit directly under https://github.com/openxla as in openxla/openxla-nvgpu?

stellaraccident · 2023-03-28T01:12:21Z

Will this sit directly under https://github.com/openxla as in openxla/openxla-nvgpu?

Yes, that is what I'm proposing.

pjannaty · 2023-03-28T01:13:03Z

cc @nluehr

mjsML · 2023-03-28T03:46:37Z

Does that translate to “write access”?

stellaraccident · 2023-03-28T16:06:06Z

Does that translate to “write access”?

Yes (which we already do on these repos for other contributors), but I tried to phrase it as more general. Being a "component maintainer" on OpenXLA parlance grants some other privileges in terms of overall openxla direction/management.

ezhulenev · 2023-03-30T20:20:22Z

What will be the C++ namespace for all the new code in this project? ::xla::...?

stellaraccident · 2023-03-30T20:23:41Z

What will be the C++ namespace for all the new code in this project? ::xla::...?

Not sure I would speak to "all of the new code", but historically, dialects and components of IREE take a namespace like ::mlir::iree_compiler::IREE::<COMPONENT>, which nests well by letting everything resolve components by IREE::. For the record, I don't love that it is all rooted under mlir but that has been that way for a long time and could be cleaned up.

If not extending that namespacing scheme, I'd encourage at least something similar: ::openxla_compiler::XLA. I've deviated from this a couple of times in the past and always regretted it and come back to normalize.

joker-eph · 2023-03-30T20:31:44Z

Proposed Directory Structure

openxla-nvgpu will have a similar directory layout to IREE, upon which it depends:
compiler/
  src/
    openxla_nvgpu/
      Dialects/
runtime/
  src/
    openxla_nvgpu/
build_tools/

I'm curious why the sub-directory names are repeating "openxla"? That is why compiler/src/openxla_nvgpu instead of compiler/src/nvgpu?
Even further, if the repository itself is all about nvgpu, why repeating it at all? What else will be inside compiler/src in this repo?

stellaraccident · 2023-03-30T20:34:47Z

It is just following the IREE convention of the include directory being rooted at the src/ and wanting to arrive at fully qualified include paths (i.e. `#include "openxla_nvgpu/...").

I've more or less resigned myself to the fact that the least bad thing is to repeat yourself exactly once in service of having globally unique include paths.

ezhulenev · 2023-03-30T20:49:16Z

I'm not a big fan of mlir top level namespace, and iree_compiler as nvgu really just a project on top of IREE compiler, and I don't like openxla_compiler namespace, and I don't have any suggestions :)

What if I want to write custom VM module under runtime/src/openxla_nvgpu? In IREE it's mostly under iree and iree::vm namespace, should it be ::openxla or regular ::xla here?

joker-eph · 2023-03-30T20:49:24Z

Oh I missed the point about the include directory convention, makes sense!
(I believe this convention is in place in mlir-hlo as well, I remember a rationale doc for the path structure, I think @burmako originated it, I don't know if it is public)

stellaraccident · 2023-03-30T20:52:35Z

I'm not a big fan of mlir top level namespace, and iree_compiler as nvgu really just a project on top of IREE compiler, and I don't like openxla_compiler namespace, and I don't have any suggestions :)

What if I want to write custom VM module under runtime/src/openxla_nvgpu? In IREE it's mostly under iree and iree::vm namespace, should it be ::openxla or regular ::xla here?

Just throwing things out there... ::openxla::nvgpu? We're going to regret whatever we choose. Might as well at least not start with a namespace that is already used.

ezhulenev · 2023-03-30T20:57:04Z

I'd go with ::openxla, ::openxla::compiler, ::openxla::runtime, etc...? And not mention nvgpu at all. Do we see a single project depending on multiple "openxla compilers", e.g. openxla-nvgpu and openxla-amdgpu? Or link into single binary?

stellaraccident · 2023-03-30T21:02:40Z

I'm game to try it. Like I said, I'm pretty sure that what we pick, we come back in a few months with some code written and apply a bit of sed, but I do think we want globally unique: there are many use cases where these will be linked together for both recommended and unrecommended reasons. Let's not set ourselves up for accidental name collisions.

Also, with C++17, the cost of namespaces (in terms of keystrokes) is a lot lower.

joker-eph · 2023-03-30T21:05:54Z

I would be concerned with plugins and target specific components redefining symbols in shared namespace: I'm not sure what the benefits of eliding the target from the namespace buys us in practice?
(nesting compilers under ::mlir (as in ::mlir::mhlo:: for example) is nice in that you have an implicit using namespace mlir; (maybe just a workaround for the google style banning using namespace?), but we're still heavily using a specific sub-namespace to compartimentalize).

stellaraccident · 2023-03-30T21:09:41Z

maybe just a workaround for the google style banning using namespace

(ftr - we're not exporting such legacy rules to our new OSS projects)

ezhulenev · 2023-03-30T23:44:02Z

Another directory naming question, why compiler/src/openxla_nvgpu and not compiler/src/openxla/nvgpu?

Will we also have some kind of openxla-base (util, ...) for libraries shared between different "backends", or the plan to keep shared code in iree repo?

Will we depend on absl/tsl? E.g. logging, Status, StatusOr inside nvgpu compiler/runtime? Or in compiler use LLVM logging (almost non existent), and in runtime use IREE logging (no idea what's the status).

ezhulenev · 2023-03-30T23:54:27Z

stellaraccident · 2023-03-31T00:53:44Z

Will we also have some kind of openxla-base (util, ...) for libraries shared between different "backends", or the plan to keep shared code in iree repo?

That is being discussed, but we are biasing towards getting moving at the moment over busting out dependencies. We have some work to do to get the dev/versioning workflow going for the level of things we have, and I'd rather get some more mileage on that before we go too crazy with a lot of repositories.

Will we depend on absl/tsl? E.g. logging, Status, StatusOr inside nvgpu compiler/runtime? Or in compiler use LLVM logging (almost non existent), and in runtime use IREE logging (no idea what's the status).

The further "core-ward" we go, we have no plan to depend on absl/tsl, and I would be somewhat resistant to doing so because they have both proven to be problematic (so much so that we excised after thinking "how bad could it be?").

Concrete thoughts...

I don't think that we should be mixing universes in the compiler code and need to "build up" from LLVM vs grafting other base libraries.

The runtime code for nvgpu has some more give to it from a dependency standpoint, but for the level of things expected to be in there, I would like to avoid the complexity that comes from taking complicated deps if possible.

Some of this stuff is preference and some has proven to be more trouble than it is worth in the past... The hard line that we can't cross in this repo is that dependencies must be thin and must have a well supported CMake build.

stellaraccident · 2023-03-31T00:54:41Z

Another directory naming question, why compiler/src/openxla_nvgpu and not compiler/src/openxla/nvgpu?

I don't have a preference.

sherhut · 2023-03-31T16:21:31Z

Newer waste a good opportunity to bike-shed 😄

Logging, as brought up by @ezhulenev, caught my eye. For the compiler, I would suggest we try to use the diagnostic handler infrastructure as much as possible and log warnings/errors there. That will force us to provide messages with good context.
For "trace what this does" debugging use cases, I agree that LLVM-DEBUG is not great but will work for now. We will need to replace this later with something that has severity and other bells and whistles but that can be done. I am used to the VLOG interface and we could have a shim that maps to LLVM-DEBUG if we decide to care already. Maybe IREE wants to provide this akin to what tsl does, so that different integrations can swap in what fits their needs. Depending on tsl just for that seems a bit heavy.

Regarding namespaces: I agree with @stellaraccident: Lets bias on progress rather than perfect choice for now. Having said that, I personally would use openxla::compiler::nvgpu for the CUDA compiler. We will have more compilers sooner than later and having them in different namespaces helps my mental model of where code belongs. Might be xla bias and I have no strong opinion. Just some 🚲 🛖 🖌️

Excited to see this project spin up!

stellaraccident · 2023-03-31T16:51:41Z

This has been a heavy overhead week for me but I should finally get some coding time this afternoon, and since I can probably bootstrap the project somewhat efficiently, I'll take a stab at that. As noted, I'll stage it in an iree-samples directory first and will then hand off to someone Google-side to create the repo (which requires a bit of red tape).

Requires IREE changes from: iree-org/iree#12888 Progress on openxla/community#71

stellaraccident · 2023-04-01T03:53:28Z

All right... the above two commits seem to get me most of the way there. Things build, etc. Was a bit of a slog.

…12888) * Adds CMake scoped IREE_PACKAGE_ROOT_DIR and IREE_PACKAGE_ROOT_PREFIX to replace hard-coded path to namespace logic in iree_package_ns (and uses within IREE/removes the special casing). * Adds support for `BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_ABOVE_THIS_LINE ` to bazel_to_cmake. Been carrying this patch for a while and ended up needing it. * Further generalizes bazel_to_cmake target resolution so that it is customizable as needed out of tree. * Moves the `iree::runtime::src::defs` target to `iree::defs` and puts it in the right place in the tree to avoid special casing. * Ditto but for `iree::compiler::src::defs` * Adds a bit more logging to `_DEBUG_IREE_PACKAGE_NAME` mode. * Makes iree_tablegen_library consult a scoped `IREE_COMPILER_TABLEGEN_INCLUDE_DIRS` var for additional include directories (makes it possible to use out of tree). * Adds `NVPTXDesc` and `NVPTXInfo` targets to HAL_Target_CUDA. No idea why this was triggering for me but was getting undefined deps. Must have been coming in elsewhere in a more full featured build. * Fixes iree-opt initialization sequence with respect to command line options. Also fixed the test which should have been verifying this. * Fixed pytype issue in bazel_to_cmake that could theoretically happen. Fixes build issues related to the out of tree build for openxla/community#71

ezhulenev · 2023-04-03T21:06:45Z

And initial CuDNN custom module that does nothing related to CuDNN yet: https://github.com/iree-org/iree-samples/pull/123/files

Requires IREE changes from: iree-org/iree#12888 Progress on openxla/community#71

stellaraccident · 2023-04-04T00:47:12Z

Both of the initial commits have landed.

@theadactyl Can we get someone with admin rights on the org to create the openxla-nvgpu repo? It should be populated with https://github.com/iree-org/iree-samples/tree/main/openxla-nvgpu

GMNGeoffrey · 2023-04-04T16:20:18Z

Coming in late and I think just agreeing with what's already been decided, but strong positions on: not taking an absl or tsl dep in the compiler and scoping the [sub]namespace to the project (so not just "iree" or "openxla" or "compiler")

I do think that the Google (Titus) advice on not having deeply nested namespaces is pretty good and not just a weird Google thing: https://abseil.io/tips/130. Every level of nesting gives us a fun new place for collisions. So I would vote for something like openxla_nvgpu as a top-level namespace. Probably nesting it under mlir would be fine, since we shouldn't define things that conflict with MLIR anyway. I think there was at some point a limitation of ODS that meant you had to define things in the mlir namespace, but I also suspect that this is a relic of the Google C++ style guide ban on using namespace mlir, which while not completely misguided IMO, seems excessively strict. using namespace mlir seems better than sneakily getting that by just defining everything nested under that namespace. I guess the difference is that you can only do the latter for one namespace. I think we can probably limit ourselves to using namespace mlir and using namespace llvm and then not have to nest our namespaces, unless there's some other reason to do so.

ezhulenev · 2023-04-04T16:29:55Z

I don't like openxla_nvgpu namespace, because presumably we'll have some kind of openxla-base shared by all openxla compilers, and it's nice to share a namespace to skip qualifying all imports.

+HUGE to using namespace but only for small number of things (mlir and llvm)

Requires IREE changes from: iree-org/iree#12888 Progress on openxla/community#71

…12888) * Adds CMake scoped IREE_PACKAGE_ROOT_DIR and IREE_PACKAGE_ROOT_PREFIX to replace hard-coded path to namespace logic in iree_package_ns (and uses within IREE/removes the special casing). * Adds support for `BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_ABOVE_THIS_LINE ` to bazel_to_cmake. Been carrying this patch for a while and ended up needing it. * Further generalizes bazel_to_cmake target resolution so that it is customizable as needed out of tree. * Moves the `iree::runtime::src::defs` target to `iree::defs` and puts it in the right place in the tree to avoid special casing. * Ditto but for `iree::compiler::src::defs` * Adds a bit more logging to `_DEBUG_IREE_PACKAGE_NAME` mode. * Makes iree_tablegen_library consult a scoped `IREE_COMPILER_TABLEGEN_INCLUDE_DIRS` var for additional include directories (makes it possible to use out of tree). * Adds `NVPTXDesc` and `NVPTXInfo` targets to HAL_Target_CUDA. No idea why this was triggering for me but was getting undefined deps. Must have been coming in elsewhere in a more full featured build. * Fixes iree-opt initialization sequence with respect to command line options. Also fixed the test which should have been verifying this. * Fixed pytype issue in bazel_to_cmake that could theoretically happen. Fixes build issues related to the out of tree build for openxla/community#71

Both repositories have long been created and are their own top level modules now. Fixes #72 Fixes #71

…ree-org#12888) * Adds CMake scoped IREE_PACKAGE_ROOT_DIR and IREE_PACKAGE_ROOT_PREFIX to replace hard-coded path to namespace logic in iree_package_ns (and uses within IREE/removes the special casing). * Adds support for `BAZEL_TO_CMAKE_PRESERVES_ALL_CONTENT_ABOVE_THIS_LINE ` to bazel_to_cmake. Been carrying this patch for a while and ended up needing it. * Further generalizes bazel_to_cmake target resolution so that it is customizable as needed out of tree. * Moves the `iree::runtime::src::defs` target to `iree::defs` and puts it in the right place in the tree to avoid special casing. * Ditto but for `iree::compiler::src::defs` * Adds a bit more logging to `_DEBUG_IREE_PACKAGE_NAME` mode. * Makes iree_tablegen_library consult a scoped `IREE_COMPILER_TABLEGEN_INCLUDE_DIRS` var for additional include directories (makes it possible to use out of tree). * Adds `NVPTXDesc` and `NVPTXInfo` targets to HAL_Target_CUDA. No idea why this was triggering for me but was getting undefined deps. Must have been coming in elsewhere in a more full featured build. * Fixes iree-opt initialization sequence with respect to command line options. Also fixed the test which should have been verifying this. * Fixed pytype issue in bazel_to_cmake that could theoretically happen. Fixes build issues related to the out of tree build for openxla/community#71

stellaraccident mentioned this issue Apr 1, 2023

Roll-up of changes needed to support the nvgpu out of tree project. iree-org/iree#12888

Merged

stellaraccident added a commit to iree-org/iree-experimental that referenced this issue Apr 1, 2023

Initial commit of openxla nvgpu project.

50e0e01

Requires IREE changes from: iree-org/iree#12888 Progress on openxla/community#71

stellaraccident mentioned this issue Apr 1, 2023

Initial commit of openxla nvgpu project. iree-org/iree-experimental#118

Merged

stellaraccident added a commit to iree-org/iree-experimental that referenced this issue Apr 4, 2023

Initial commit of openxla nvgpu project. (#118)

dba12ab

Requires IREE changes from: iree-org/iree#12888 Progress on openxla/community#71

GMNGeoffrey pushed a commit to iree-org/iree-nvgpu that referenced this issue Apr 4, 2023

Initial commit of openxla nvgpu project. (#118)

de8e342

Requires IREE changes from: iree-org/iree#12888 Progress on openxla/community#71

stellaraccident added a commit that referenced this issue May 25, 2023

Import two RFCs that were managed as issues.

bd17717

Both repositories have long been created and are their own top level modules now. Fixes #72 Fixes #71

stellaraccident mentioned this issue May 25, 2023

Import two RFCs that were managed as issues. #81

Merged

stellaraccident closed this as completed in #81 May 25, 2023

stellaraccident closed this as completed in f601b38 May 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

RFC - Creating an openxla-nvgpu project #71

RFC - Creating an openxla-nvgpu project #71

stellaraccident commented Mar 28, 2023 •

edited

Loading

pjannaty commented Mar 28, 2023

pjannaty commented Mar 28, 2023

stellaraccident commented Mar 28, 2023

pjannaty commented Mar 28, 2023

mjsML commented Mar 28, 2023

stellaraccident commented Mar 28, 2023

ezhulenev commented Mar 30, 2023

stellaraccident commented Mar 30, 2023

joker-eph commented Mar 30, 2023

Proposed Directory Structure

stellaraccident commented Mar 30, 2023 •

edited

Loading

ezhulenev commented Mar 30, 2023

joker-eph commented Mar 30, 2023

stellaraccident commented Mar 30, 2023

ezhulenev commented Mar 30, 2023

stellaraccident commented Mar 30, 2023

joker-eph commented Mar 30, 2023

stellaraccident commented Mar 30, 2023

ezhulenev commented Mar 30, 2023

ezhulenev commented Mar 30, 2023

stellaraccident commented Mar 31, 2023

stellaraccident commented Mar 31, 2023

sherhut commented Mar 31, 2023

stellaraccident commented Mar 31, 2023

stellaraccident commented Apr 1, 2023

ezhulenev commented Apr 3, 2023

stellaraccident commented Apr 4, 2023

GMNGeoffrey commented Apr 4, 2023

ezhulenev commented Apr 4, 2023

RFC - Creating an openxla-nvgpu project #71

RFC - Creating an openxla-nvgpu project #71

Comments

stellaraccident commented Mar 28, 2023 • edited Loading

Proposed Directory Structure

Dependencies

Releases

Versioning and CI

Benchmarking

Next steps

Project Ownership

pjannaty commented Mar 28, 2023

pjannaty commented Mar 28, 2023

stellaraccident commented Mar 28, 2023

pjannaty commented Mar 28, 2023

mjsML commented Mar 28, 2023

stellaraccident commented Mar 28, 2023

ezhulenev commented Mar 30, 2023

stellaraccident commented Mar 30, 2023

joker-eph commented Mar 30, 2023

Proposed Directory Structure

stellaraccident commented Mar 30, 2023 • edited Loading

ezhulenev commented Mar 30, 2023

joker-eph commented Mar 30, 2023

stellaraccident commented Mar 30, 2023

ezhulenev commented Mar 30, 2023

stellaraccident commented Mar 30, 2023

joker-eph commented Mar 30, 2023

stellaraccident commented Mar 30, 2023

ezhulenev commented Mar 30, 2023

ezhulenev commented Mar 30, 2023

stellaraccident commented Mar 31, 2023

stellaraccident commented Mar 31, 2023

sherhut commented Mar 31, 2023

stellaraccident commented Mar 31, 2023

stellaraccident commented Apr 1, 2023

ezhulenev commented Apr 3, 2023

stellaraccident commented Apr 4, 2023

GMNGeoffrey commented Apr 4, 2023

ezhulenev commented Apr 4, 2023

stellaraccident commented Mar 28, 2023 •

edited

Loading

stellaraccident commented Mar 30, 2023 •

edited

Loading