Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement building with CMake #478

Closed
wants to merge 3 commits into from

Conversation

KyleHerndon
Copy link
Contributor

For context, I believe Anush (@powderluv) has already been in discussion with you on subject of our proposed contributions of a CMake build system and implementing an MLIR environment, where CMake is a soft but extremely useful requisite for the MLIR environment.

I've put out this pull request before finishing implementation and clean up on this commit to solicit feedback and to hopefully save some effort on our part (Nod.ai). These changes are not completely ready as is.

In our development of the CMake build system for CompilerGym, we've borrowed a system from another library that can be used to automatically (partially) migrate bazel BUILD files into equivalent CMakeLists. So, one major reason we're looking for feedback before finishing is to make sure this format (copying Bazel style into CMake instead of using idiomatic CMake) will be fine as doing so will save us a lot of effort.

Additionally, if there's any patterns that you know that you'd like us to avoid, especially ones that you see us using in this pull request, please inform us so that we can implement your style requests and hopefully save ourselves from rewriting this PR once or twice.

@facebook-github-bot
Copy link

Hi @KyleHerndon!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (eg your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@fb.com. Thanks!

@ChrisCummins
Copy link
Contributor

ChrisCummins commented Nov 4, 2021

Hi @KyleHerndon, wow, this is quite the diff! :-) I'll have a quick skim through and leave any inline comments, though I should explain that I'm definitely no CMake expert and have limited experience in working with CMakeLists files.

At this early stage I'd like to better understand what the motivation and end goals of this new CMake build system are. Here's a few scattered questions. Feel free to address any and ignore those that are not relevant :)

  • Is the goal of adding CMake to overcome a limitation of the current Bazel build system?
  • Are you aiming to replace the existing Bazel build system, or to have two parallel build systems?
  • If CMake+Bazel are to remain parallel build systems, can the generation of CMake build config be automated?
  • If CMake is to replace Bazel, can CMake reach feature parity with the current build system? The goals of the current Bazel build are:
    1. To generate self-contained statically linked binaries.
    2. To provide a very explicit dependency graph that can be used by bazel test to ensure that incremental changes to the codbease requrie the minimum number of tests to be run.
    3. To produce a hermetic build (althouth this is not perfect! e.g.)

Cheers,
Chris

cmake.sh Outdated Show resolved Hide resolved
Comment on lines +1 to +9
################################################################################
# Autogenerated by build_tools/bazel_to_cmake/bazel_to_cmake.py from #
# ../CompilerGym/compiler_gym/BUILD #
# #
# Use cmake_cmake_extra_content from iree/build_defs.oss.bzl to add arbitrary #
# CMake-only content. #
# #
# To disable autogeneration for this file entirely, delete this header. #
################################################################################
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the plan is to keep CMake as a parallel build to Bazel, then I suggest that these files not be checked-in. Of course, if replacing Bazel then that is a different matter :-)

Cheers,
Chris

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file was auto-generated, but further edits have been made on top of it.

@KyleHerndon
Copy link
Contributor Author

Hi Chris,

Is the goal of adding CMake to overcome a limitation of the current Bazel build system?
Yes, the goal is to make it easier to work with the LLVM subproject MLIR. LLVM technically has some bazel support, but in practice, it is very fragile and our team wasn't even able to get it working with the MLIR subproject in particular. We would prefer to be able to directly build the project instead of having to link a prebuilt tar like what you're currently doing for LLVM (without MLIR).
Are you aiming to replace the existing Bazel build system, or to have two parallel build systems?
From our perspective, we only care about adding CMake, which makes it fairly natural to have parallel build systems, but if we achieve feature parity with
If CMake+Bazel are to remain parallel build systems, can the generation of CMake build config be automated?
Partial automation is definitely possible, and is how we generated many of these files. But, I think full automation is prohibitively difficult.
If CMake is to replace Bazel, can CMake reach feature parity with the current build system? The goals of the current Bazel build are:
To generate self-contained statically linked binaries. CMake can do this.
To provide a very explicit dependency graph that can be used by bazel test to ensure that incremental changes to the codebase requrie the minimum number of tests to be run. I don't think CMake would be quite as good at this, but it shouldn't be excessively difficult to achieve.
To produce a hermetic build (althouth this is not perfect! e.g.) I would expect CMake to be about as good

I think our team would be willing to get CMake to feature parity if you wanted to use CMake exclusively, and we could definitely setup a script that could be used to automatically generate as much of a CMake file as possible from the BUILD files.

Kyle

@ChrisCummins
Copy link
Contributor

Hi Kyle,

Thanks for the quick response and extra context! :) That helps clear things up.

Is the goal of adding CMake to overcome a limitation of the current Bazel build system? Yes, the goal is to make it easier to work with the LLVM subproject MLIR. LLVM technically has some bazel support, but in practice, it is very fragile and our team wasn't even able to get it working with the MLIR subproject in particular. We would prefer to be able to directly build the project instead of having to link a prebuilt tar like what you're currently doing for LLVM (without MLIR).

Ah I see, thanks for the explanation. So if we look at the main target of CompilerGym there are three classes of files: data, python, and binaries:

CompilerGym

(please excuse my crappy drawings 😄)

At the moment, bazel is used to do the heavy lifting of coordinating, whenever any one of those bits changes, what needs rebuilding/re-testing. It looks like your goal is to move that responsibility from bazel to cmake. I am totally open to the idea of adopting a new build system if there is some significant advantages, such as making it easier to extend CompilerGym / integrate with third party libraries.

However I just want to explore whether there is a lower-effort approach that you could try before putting too much work into your port :) Have you considered an alternate approach in which CMake is used to build your binary, handling all of the LLVM/MLIR side-of-things, which can then be treated as any other data file by the existing build system?

CompilerGym

Running the cmake build could be a totally ad-hoc step in the Makefile, or there is quite good support for integrating this type of use case directly in Bazel. I wonder if an approach like this could help you get something up and running without the effort having to achieve full feature parity with bazel test and without the maintenance burden of parallel build systems.

I think our team would be willing to get CMake to feature parity if you wanted to use CMake exclusively, and we could definitely setup a script that could be used to automatically generate as much of a CMake file as possible from the BUILD files.

If you want to purse then I think the goal should be to totally replace and remove all of the existing BUILD files. It would be worth us coming up with a couple of small example use cases that you could begin by implementing which would make it easier to check that it is on course for achieving the right goals.

Cheers,
Chris

@sogartar
Copy link

sogartar commented Nov 9, 2021

  1. To provide a very explicit dependency graph that can be used by bazel test to ensure that incremental changes to the codbease requrie the minimum number of tests to be run.

This means that we can't use CMake tests. They must be build targets instead.

  1. To produce a hermetic build (althouth this is not perfect! e.g.)

CMake is not a hermetic build system.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Nov 9, 2021
@facebook-github-bot
Copy link

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Facebook open source project. Thanks!

@KyleHerndon
Copy link
Contributor Author

Hi Chris,

Thanks for the lovely pictures, quite informative!

I and others on my team have spent some time trying to get bazel working with LLVM/cmake but it's been a deep fruitless rabbit hole. I see from your bazel_llvm repo that you've had some trouble with it yourself too.

I understand your concerns regarding hermiticity but I'm concerned it might not be viable while trying to get LLVM building properly from source, at least, not without some effort from LLVM to solidify their bazel build (which I don't see happening soon). As @sogartar mentioned, and I previously forgot, hermiticity is difficult to achieve in CMake, (though in practice I think you can get it looking pretty close). For this reason though, I don't think pursuing a complete replacement of bazel with CMake is likely to be desirable to you.

It would be good to clarify though, would you object to having parallel build systems? A lot of compiler libraries, especially those based on C/C++ are more likely to use CMake instead of bazel, in my experience. I think parallel build systems could make the ease of extensions to CompilerGym easier in general regardless of just our use case.

That said, a compromise that I think would probably match both our interests well would be to autogenerate the CMake from bazel for nearly everything, and just have some special annotations in the BUILD files or separate files that specify any additional information when absolutely necessary. I expect these cases are very small in number, besides at the top level directory and to define the special bazel to CMake analog functions. Also, I have a concrete idea of how to do this already.

Regards,
Kyle

@codecov-commenter
Copy link

codecov-commenter commented Nov 15, 2021

Codecov Report

Merging #478 (0186414) into development (9987fbd) will decrease coverage by 0.25%.
The diff coverage is n/a.

Impacted file tree graph

@@               Coverage Diff               @@
##           development     #478      +/-   ##
===============================================
- Coverage        87.75%   87.49%   -0.26%     
===============================================
  Files              111      111              
  Lines             6181     6181              
===============================================
- Hits              5424     5408      -16     
- Misses             757      773      +16     
Impacted Files Coverage Δ
...er_gym/third_party/inst2vec/inst2vec_preprocess.py 70.00% <0.00%> (-4.24%) ⬇️
compiler_gym/envs/llvm/compute_observation.py 87.23% <0.00%> (-2.13%) ⬇️
...loop_tool/service/loop_tool_compilation_session.py 87.24% <0.00%> (-1.35%) ⬇️
compiler_gym/envs/llvm/datasets/cbench.py 76.69% <0.00%> (-0.76%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9987fbd...0186414. Read the comment docs.

@ChrisCummins
Copy link
Contributor

Hi @KyleHerndon and @sogartar,

Sorry for the slow response!

It would be good to clarify though, would you object to having parallel build systems? A lot of compiler libraries, especially those based on C/C++ are more likely to use CMake instead of bazel, in my experience. I think parallel build systems could make the ease of extensions to CompilerGym easier in general regardless of just our use case.

No, I wouldn't object to parallel build systems. I was just suggesting whether there was a way to save you guys some effort. I will happily merge a cmake build if it enables you to do your work and makes it easier for others to extend CompilerGym. :-)

Let's plan on parallel builds for now. Longer term I would be interested in seeing if we can address the hemeticity / testing issues so that cmake can fully replace bazel, that way there isn't the burden of maintaining two builds.

Cheers,
Chris

@ChrisCummins
Copy link
Contributor

BTW I turned on the CI for your pull request so you can check your new CI/build-linux-cmake job:

CleanShot 2021-11-15 at 14 38 50@2x

@GMNGeoffrey
Copy link

Commenting here because of iree-org/iree#7736

@KyleHerndon
LLVM technically has some bazel support, but in practice, it is very fragile and our team wasn't even able to get it working with the MLIR subproject in particular.

What issues have you run into, specifically? In https://github.com/google/iree we build LLVM with the upstream Bazel build. In fact, I was the one who went on quite a large digression from my normal work to unify the various incarnations of a Bazel bulid of LLVM upstream. I'd be happy to work with you on issues you're running into.

If CMake+Bazel are to remain parallel build systems, can the generation of CMake build config be automated?
Partial automation is definitely possible, and is how we generated many of these files. But, I think full automation is prohibitively difficult.

Again, in IREE, we generate our CMake build files from Bazel ones. This works well because we control both the Bazel and CMake side of things. I think that it has actually improved our build configurations because it adds constraints that prevent us from doing anything too creative 😛 Generating CMake from Bazel in full generality would be much trickier, I think. https://github.com/google/bazel-to-cmake started out trying to do this and didn't make it very far, which is why we went with our approach, which basically just translates BUILD files into CMakeLists.txt, where we have CMake functions that mirror Bazel rules and macros.

To provide a very explicit dependency graph that can be used by bazel test to ensure that incremental changes to the codebase requrie the minimum number of tests to be run. I don't think CMake would be quite as good at this, but it shouldn't be excessively difficult to achieve.

I think it would actually be quite difficult to do this fully in CMake. Perfect dependency tracking is really the main value proposition of Bazel (I also much prefer the language of BUILD files to CMake, but that's comparatively minor). If you have things nicely divided by directories where there are no cross dependencies, you can manually divide what gets built.

To produce a hermetic build (althouth this is not perfect! e.g.) I would expect CMake to be about as good

I would expect it to be much worse. Again this is Bazel's main value prop. Having used both, Bazel definitely wins on this one. CMake wins some other things that I think make it overall more practical to just build your damn project (like an actually-usable if statement), but it wasn't built with this goal and is not very good at achieving it.

@ChrisCummins
Running the cmake build could be a totally ad-hoc step in the Makefile, or there is quite good support for integrating this type of use case directly in Bazel

Curious whether you've actually used this. I haven't tried it myself, but know some folks who did not have success with it. I'm not sure how much it's improved in recent years. Maybe we should take another look. More generally though, I've found that every time someone constructs a build-system-with-a-build-system it has ended up being something difficult to maintain that everyone hates.

@ChrisCummins
Copy link
Contributor

ChrisCummins commented Nov 29, 2021

Thanks for weighing in on this @GMNGeoffrey!

Again, in IREE, we generate our CMake build files from Bazel ones. This works well because we control both the Bazel and CMake side of things. I think that it has actually improved our build configurations because it adds constraints that prevent us from doing anything too creative 😛 Generating CMake from Bazel in full generality would be much trickier, I think. https://github.com/google/bazel-to-cmake started out trying to do this and didn't make it very far, which is why we went with our approach, which basically just translates BUILD files into CMakeLists.txt, where we have CMake functions that mirror Bazel rules and macros.

I'd be fine with using a restricted subset of Bazel for CompilerGym. The important rules are cc_binary, py_{binary,library,test}, and the protobuf/gRPC library rules. There's a couple of bits of funky genrule stuff (cf #488) but the build could be reworked to be less "creative" :)

@ChrisCummins
Running the cmake build could be a totally ad-hoc step in the Makefile, or there is quite good support for integrating this type of use case directly in Bazel

Curious whether you've actually used this. I haven't tried it myself, but know some folks who did not have success with it. I'm not sure how much it's improved in recent years. Maybe we should take another look. More generally though, I've found that every time someone constructs a build-system-with-a-build-system it has ended up being something difficult to maintain that everyone hates.

I have had some success with external CMake/Makefile rules for "simple" cases like:

configure_make(
name = "csmith",
binaries = ["csmith"],
configure_env_vars = {
# Workaround error with libtool usage on macOS. See:
# https://github.com/bazelbuild/rules_foreign_cc/issues/185
"AR": "/usr/bin/ar",
# Csmith uses decreated stdlib functions like std::bind2nd().
"CXXFLAGS": "-D_LIBCPP_ENABLE_CXX17_REMOVED_FEATURES",
},
# Workaround a strange bug where the srand48_deterministic test returns
# true on macOS, although this only available and needed for OpenBSD.
configure_options = ["ac_cv_func_srand48_deterministic=no"],
lib_source = "@csmith//:all",
)

It's far from perfect but often easier than writing a set of BUILD files for it. Before the LLVM upstream bazel support I tried (very) unsuccessfully to build LLVM as an external CMake target here.

I think @kylescotshank makes a good point about CMake being widely adopted by existing compiler libraries, so I think there is a good case for CMake support aside from just the LLVM dependency.

Cheers,
Chris

@GMNGeoffrey
Copy link

I'd be fine with using a restricted subset of Bazel for CompilerGym. The important rules are cc_binary, py_{binary,library,test}, and the protobuf/gRPC library rules. There's a couple of bits of funky genrule stuff (cf #488) but the build could be reworked to be less "creative" :)

If you'd like to fork our Bazel->CMake stuff and give it a try here, I'm all for that :-) Or if you want to try out making it more general. I think we don't want a general solution to be developed within the IREE repo, but perhaps if we have a few projects using similar approaches we could narrow in on a good general/maintainable trade-off. Right now we definitely do some things for expediency there that we'd probably like to continue doing without having to worry about affecting other projects (we're trying our best not to become a build-system project 😛 )

It's far from perfect but often easier than writing a set of BUILD files for it. Before the LLVM upstream bazel support I tried (very) unsuccessfully to build LLVM as an external CMake target here.

Ah yeah. I think we were looking specifically at LLVM. I suspect some less complicated projects wouldn't be too bad

@KyleHerndon
Copy link
Contributor Author

Deprecated by #498

@ChrisCummins
Copy link
Contributor

Deprecated by #498

Closing :-)

This was referenced Jan 19, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants