use the v2 engine to implement bazel skyframe evaluation to use rulesets ~verbatim #7093

Closed
3 of 6 tasks
cosmicexplorer opened this issue Jan 17, 2019 · 4 comments

Comments

@cosmicexplorer
Contributor

Based off of some thoughts that I could have been a lot nicer about expressing in #6998.

skyframe background

analogies between skyframe and the v2 engine

As mentioned in the comment linked at the top, bazel does far too much work in its current implementation of "Label resolution", which can restart from scratch as many times as necessary to resolve its dependencies. This can be thought of as a mockup of coroutines that loses all progress made since the start of a SkyFunction on every restart. In that comment, I noted that the bazel issue proposes a coroutine-based evaluation method as one way to resolve this unreliable performance and incomplete caching, and that #5580 is an example of a successful coroutine-based evaluation mechanism, one that the pants v2 engine makes extremely natural in @rule bodies (see #7023 for an example of syntax ergonomics).
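To make the restart-versus-coroutine contrast concrete, here is a minimal toy sketch in plain Python. It is not skyframe or pants code: `RestartEvaluator`, `run_coroutine`, `DEPS`, and both `sum_*` functions are hypothetical names invented for illustration, and the restart model is simplified so that the function gives up at the first missing dependency.

```python
from typing import Callable, Dict, Generator, List, Optional, Tuple

# Toy dependency values; every name in this sketch is hypothetical.
DEPS: Dict[str, int] = {"a": 1, "b": 2, "c": 3}

Lookup = Callable[[str], Optional[int]]


class RestartEvaluator:
    """Restart-style evaluation: when a dependency is missing, the function
    bails out, the evaluator computes the dependency, and the function is
    rerun from its first line."""

    def __init__(self) -> None:
        self.cache: Dict[str, int] = {}
        self.body_runs = 0

    def evaluate(self, fn: Callable[[Lookup], Optional[int]]) -> int:
        while True:
            missing: List[str] = []

            def get(key: str) -> Optional[int]:
                if key not in self.cache:
                    missing.append(key)
                    return None
                return self.cache[key]

            self.body_runs += 1
            result = fn(get)
            if not missing:
                assert result is not None
                return result
            # Compute whatever was missing, then loop to rerun fn from scratch.
            for key in missing:
                self.cache[key] = DEPS[key]


def sum_restart(get: Lookup) -> Optional[int]:
    total = 0
    for key in ("a", "b", "c"):
        value = get(key)
        if value is None:
            return None  # all progress so far is thrown away
        total += value
    return total


def sum_coroutine() -> Generator[str, int, int]:
    total = 0
    for key in ("a", "b", "c"):
        value = yield key  # suspend here, resume with the value, keep `total`
        total += value
    return total


def run_coroutine(gen: Generator[str, int, int]) -> Tuple[int, int]:
    """Drive a generator-based function, answering each yielded request."""
    resumes = 0
    try:
        request = next(gen)
        while True:
            resumes += 1
            request = gen.send(DEPS[request])
    except StopIteration as done:
        return done.value, resumes


if __name__ == "__main__":
    evaluator = RestartEvaluator()
    print(evaluator.evaluate(sum_restart), evaluator.body_runs)
    print(run_coroutine(sum_coroutine()))
```

In this toy, the restart version reruns its body once per missing dependency plus a final successful pass, while the generator version runs its body once and is resumed in place.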

I also noted in the comment linked at the top in response to concerns about effectively sandboxing python code:

one alternative is to then mark "untrusted" rules as uncacheable

This too appears to have an analogue in FunctionHermeticity in bazel.
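As a rough illustration of "mark untrusted rules as uncacheable", the sketch below memoizes only rules flagged as trusted. It assumes nothing about how FunctionHermeticity or the v2 engine actually implement this; `RuleCache`, its `trusted` flag, and `double` are hypothetical.

```python
from typing import Any, Callable, Dict, Hashable, Tuple


class RuleCache:
    """Memoize results of trusted rules; always rerun untrusted ones."""

    def __init__(self) -> None:
        self._results: Dict[Tuple[str, Hashable], Any] = {}
        self.executions = 0

    def run(self, rule: Callable[[Any], Any], arg: Hashable, trusted: bool) -> Any:
        key = (rule.__name__, arg)
        if trusted and key in self._results:
            return self._results[key]
        self.executions += 1
        result = rule(arg)
        if trusted:
            # Only results from trusted (assumed hermetic) rules are stored.
            self._results[key] = result
        return result


def double(x: int) -> int:
    return 2 * x


if __name__ == "__main__":
    cache = RuleCache()
    cache.run(double, 21, trusted=True)
    cache.run(double, 21, trusted=True)   # served from the cache
    cache.run(double, 21, trusted=False)  # rerun: untrusted is never cached
    print(cache.executions)
```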

TODO: list more analogies in skyframe (or other parts of bazel) to existing or easily-implementable concepts in the v2 engine!

motivation for this particular correspondence

I was surprised (but then not that surprised) that there seems to be a very strong correspondence between the pants v2 engine and what skyframe offers (although the v2 engine seems strictly more powerful, which is why this is feasible at all). I was initially thinking of consuming starlark files and allowing the production of some sort of "ffi" to pants from bazel rulesets, but after realizing the more direct correspondence was to the lower-level skyframe, I was thinking that being able to plug in the v2 engine as the backend instead of skyframe would potentially allow consuming bazel rules in pants directly without having to handle the changing starlark API surface. If we can develop a translation layer for skyframe objects in the v2 engine, especially without having to dip into the rust codebase (but that would be fine), we could very feasibly implement a large percentage of bazel primitives in v2 @rules, which would let us consume bazel rulesets without changes. Given that bazel has marketing and therefore some level of developer mindshare, this is immediately a tantalizing prospect given the relative size of the pants contributor base.

Additionally, given that as of #5580 we have a complete coroutine-based execution model orchestrated by our rust codebase, I expect we may be able to demonstrate a performance improvement over bazel for operations which haven't been deeply optimized in other ways (again, see the pathological label resolution performance issue in skyframe). Only siths speak in absolutes, but it is generally easier to make rust efficient than java, so we may be able to see performance improvements on a micro level as well.

However, the goal of this ticket (or my goal) is not to create an engine that would be useful for bazel, although that would be a wonderful secondary effect that I would love to help with but not get too distracted by. Rather, I personally would prefer to look at what we can take from this successful competitor project and shamelessly steal for ourselves. There are some very nice benefits to using starlark for BUILD files, for example.

TODO: list more nice parts of bazel that we can steal!

implementation

  • Identify the surface in the bazel codebase to cut at to drop in the v2 engine!
  • Identify the correspondence between bazel objects and v2 rules!
    • tentatively implement an @ruleset in python which can resolve skyfunctions/skyvalues/etc
      • with a smattering of rust edits applied as necessary
  • Identify a case study to iterate on!
    • tentatively the bazel kotlin ruleset
    • kotlin is an example of a JVM language (which pants supports extremely well) which people like to use and that we don't currently have any support for
    • this ruleset seems ~maintained enough to not run into too many roadblocks and to potentially be useful to rely on to present to pants users!
  • Hack together a prototype of depending on .bzl files vendored in from a bazel ruleset to do "something that the ruleset is already able to do"!
    • implement enough bazel primitives in order to build some kotlin code into a hello world jvm_binary() (or whatever the bazel equivalent is)
    • no interaction with other pants code yet
    • forking the ruleset to insert temporary hacks is allowed at this stage
    • the intention here is to use verbatim the code from bazel which converts starlark code into skyframe objects and to identify the minimum covering set of concepts necessary to implement in order to get this prototype going
      • see ASTFileLookupFunction
      • this would likely require some nontrivial JVM bridge code (scala, ideally), so implicit in this goal is to determine how to efficiently and cleanly bridge the bazel codebase with pants rust and python. this remains an unknown, but not an unknown unknown
  • Consume a bazel ruleset to interact with existing pants targets!
    • make a jvm_binary() which depends on kotlin code and scala code
      • at this point, the skyframe emulation need not interact with other pants @rules yet except at a basic level
      • e.g. the kotlin ruleset emulation generates its own jar, which pants can consume in some hacky way to get a v1 of what a mixed codebase might look like
    • this goal can be pushed back compared to deeper @rule integration
  • Develop an injective correspondence from skyframe to v2 @rules!
    • the real desired end goal would be to be able to consume bazel rulesets without changes
      • the reason this would be possible without my wrists going out for real would be direct dependence on bazel code
      • as always, applying hacks liberally until we can achieve this

In #6891 we describe a scheme to bootstrap pants with a previous pants pex in order to implement @rules to bootstrap our own rust/cargo codebase -- we could apply the same idea to feasibly use bazel rulesets (at the conclusion of the above) to build any of our own code.

@illicitonion
Contributor

If we're going to do this, I would slightly reframe it... The goal here is to be able to use the bazel rules API, and I think skyframe is a significant distraction from that. It's a java API which doesn't have a stable specification, and is subject to unlimited change because it's not a public API to anyone. I don't want to embed a java interpreter in pants, and try to re-implement a non-stable API...

I think the goal here should probably be to make the rule function from https://docs.bazel.build/versions/master/skylark/rules.html work as a way of registering new Target classes, and new @rule implementations.

However, I think this is blocked on:

  1. Working out what the Target API should be in v2 (Design v2 Target API #4535)
  2. Working out how plugins are going to register rules to handle existing goals (i.e. how as a plugin author I can make new clauses appear in
    @rule(TestResult, [Select(HydratedTarget)])
    def coordinator_of_tests(target):
      # This should do an instance match, or canonicalise the adaptor type, or something
      #if isinstance(target.adaptor, PythonTestsAdaptor):
      # See https://github.com/pantsbuild/pants/issues/4535
      if target.adaptor.type_alias == 'python_tests':
        result = yield Get(PyTestResult, HydratedTarget, target)
        yield TestResult(status=result.status, stdout=result.stdout)
      else:
        raise Exception("Didn't know how to run tests for type {}".format(target.adaptor.type_alias))
    - this is very tied to Design v2 Target API #4535)
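One purely hypothetical shape for the second point, sketched in plain Python with no pants imports: plugins register a runner per target type alias, and the coordinator looks the runner up instead of growing a new if/else clause. `TEST_RUNNERS`, `register_test_runner`, and `run_python_tests` are invented names, not an existing or proposed pants API.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a target type alias to a test runner.
TEST_RUNNERS: Dict[str, Callable[[str], str]] = {}


def register_test_runner(type_alias: str):
    """Decorator a plugin could use to claim a target type."""

    def register(fn: Callable[[str], str]) -> Callable[[str], str]:
        TEST_RUNNERS[type_alias] = fn
        return fn

    return register


@register_test_runner("python_tests")
def run_python_tests(target_name: str) -> str:
    # Stand-in for real work; returns a description instead of running tests.
    return "pytest {}".format(target_name)


def coordinator_of_tests(type_alias: str, target_name: str) -> str:
    try:
        runner = TEST_RUNNERS[type_alias]
    except KeyError:
        raise Exception("Didn't know how to run tests for type {}".format(type_alias))
    return runner(target_name)
```

With this shape, supporting a new target type means adding one decorated function in a plugin rather than editing the coordinator.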

These are both super important things we need to do regardless, but I don't think we can reasonably implement these things by delegating to someone else's API unless either a) we've worked out our own API, or b) we decide that the other API should be the way it's done.

@cosmicexplorer
Contributor Author

I think the goal here should probably be to make the rule function from https://docs.bazel.build/versions/master/skylark/rules.html work as a way of registering new Target classes, and new @rule implementations.

I didn't have this link on my radar! This sounds like a correspondence that is more stable and makes more sense (and more importantly, would make sense to a pants user trying to understand how to use bazel rules)! I will edit the issue description when I can dive into this interface and reframe the problem description (asap).

  1. Working out what the Target API should be in v2 (Design v2 Target API #4535)

I haven't had an excuse before to actually do this, so I would love to look into this!

  1. Working out how plugins are going to register rules to handle existing goals

Thanks for the very precise snippet here! This also seems tied to #5933, which @stuhood already noted in that issue (so solving #5933 could be a first step towards any of this as well).

These are both super important things we need to do regardless, but I don't think we can reasonably implement these things by delegating to someone else's API unless either a) we've worked out our own API, or b) we decide that the other API should be the way it's done.

I'm less clear on this part, but getting more clear as I type. To me, looking at https://docs.bazel.build/versions/master/skylark/rules.html, the ctx object seems like a more condensed, untyped (?), non-coroutine version of Select() and yield Get(). I need to dive into extern_generator_send() and its inverse and its usages to see "what happens when I run x = yield Get(...)", but I'm thinking along the lines of converting pretty much everything in https://docs.bazel.build/versions/master/skylark/lib/ctx.html into a method that sends something to the engine like we do in an x = yield Get(...) statement. This (vague) approach seems like it could slot in nicely and almost magically avoid the restarting issue with skyframe by using the engine's coroutines.
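A toy sketch of that idea, assuming nothing about the real engine internals: a ctx-like object whose methods build requests, a rule body written as a Python generator that yields them, and a tiny driver that answers each request and resumes the body. The `Get` dataclass here is a stand-in rather than the pants class, and `Ctx`, `declare_file`, `run_action`, `hello_rule_impl`, and `run` are hypothetical names. Note that the rule body is written in Python, not starlark.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Generator


@dataclass(frozen=True)
class Get:
    """Toy stand-in for an engine request: produce `product` for `subject`."""
    product: str
    subject: Any


class Ctx:
    """Hypothetical ctx-like facade: methods build requests, they do no work."""

    def __init__(self, label: str) -> None:
        self.label = label

    def declare_file(self, name: str) -> Get:
        return Get("File", "{}/{}".format(self.label, name))

    def run_action(self, executable: str, output: str) -> Get:
        return Get("ActionResult", (executable, output))


def hello_rule_impl(ctx: Ctx) -> Generator[Get, Any, str]:
    # Each yield suspends the body; the driver resumes it with the answer,
    # so nothing before the yield is ever re-executed.
    out = yield ctx.declare_file("hello.txt")
    result = yield ctx.run_action("echo", out)
    return result


def run(rule_impl: Callable[[Ctx], Generator[Get, Any, Any]],
        ctx: Ctx,
        providers: Dict[str, Callable[[Any], Any]]) -> Any:
    """Drive the rule body, satisfying each yielded Get from `providers`."""
    gen = rule_impl(ctx)
    try:
        request = next(gen)
        while True:
            request = gen.send(providers[request.product](request.subject))
    except StopIteration as done:
        return done.value


# Fake providers standing in for whatever the engine would really compute.
PROVIDERS: Dict[str, Callable[[Any], Any]] = {
    "File": lambda path: path,
    "ActionResult": lambda args: "ran {} -> {}".format(args[0], args[1]),
}
```

Calling `run(hello_rule_impl, Ctx("//pkg:hello"), PROVIDERS)` walks the body through both requests without restarting it.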

note: using starlark is ok

The reason I wanted to look at skyframe specifically in the first place is that starlark is a subset of python, but I'm not clear if it's always going to be that way (although I can see why it might be useful for starlark to remain a python subset). My concern was that trying to use the python interpreter to run starlark might run into a growing number of incongruities that require progressively more translation if starlark strays away from python. However, now that I've noted there are multiple implementations of the starlark parser and an official spec, I think my concern about it evolving in strange undocumented ways can be 100% put to rest. In the absolute "worst case", we can consume the output of a starlark parser (which could be considered a "python starlark parser", as funny as that sounds right now), somewhat like #6998.

@cosmicexplorer
Contributor Author

Note that some discussion about the build API in Bazel and separating it is detailed in https://docs.google.com/document/d/1UDEpjP_qWQRYsPRvx7TOsdB8J4o5khfhzGcWplW7zzI/mobilebasic.

@Eric-Arellano
Contributor

As I understand, this project is no longer being pursued.
