
[Not for merging] Basic starlark parser for BUILD files #6998

Closed

Conversation

illicitonion
Contributor

This is super naive, and performs accordingly. What it does (sketched in code after this list):

  1. Register functions for all symbols exposed to BUILD files which
    record that they were called (and their arguments).
  2. Passes a big python list of these calls up.
  3. Executes each of those calls (potentially multiple times), and
    records which target-like functions were called.
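
To make the mechanism concrete, here is a minimal Python sketch of the recording approach (the real parse runs in the Rust Starlark evaluator; this is only an analogy, and every name below is hypothetical):

recorded_calls = []

def make_recorder(symbol_name):
  # A stub that records its invocation instead of doing real work.
  def record(*args, **kwargs):
    recorded_calls.append((symbol_name, args, kwargs))
  return record

# Step 1: every symbol exposed to BUILD files becomes a recording stub.
exposed_symbols = {name: make_recorder(name) for name in ("python_library", "resources")}

# Step 2: evaluating a BUILD file only populates recorded_calls...
build_file_source = 'python_library(name="foo", dependencies=["bar:baz"])\n'
exec(compile(build_file_source, "BUILD", "exec"), exposed_symbols)

# Step 3: ...the recorded calls are then replayed against the real target factories.
print(recorded_calls)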

This successfully parses all of pants's own BUILD files, with a few
exceptions:

  1. Starlark currently doesn't support set literals. This should be easy
    to add behind a feature.
  2. This naive way of performing parsing means that any functions which
    are called don't return real objects, and so return values can't be
    operated on. Exactly one BUILD file in the repo called a function on
    the result of a function, so I hacked around it for now.
  3. Starlark doesn't support implicit string concatenation. Exactly one
    BUILD file in pants uses implicit string concatenation, so I switched
    it to explicit. I strongly support banning implicit string concatenation,
    as it has the scope for a lot of mistakes.

Performance is currently about half that of the existing parser. I haven't
profiled, but my assumption is that this is mostly because we've doubled
the object construction overhead (object construction in Python is
mostly dict-creation, and we're now creating an intermediate dict of
things to hydrate, and then using those to create the actual objects).
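
As a toy illustration of the two layers of dict creation (PythonLibrary here is a hypothetical stand-in, not the real pants class):

class PythonLibrary:
  def __init__(self, name, dependencies=()):
    # Attribute assignment populates the instance __dict__: the second dict.
    self.name = name
    self.dependencies = list(dependencies)

intermediate = {"name": "foo", "dependencies": ["bar:baz"]}  # first dict: recorded kwargs
target = PythonLibrary(**intermediate)                       # hydration into the real object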

If we wanted to take this and turn it into something real, the first
change I'd probably make would be to delay both layers of python object
construction until they're really needed:

  1. Reuse a shared parent Environment with functions pre-loaded, rather
    than making fresh ones from scratch every time (a rough analogy is
    sketched after this list).
  2. Don't pass all of the starlark internal state over to Python (which
    acquires the GIL per object), instead store it in-memory in Rust, and
    only pass things over to Python when they're needed. This way, things
    like `list ::` could mostly avoid the GIL, just passing the strings
    of names up to Python, and things like
    `list --changed-parent=master ::` could just pass up the strings of
    names, and the Snapshots, rather than more complex objects.
  3. Add support for datatype-like structs to starlark, so that the
    non-Target objects we create can be natively created in starlark,
    rather than mirrored as pseudo-functions on the Python side.
  4. Tweak the Starlark API so that we don't need to move values around
    quite as much. Currently a function which returns a String goes:
    String -> starlark Value -> parsing Value -> ffi String -> Python String;
    most of those involve moves, and surprisingly many involve copies.
    These should mostly be avoidable with some API tweaks.
  5. Tweak the Starlark API so that object type checks can simply be
    dynamic downcasts, rather than requiring string comparisons with
    Value.get_type.
  6. Start writing rules as Starlark instead of Python, so they can just
    be natively interpreted by the Starlark evaluator.
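
As a rough Python analogy for improvement 1 (all names hypothetical): build the symbol table once at startup, then give each parse a cheap child scope, rather than re-registering every function per file.

SHARED_SYMBOLS = {"python_library": lambda **kwargs: None}  # built once, functions pre-loaded

def parse_build_file(source, path):
  # Each file gets a cheap copy over the shared parent environment.
  env = dict(SHARED_SYMBOLS)
  exec(compile(source, path, "exec"), env)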

@cosmicexplorer
Contributor

Register functions for all symbols exposed to BUILD files which record that they were called (and their arguments).

Perhaps not the recording part, but the bit about having a complete list of symbols which are exposed to BUILD files in a structured way sounds extremely useful for (1) moving towards a more structured form of docsite generation and (2) an emacs/intellij/vs code editing mode which is aware of and can complete BUILD file symbols -- both of which I've been planning to work on recently. Having this example usage here is helpful.

Starlark currently doesn't support set literals. This should be easy to add behind a feature.

I always thought this was particularly annoying.

I strongly support banning implicit string concatenation, as it has the scope for a lot of mistakes.

This sounds great to me, but I don't quite know what implicit string concatenation refers to here -- I didn't look too hard through the diff.

Start writing rules as Starlark instead of Python, so they can just be natively interpreted by the Starlark evaluator.

I'm assuming this refers to writing Starlark rules in Starlark, not all of the v2 engine rules -- just making sure. I looked through the diff with all the time I have right now and it's not quite clear -- does this particular diff allow for interop of v2 rules with Starlark rules? Where are the first few blockers you can see for allowing that level of deep integration?

@illicitonion
Contributor Author

Starlark currently doesn't support set literals. This should be easy to add behind a feature.

I always thought this was particularly annoying.

There's definitely a weirdness in our use of set literals in Pants. For instance, the dependencies attribute is strictly required both to have no duplicates, and to be a MutableSequence (so not a set or a tuple). I can get behind only having one 1-d collection type for consistency. That said, adding sets to starlark should be pretty trivial; basically just copy and paste the list code and use an IndexSet instead of a Vec as the backing store :)
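
As a throwaway illustration of that constraint (a toy check, not pants code):

def validate_dependencies(deps):
  # Must be an ordered, mutable sequence: not a set, not a tuple.
  if not isinstance(deps, list):
    raise TypeError("dependencies must be a list")
  # ...yet duplicate-free, which is the guarantee a set would give us.
  if len(set(deps)) != len(deps):
    raise ValueError("dependencies must not contain duplicates")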

I strongly support banning implicit string concatenation, as it has the scope for a lot of mistakes.

This sounds great to me, but I don't quite know what implicit string concatenation refers to here -- I didn't look too hard through the diff.

In a BUILD file (and in fact any python) today you can write:

python_library(
  name = "foo",
  dependencies = [
    "some:target"        # note: missing comma
    "some/other:target"
  ],
)

which, because you missed your commas in your list, will give you a dependency on some:targetsome/other:target. Thanks Python.

In Starlark, this simply doesn't parse.

Start writing rules as Starlark instead of Python, so they can just be natively interpreted by the Starlark evaluator.

I'm assuming this refers to writing Starlark rules in Starlark, not all of the v2 engine rules -- just making sure. I looked through the diff with all the time I have right now and it's not quite clear -- does this particular diff allow for interop of v2 rules with Starlark rules? Where are the first few blockers you can see for allowing that level of deep integration?

Note: I'm not advocating for any particular direction here, I just put together an experiment and am musing on where we'd go if we wanted to turn it into a real thing.

This is referring to v2 rules actually being Starlark, not Python, and feeding them into the Starlark environment as registered functions.

This has some advantages and disadvantages. Off the top of my head:
Advantages:

  • Things are deterministic by default. Don't need to worry about things like dict iteration order, because we just guarantee that there's no non-determinism available to the language.
  • Reliably sandboxed execution. Don't need to worry about things like "Does this rule access the network" or "Does this rule generate random numbers" or "Does this rule read files using ways we don't track"; again, the standard library we expose to rules is known-good. (Note that actually sandboxing Python is really, really hard).
  • No GIL - we can parallelise all stages of execution limited only by actual data dependencies, rather than arbitrary Python threading limitations.
  • (Much bigger step, much less likely): Possibility to evolve syntax in our own directions. e.g. rather than using @rule decorators whose arguments happen to mirror type parameters in a function signature, we could introduce a syntax like:
def hydrate_target(target_adaptor: TargetAdaptor): HydratedTarget:
  ...

similarly for things like @datatypes, rather than being limited by Python's existing syntax (so using a list of tuples where position matters), we could produce something a lot more clear and expressive:

datatype Foo:
  name="default_name": str
  values=(): tuple<int>

Disadvantages:

  • Reduced set of existing libraries that can be relied on. Want access to a graph algorithm? Guess you're going to write that graph algorithm from scratch.
  • Reduced set of language tooling. Poor IDE support, limited debuggers, limited testing support, ... - there are folks working on all of these things, but it's never going to be as broadly supported as Python proper.
  • Familiarity/learnability. Python has documentation, tutorials, a lot of StackOverflow questions, .... Starlark does not.
  • Won't benefit from future advances in Python.

@stuhood
Member

This looks really nice: kudos!


There is one critical bit from the description that I think I should clarify:

Start writing rules as Starlark instead of Python, so they can just be natively interpreted by the Starlark evaluator.

It's not actually "rules" (@rules) that would need to be written in Starlark in order for us to fully evaluate BUILD files in Starlark. Instead, it would be all macros, context_aware_object_factories, and objects. The most numerous source of these symbols is:

def build_file_aliases():
  return BuildFileAliases(
    targets={
      'alias': AliasTargetFactory(),
      'files': Files,
      'prep_command': PrepCommand,
      'resources': Resources,
      'remote_sources': RemoteSources,
      'target': Target,
    },
    objects={
      'get_buildroot': get_buildroot,
      'netrc': Netrc,
      'pants_version': pants_version,
    },
    context_aware_object_factories={
      'buildfile_path': BuildFilePath,
      'globs': Globs,
      'intransitive': IntransitiveDependencyFactory,
      'provided': ProvidedDependencyFactory,
      'rglobs': RGlobs,
      'scoped': ScopedDependencyFactory,
      'zglobs': ZGlobs,
    }
  )

This is probably a smaller problem than evaluating @rules in Starlark. But it's not quite a subset of evaluating @rules... for each of the symbols we expose, we'd need to decide whether to:

  1. port the symbol to starlark (most macros and context_aware_object_factories, and some objects) -- see the toy example after this list
  2. expose a starlark-builtin/rust-intrinsic as a replacement
  3. port the symbol to a Target type (likely the case for netrc_credentials, etc) and deprecate the current method of construction
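
As a toy example of option 1 (entirely hypothetical, but Starlark is syntactically a subset of Python, so the ported macro can look almost identical): a macro like globs could become a plain function registered in the Starlark environment, returning inert data for the engine to expand later, so a BUILD file's globs('*.py') call keeps working unchanged.

def globs(*patterns, exclude=()):
  # Return a plain description of the glob; expansion happens in the engine.
  return {"globs": list(patterns), "exclude": list(exclude)}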

@stuhood
Member

stuhood commented Jan 2, 2019

With regards to a path toward landing this: I think that it might be useful to ship this behind an experimental flag that will skip/warn/fail for parsing in starlark. But I don't expect that we'd want to turn it on by default until after we had starlark ports for all of the symbols, because that would allow removing all python evaluation (removing complexity), and having the result of the starlark eval be a series of (Unhydrated)Target kwargs, ready for consumption by @rules.

@cosmicexplorer
Contributor

cosmicexplorer commented Jan 15, 2019

Thanks a lot for the thorough rationale! I did want to say one thing: my comment that I thought the lack of set literals in starlark was particularly annoying wasn't assuming it would be at all hard to implement -- just that it seems like something that should have occurred already and I wasn't sure why nobody had done so. The rest of your post was super helpful, I'm just deeply concerned about lock-in if we hitch ourselves to a language and framework that forces us to wait for features like set literals to trickle down from on high. I also am confused about:

Things are deterministic by default.

I don't quite see how starlark addresses the inherent nondeterminism in interacting with the outside world (a very very simple point, but I think a valid one).

Reliably sandboxed execution.

I understand why this is a feature -- how is this useful for us? Reliably sandboxing python is indeed hard, and I think the approach of the v2 engine isn't to try to put python in a cage, but to provide some ladders whereby rule writers can make use of the typed, dependency-injected structure to get free caching, parallelism, etc, in an incremental way which can interact with existing python code.

No GIL - we can parallelise all stages of execution limited only by actual data dependencies.

The fact that we have intrinsic rules to parallelize file ingestion and process execution already seems to imply that when parallelism is desired in v2 rules, we immediately have the option to drop into a mature rust codebase built from scratch to adopt new modes of parallelism, which seems infinitely superior to...whatever executes starlark rules (and familiarity with the execution framework might be something you personally have, but I don't, and I don't understand why skyframe is necessarily better than the rule graph). Again, we have the basic layer of python code which people can write without too much thought, and we have the ability to extend that to offer additional guarantees about performance or caching. At least, that's the way I view it.

Possibility to evolve syntax in our own directions

I believe #7019 and #7023, along with the truly classic #5580, and the well-documented ability to even transform ASTs in the ast library, demonstrate that we aren't bound by syntax, and can even maintain compatibility with python while extending it. Separately, type hints with mypy already demonstrate the possibility of even introducing e.g. stub files which are not python, which can interact with python, and do the job of adding whatever sprinkles on top we might want (the idea being that we could use similar stub files if we wanted to extend python syntax in a truly incompatible way, and that this existing pattern means it wouldn't be incredibly unfamiliar to existing python programmers). If we're considering the prospect of evolving syntax in our own directions, we have examples of that working right now in the v2 engine.

This also leads me to respond to:

which, because you missed your commas in your list, will give you a dependency on some:targetsome/other:target. Thanks Python.
In Starlark, this simply doesn't parse.

We already scan BUILD files for import statements -- I don't see why we can't enforce further restrictions on top of that without having to rip out the backend, and I don't see why that would be easier than doing this in python (which, again, has a fantastically hackable ast library).

Lastly:

Disadvantages:

I would personally add "lock-in" at the top here. It's not clear to me that bazel and its subprojects like starlark are going to be supported indefinitely. There are examples of specific design choices which restrict really nice features like dep inference to being available only inside google, e.g. C++ #include scanning, which was mentioned by Ulf as likely not going to be open sourced for a very long time, in public, at a scala meetup at twitter which he showed up and gave a completely unscheduled talk at. "no external libraries" hints at that, but doesn't quite get at the source of the issue.

One of the nice things about pants being written in python is that people are able to come into it and start hacking away immediately, because it's a well-known programming language with lots of documentation. We immediately lose that if the only alternative is starlark rules. I'm not intending to express vitriol here, but I personally very strongly don't like the idea of writing code in a fake language controlled by the company whose open source policies define the phrase "thrown over the wall".

So to mitigate that, I'm looking into the difficulty of implementing starlark with v2 rules.

@illicitonion
Contributor Author

I think I was a bit sloppy with language, which has led to some confusion here - sorry for that! There are three different ideas of starlark integration I can imagine, which are pretty much entirely independent:

  1. Parse BUILD files ~verbatim with starlark instead of python. That's what this PR does.
  2. Parse our existing v2 rules ~verbatim with starlark instead of python, otherwise not changing the rules API at all. I mentioned in my original PR description that this could be useful to help with 1, but @stuhood rightly pointed out that they're actually ~entirely decoupled.
  3. Use the bazel rules API to define rules (instead of, or as well as, v2 rules).

I am only seriously talking about 1, and my post above was commenting on advantages and disadvantages of 2. None of this is talking about 3.

Thanks a lot for the thorough rationale! I did want to say one thing: my comment that I thought the lack of set literals in starlark was particularly annoying wasn't assuming it would be at all hard to implement -- just that it seems like something that should have occurred already and I wasn't sure why nobody had done so. The rest of your post was super helpful, I'm just deeply concerned about lock-in if we hitch ourselves to a language and framework that forces us to wait for features like set literals to trickle down from on high.

I'm not sure where "waiting for features to trickle down from on high" comes from as a concern; possibly because 1 and 2 are being conflated with 3 here? Starlark currently has three implementations, and a specification, all of which are open source, and the PRs I needed to get this prototype working were all quickly accepted; since then, I was kindly given commit rights to the starlark-rust repo, and I recently put out a PR adding set support behind a feature which I have every reason to believe will be merged.

I also am confused about:

Things are deterministic by default.

I don't quite see how starlark addresses the inherent nondeterminism in interacting with the outside world (a very very simple point, but I think a valid one).

Starlark's standard library doesn't have any APIs for interacting with the outside world. That's a large part of the reason it exists. There's no way to make a syscall, open a file, make a network connection, get a random number, etc.

Sources of non-determinism would need to be implemented as custom library functions in Rust; for BUILD file parsing we either wouldn't do that, or would be very careful to hook them into any caching mechanisms we need.

Reliably sandboxed execution.

I understand why this is a feature -- how is this useful for us? Reliably sandboxing python is indeed hard, and I think the approach of the v2 engine isn't to try to put python in a cage, but to provide some ladders whereby rule writers can make use of the typed, dependency-injected structure to get free caching, parallelism, etc, in an incremental way which can interact with existing python code.

It's really, really easy to do bad things that subvert caching or correctness in v2 rules. Example:

import os
import random
from datetime import datetime

import requests

@rule(Something, [])
def do_something():
  some_path = os.path.join("/tmp", str(random.randint(0, 10)))  # random output path
  response = requests.get("https://www.google.com")  # untracked network access
  with open(some_path, "w") as f:
    f.write("{}".format(datetime.now()))  # non-deterministic timestamp
    f.write(response.text)
  yield Something(some_path)

This rule is really problematic. If google.com changes, we won't rerun it. If you restart pantsd, it will re-fetch google.com. If google.com hasn't changed, the output still changes. When it reruns it may change its output path at random, invalidating downstream rules. But if the file name didn't happen to change at random, it won't invalidate downstream rules. The point of sandboxing is to guide people towards the safe ways we have of doing things - we have an API for fetching things over HTTP which plays nicely with the caches, but I expect anyone not working with pants daily will use requests to fetch things from the internet instead of our API, because it's the obvious and easy thing to do. And I think that will be true no matter how much documentation we write for our API.

Worse, people can do this in BUILD files right now, which will cause us to not re-parse them even though they depend on external state which may have changed.

No GIL - we can parallelise all stages of execution limited only by actual data dependencies.

The fact that we have intrinsic rules to parallelize file ingestion and process execution already seems to imply that when parallelism is desired in v2 rules, we immediately have the option to drop into a mature rust codebase built from scratch to adopt new modes of parallelism, which seems infinitely superior to...whatever executes starlark rules (and familiarity with the execution framework might be something you personally have, but I don't, and I don't understand why skyframe is necessarily better than the rule graph). Again, we have the basic layer of python code which people can write without too much thought, and we have the ability to extend that to offer additional guarantees about performance or caching. At least, that's the way I view it.

I think this concern is around conflating "using the bazel rules API in starlark" with using starlark as a parser. That's definitely not a thing I'm advocating for or trying to discuss here.

Possibility to evolve syntax in our own directions

I believe #7019 and #7023, along with the truly classic #5580, and the well-documented ability to even transform ASTs in the ast library, demonstrate that we aren't bound by syntax, and can even maintain compatibility with python while extending it. Separately, type hints with mypy already demonstrate the possibility of even introducing e.g. stub files which are not python, which can interact with python, and do the job of adding whatever sprinkles on top we might want (the idea being that we could use similar stub files if we wanted to extend python syntax in a truly incompatible way, and that this existing pattern means it wouldn't be incredibly unfamiliar to existing python programmers). If we're considering the prospect of evolving syntax in our own directions, we have examples of that working right now in the v2 engine.

This also leads me to respond to:

which, because you missed your commas in your list, will give you a dependency on some:targetsome/other:target. Thanks Python.
In Starlark, this simply doesn't parse.

We already scan BUILD files for import statements -- I don't see why we can't enforce further restrictions on top of that without having to rip out the backend, and I don't see why that would be easier than doing this in python (which, again, has a fantastically hackable ast library).

First off, it's important to note that again, this isn't discussing changing a backend anywhere, just what's being used to parse (and evaluate) literal syntax - this isn't a discussion about the rule API itself.

The import detection is both slow, and easy to circumvent - it adds a noticeable amount of time to BUILD file parsing, and there are sneaky other ways you can find to import things (some modules squirrel away references to __import__, for instance. And we don't do anything about eval). In general, it's much faster, and safer, to not allow things in the first place, than to try to detect every usage of it and forbid it. If starlark didn't already exist, I'd agree that trying to make a language from scratch to avoid these problems would be massively more work than it's worth, but given it does exist, that changes the equation.
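
For example, neither line below contains an import statement for an AST scan to flag, yet both perform imports (illustrative only):

os_module = __import__("os")                  # no `import` statement for a scanner to flag
socket_module = eval('__import__("socket")')  # and, as noted above, nothing is done about eval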

Lastly:

Disadvantages:

I would personally add "lock-in" at the top here. It's not clear to me that bazel and its subprojects like starlark are going to be supported indefinitely.

It's definitely a concern, but for this scope, I think it's reasonably addressed. I've already been made a committer to the new dependency, for instance.

There are examples of specific design choices which restrict really nice features like dep inference to being available only inside google, e.g. C++ #include scanning, which was mentioned by Ulf as likely not going to be open sourced for a very long time, in public, at a scala meetup at twitter which he showed up and gave a completely unscheduled talk at. "no external libraries" hints at that, but doesn't quite get at the source of the issue.

One of the nice things about pants being written in python is that people are able to come into it and start hacking away immediately, because it's a well-known programming language with lots of documentation. We immediately lose that if the only alternative is starlark rules. I'm not intending to express vitriol here, but I personally very strongly don't like the idea of writing code in a fake language controlled by the company whose open source policies define the phrase "thrown over the wall".

These are concerns about the bazel rule API, unrelated to using starlark as a parser.

So to mitigate that, I'm looking into the difficulty of implementing starlark with v2 rules.

I certainly think this is possible, and wouldn't be crazily hard (the models are actually pretty similar, at their core), and could be an interesting experiment, but I'd probably hold off at least until we work out #4535 because it's going to pretty radically shape how target types are registered, which is basically what the bazel rules API does. That said, please don't feel the need to do this specifically because of my experiment here - I'm not pushing for adopting the bazel rules API. But I'd be interested to see what it may yield!

@cosmicexplorer
Contributor

cosmicexplorer commented Jan 16, 2019

It seems that I ran much too far with my interpretation, and yet again I deeply appreciate the thorough response despite that. I would love to say first off:

The import detection is both slow, and easy to circumvent - it adds a noticeable amount of time to BUILD file parsing...

I'm totally fine with and in fact would love to restrict what's possible in BUILD files to what Starlark provides, which it seems is the point of this PR. I would love to help achieve this. This is an understanding I did not have before your very thoughtful and patient comment above.

I'm going to respond to some other points as well, but if they are off track we can move them elsewhere. Since Starlark is used for rules and for BUILD files, some of these may be applicable.

I expect anyone not working with pants daily will use requests to fetch things from the internet instead of our API, because it's the obvious and easy thing to do

I don't know that I agree, but one alternative is to then mark "untrusted" rules as uncacheable, possibly with a separate rule type, and allow opt-in usage of cacheability, parallelism, etc. We would still have caching and parallelism at nodes written in pants or by "experienced" rule writers (this emergent paradigm can be considered similar to a genrule from my understanding). This is absolutely doable if users find the documentation too dense, or if they're not quite sure how to implement something "the right way" right now. This is the kind of opt-in I'm thinking of.
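
A hedged sketch of what that opt-in split might look like (names invented; this is not an existing pants API, and it mirrors the @rule(Something, []) signature style used above):

def uncacheable_rule(output_type, input_selectors):
  # Like @rule, but marks the wrapped function so the engine would skip
  # memoizing its result and re-run it on every invalidation.
  def wrap(fn):
    fn.output_type = output_type
    fn.input_selectors = input_selectors
    fn.cacheable = False
    return fn
  return wrap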

I still don't know that I share your quoted expectation though, since the @rule interface provides dependency injection as well as (separately) type safety. If we provide lots of not just examples, but also wrappers for things people might want to e.g. fetch from the internet (e.g. compilers, git repos, scripts), so that downloading something is as easy as importing it, I do not believe it is too difficult for users to dive into yet another dependency injection framework.

The point of sandboxing is to guide people

The point of sandboxing is the opposite of guiding, as I understand it.

And I think that will be true no matter how much documentation we write for our API.

If it's going to be difficult for users to understand how to use the caching, parallel v2 model from reading docs, and that difficulty presumably leads to weird errors they can't understand that block them, does that not imply an equivalent (greater?) difficulty familiarizing themselves with a more sandboxed model? Am I asking the right question?

A couple more thoughts looking forward:

  • I'm actually looking into a correspondence between Skyframe and the v2 engine now after some further investigation (rather than directly consuming Starlark), which would require threading through some disparate parts of the Bazel codebase, which isn't too hard to jump through with the right tools.
  • "Restarts during Skyframe function evaluation are surprising and costly" (bazelbuild/bazel#5294) is an example of where the current implementation of Bazel rules can lead to pathological behavior despite the sandbox, which sinks performance and requires documentation. I thought it was interesting that a coroutine-based execution model (@rules as coroutines, #5580, creeping up again) was proposed as the way Bazel could address the arbitrary cost of "label resolution" (a process I'm currently unfamiliar with).

@stuhood
Member

stuhood commented Jan 16, 2019

This is a lot to catch up on, and I hope to eventually. But wanted to chime in quickly to say that I support using the rust starlark implementation to invoke v2 @rules (although the priority of that is unknown).

As previously mentioned, I also support doing BUILD file parsing in starlark, and think that that would likely have higher priority than porting all @rules.

@cosmicexplorer
Contributor

I made #7093 as a more in-depth look at the idea of using the v2 engine to implement skyframe!

@hlopko

hlopko commented Jan 18, 2019

Howdy, this is super cool! Have you seen https://github.com/google/starlark-rust? I haven't used it personally, but it's a hobby project of a former Bazel TL, so it shouldn't be completely bad :)

@illicitonion
Contributor Author

Howdy, this is super cool! Have you seen https://github.com/google/starlark-rust? I haven't used it personally, but it's a hobby project of a former Bazel TL, so it shouldn't be completely bad :)

It indeed uses that project :) (to which I am now a committer; thanks Damien!)

@illicitonion force-pushed the dwagnerhall/starlarkparsing/all branch from 039de70 to 0bbe449 on January 29, 2019
@Eric-Arellano
Contributor

I believe we decided to not go with this experiment, so closing.

Neat PR, though!
