Commit
Basic starlark parser for BUILD files
This is super naive, and performs accordingly.

What it does:
1. Register functions for all symbols exposed to BUILD files which record that they were called (and their arguments).
2. Pass a big Python list of these calls up.
3. Execute each of those calls (potentially multiple times), and record which target-like functions were called.

This successfully parses all of pants's own BUILD files, with a few exceptions:
1. Starlark currently doesn't support set literals. This should be easy to add behind a feature.
2. This naive way of performing parsing means that any functions which are called don't return real objects, and so return values can't be operated on. Exactly one BUILD file in the repo called a function on the result of a function, so I hacked around it for now.
3. Starlark doesn't support implicit string concatenation. Exactly one BUILD file in pants uses implicit string concatenation, so I switched it to explicit. I strongly support banning implicit string concatenation, as it leaves room for a lot of mistakes.

Performance is currently about half that of the existing parser. I haven't profiled, but my assumption is that this is mostly because we've doubled the object construction overhead (object construction in Python is mostly dict-creation, and we're now creating an intermediate dict of things to hydrate, and then using those to create the actual objects).

If we wanted to take this and turn it into something real, the first change I'd probably make would be to delay both layers of Python object construction until they're really needed:
1. Reuse a shared parent Environment with functions pre-loaded, rather than making fresh ones from scratch every time.
2. Don't pass all of the Starlark internal state over to Python (which acquires the GIL per object); instead store it in-memory in Rust, and only pass things over to Python when they're needed. This way, things like `list ::` could mostly avoid the GIL, just passing the strings of names up to Python, and things like `list --changed-parent=master ::` could just pass up the strings of names and the Snapshots, rather than more complex objects.
3. Add support for datatype-like structs to Starlark, so that the non-Target objects we create can be natively created in Starlark, rather than mirrored as pseudo-functions on the Python side.
4. Tweak the Starlark API so that we don't need to move values around quite as much. Currently a function which returns a String goes: String -> starlark Value -> parsing Value -> ffi String -> Python String. Most of those steps involve moves, and surprisingly many involve copies. These should mostly be avoidable with some API tweaks.
5. Tweak the Starlark API so that object type checks can simply be dynamic downcasts, rather than requiring a string comparison with Value.get_type.
6. Start writing rules as Starlark instead of Python, so they can just be natively interpreted by the Starlark evaluator.
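
The recording half of the approach described above (steps 1 and 2 of "What it does") can be sketched in pure Python. This is only an illustration under stated assumptions: the commit does the recording inside a Starlark interpreter, whereas this sketch swaps in Python's built-in `exec` as a stand-in evaluator. `make_recorder` and `record_calls` are hypothetical names, and `Call` is a simplified stand-in for the datatype of the same name in the diff.

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for the commit's Call datatype.
Call = namedtuple("Call", ["function_name", "args", "kwargs"])


def make_recorder(function_name, calls):
  """Return a function which records each invocation instead of doing real work."""
  def recorder(*args, **kwargs):
    calls.append(Call(function_name, args, tuple(sorted(kwargs.items()))))
    # Simplification: nothing real is returned, so a BUILD file cannot
    # operate on the result of a call.
    return None
  return recorder


def record_calls(source, function_names):
  """Run `source` with every exposed symbol replaced by a recorder.

  Assumption: Python's exec stands in for the Starlark evaluator here.
  """
  calls = []
  env = {name: make_recorder(name, calls) for name in function_names}
  exec(compile(source, "BUILD", "exec"), {"__builtins__": {}}, env)
  return calls


build_file = """
python_library(name="lib", sources=["a.py"])
python_tests(name="tests", dependencies=[":lib"])
"""
recorded = record_calls(build_file, ["python_library", "python_tests"])
for call in recorded:
  print(call)
```

The recorded list is the "big Python list of calls" that would then be replayed against the real symbol table.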
1 parent ebe652f, commit 039de70
Showing 16 changed files with 831 additions and 5 deletions.
@@ -0,0 +1,102 @@

    # coding=utf-8
    # Copyright 2015 Pants project contributors (see CONTRIBUTORS.md).
    # Licensed under the Apache License, Version 2.0 (see LICENSE).

    from __future__ import absolute_import, division, print_function, unicode_literals

    import logging
    import os

    from future.utils import text_type

    from pants.engine.fs import FileContent
    from pants.engine.legacy.parser import AbstractLegacyPythonCallbacksParser
    from pants.util.objects import datatype, Collection

    logger = logging.getLogger(__name__)


    class Call(datatype([
      ("function_name", text_type),
      ("args", tuple),
      ("kwargs", tuple),
    ])):
      """Represents a call to a python function mirrored into starlark."""


    class CallIndex(datatype([("i", int)])):
      """A pointer into a list of Calls."""


    ParseOutput = Collection.of(Call)


    class ParseInput(datatype([
      ("file_content", FileContent),
      ("function_names", tuple)
    ])):
      """The input to a starlark parse."""


    class ParseFunction(datatype([
      ("function_name", text_type),
    ])):
      """A reference to a function (the index into the symbols dict)."""


    class StarlarkParser(AbstractLegacyPythonCallbacksParser):
      """A parser that parses the given python code into a list of top-level objects via a starlark interpreter.
      Only Serializable objects with `name`s will be collected and returned. These objects will be
      addressable via their name in the parsed namespace.
      This parser attempts to be compatible with existing legacy BUILD files and concepts including
      macros and target factories.
      """

      def __init__(self, symbol_table, aliases, build_file_imports_behavior):
        """
        :param symbol_table: A SymbolTable for this parser, which will be overlaid with the given
          additional aliases.
        :type symbol_table: :class:`pants.engine.parser.SymbolTable`
        :param aliases: Additional BuildFileAliases to register.
        :type aliases: :class:`pants.build_graph.build_file_aliases.BuildFileAliases`
        :param build_file_imports_behavior: How to behave if a BUILD file being parsed tries to use
          import statements. Valid values: "allow", "warn", "error". Must be "error".
        :type build_file_imports_behavior: string
        """
        super(StarlarkParser, self).__init__(symbol_table, aliases)
        if build_file_imports_behavior != "error":
          raise ValueError(
            "Starlark parse doesn't support imports; --build-file-imports must be error but was {}".format(
              build_file_imports_behavior
            )
          )


      def parse(self, filepath, filecontent, parsed_objects):
        # Reset the parse context's storage for the new path, then replay every recorded call
        # against the symbol table. Objects end up in the parse context's storage as a side
        # effect of those calls, and are copied out at the end.
        self._parse_context._storage.clear(os.path.dirname(filepath))
        for obj in parsed_objects:
          self.evaluate(obj, parsed_objects, self._symbols)
        return list(self._parse_context._storage.objects)


      def evaluate(self, v, parsed_objects, symbols):
        if isinstance(v, Call):
          # Evaluate arguments first, then invoke the real function from the symbol table.
          # The comprehension variables are named so that they do not shadow `v`.
          kwargs = {key: self.evaluate(value, parsed_objects, symbols) for key, value in v.kwargs}
          args = [self.evaluate(arg, parsed_objects, symbols) for arg in v.args]
          func = symbols[v.function_name]
          return func(*args, **kwargs)
        elif isinstance(v, CallIndex):
          # A pointer to another recorded call: evaluate the call it refers to.
          return self.evaluate(parsed_objects.dependencies[v.i], parsed_objects, symbols)
        elif isinstance(v, ParseFunction):
          return symbols[v.function_name]
        elif isinstance(v, tuple):
          return [self.evaluate(item, parsed_objects, symbols) for item in v]
        else:
          return v
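
The replay logic in `evaluate` can be exercised outside of Pants with a small standalone sketch. Everything here is an assumption made for illustration: `Call` and `CallIndex` are namedtuple stand-ins for the datatypes in the diff, `globs` and `python_library` are toy functions rather than the real BUILD symbols, and the recorded calls are written out by hand. The sketch also shows why the commit message says calls are executed "potentially multiple times": a call reached through a `CallIndex` runs again even though the top-level loop already ran it.

```python
from collections import namedtuple

# Simplified stand-ins for the datatypes defined in the diff.
Call = namedtuple("Call", ["function_name", "args", "kwargs"])
CallIndex = namedtuple("CallIndex", ["i"])


def evaluate(value, calls, symbols):
  """Recursively turn recorded values back into real Python objects."""
  # Call and CallIndex are tuples too, so they must be checked before plain tuples.
  if isinstance(value, Call):
    kwargs = {k: evaluate(v, calls, symbols) for k, v in value.kwargs}
    args = [evaluate(arg, calls, symbols) for arg in value.args]
    return symbols[value.function_name](*args, **kwargs)
  if isinstance(value, CallIndex):
    return evaluate(calls[value.i], calls, symbols)
  if isinstance(value, tuple):
    return [evaluate(item, calls, symbols) for item in value]
  return value


invocations = []


def globs(*patterns):
  # Toy function: records that it ran and returns its patterns.
  invocations.append("globs")
  return sorted(patterns)


def python_library(name, sources):
  # Toy function: records that it ran and returns a plain dict.
  invocations.append("python_library")
  return {"name": name, "sources": sources}


symbols = {"globs": globs, "python_library": python_library}

# Hand-written recorded form of: python_library(name="lib", sources=globs("*.py"))
calls = [
  Call("globs", ("*.py",), ()),
  Call("python_library", (), (("name", "lib"), ("sources", CallIndex(0)))),
]

# Mirrors the loop in parse(): every recorded call is evaluated at the top level.
results = [evaluate(call, calls, symbols) for call in calls]
print(results[1])
print(invocations)
```

In this sketch `globs` runs twice, once from the top-level loop and once through the `CallIndex`, which is harmless for pure functions but is part of the overhead the commit message describes.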