Skip to content

Commit

Permalink
Basic starlark parser for BUILD files
Browse files Browse the repository at this point in the history
This is super naive, and performs accordingly. What it does:
1. Register functions for all symbols exposed to BUILD files which
   record that they were called (and their arguments).
2. Passes a big python list of these calls up.
3. Executes each of those calls (potentially multiple times), and
   records which target-like functions were called.

This successfully parses all of pants's own BUILD files, with a few
exceptions:
1. Starlark currently doesn't support set literals. This should be easy
   to add behind a feature.
2. This naive way of performing parsing means that any functions which
   are called don't return real objects, and so return values can't be
   operated on. Exactly one BUILD file in the repo called a function on
   the result of a function, so I hacked around it for now.
3. Starlark doesn't support implicit string concatenation. Exactly one
   BUILD file in pants uses implicit string concatenation, so I switched
   it to explicit. I strongly support banning implicit string concatenation,
   as it has the scope for a lot of mistakes.

Performance is currently about half of the existing parser. I haven't
profiled, but my assumption is that this is mostly because we've doubled
the object construction overhead (object construction in Python is
mostly dict-creation, and we're now creating an intermediate dict of
things to hydrate, and then using those to create the actual objects).

If we wanted to take this and turn it into something real, the first
change I'd probably make would be to delay both layers of python object
construction until they're really needed:
1. Reuse a shared parent Environment with functions pre-loaded, rather
   than making fresh ones from scratch every time.
2. Don't pass all of the starlark internal state over to Python (which
   acquires the GIL per object), instead store it in-memory in Rust, and
   only pass things over to Python when they're needed. This way, things
   like `list ::` could mostly avoid the GIL, just passing the strings
   of names up to Python, and things like
   `list --changed-parent=master ::` could just pass up the strings of
   names, and the Snapshots, rather than more complex objects.
3. Add support for datatype-like structs to starlark, so that the
   non-Target objects we create can be natively created in starlark,
   rather than mirrored as pseudo-functions on the Python side.
4. Tweak the Starlark API so that we don't need to move values around
   quite as much. Currently a function which returns a String goes:
   String -> starlark Value -> parsing Value -> ffi String -> Python String
   most of those involve moves, and surprisingly many involve copies.
   These should mostly be avoidable with some API tweaks.
5. Tweak the Starlark API so that object type checks can simply be
   dynamic downcasts, rather than require a string comparisons with
   Value.get_type.
6. Start writing rules as Starlark instead of Python, so they can just
   be natively interpreted by the Starlark evaluator.
  • Loading branch information
illicitonion committed Dec 28, 2018
1 parent ebe652f commit 039de70
Show file tree
Hide file tree
Showing 16 changed files with 831 additions and 5 deletions.
5 changes: 4 additions & 1 deletion src/python/pants/engine/build_files.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,7 @@
from pants.build_graph.address_lookup_error import AddressLookupError
from pants.engine.addressable import AddressableDescriptor, BuildFileAddresses
from pants.engine.fs import Digest, FilesContent, PathGlobs, Snapshot
from pants.engine.legacy.starlark_parser import ParseOutput, ParseInput
from pants.engine.mapper import AddressFamily, AddressMap, AddressMapper, ResolveError
from pants.engine.objects import Locatable, SerializableFactory, Validatable
from pants.engine.parser import TargetAdaptorContainer
Expand Down Expand Up @@ -56,9 +57,11 @@ def parse_address_family(address_mapper, directory):
raise ResolveError('Directory "{}" does not contain any BUILD files.'.format(directory.path))
address_maps = []
for filecontent_product in files_content:
parsed_objects = yield Get(ParseOutput, ParseInput, ParseInput(filecontent_product, tuple(sorted(address_mapper.parser._symbols.keys()))))
address_maps.append(AddressMap.parse(filecontent_product.path,
filecontent_product.content,
address_mapper.parser))
address_mapper.parser,
parsed_objects=parsed_objects))
yield AddressFamily.create(directory.path, address_maps)


Expand Down
1 change: 1 addition & 0 deletions src/python/pants/engine/fs.py
Original file line number Diff line number Diff line change
Expand Up @@ -158,5 +158,6 @@ def create_fs_rules():
RootRule(Digest),
RootRule(MergedDirectories),
RootRule(PathGlobs),
RootRule(FileContent),
RootRule(UrlToFetch),
]
2 changes: 1 addition & 1 deletion src/python/pants/engine/legacy/BUILD
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ python_library(

python_library(
name='parser',
sources=['parser.py'],
sources=['parser.py', 'starlark_parser.py'],
dependencies=[
'3rdparty/python:future',
':structs',
Expand Down
102 changes: 102 additions & 0 deletions src/python/pants/engine/legacy/starlark_parser.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,102 @@
# coding=utf-8
# Copyright 2015 Pants project contributors (see CONTRIBUTORS.md).
# Licensed under the Apache License, Version 2.0 (see LICENSE).

from __future__ import absolute_import, division, print_function, unicode_literals

import logging
import os

from future.utils import text_type

from pants.engine.fs import FileContent
from pants.engine.legacy.parser import AbstractLegacyPythonCallbacksParser
from pants.util.objects import datatype, Collection

logger = logging.getLogger(__name__)


class Call(datatype([
("function_name", text_type),
("args", tuple),
("kwargs", tuple),
])):
"""Represents a call to a python function mirrored into starlark."""


class CallIndex(datatype([("i", int)])):
"""A pointer into a list of Calls."""


ParseOutput = Collection.of(Call)


class ParseInput(datatype([
("file_content", FileContent),
("function_names", tuple)
])):
"""The input to a starlark parse."""


class ParseFunction(datatype([
("function_name", text_type),
])):
"""A reference to a function (the index into the symbols dict)."""


class StarlarkParser(AbstractLegacyPythonCallbacksParser):
"""A parser that parses the given python code into a list of top-level via s starlark interpreter..
Only Serializable objects with `name`s will be collected and returned. These objects will be
addressable via their name in the parsed namespace.
This parser attempts to be compatible with existing legacy BUILD files and concepts including
macros and target factories.
"""

def __init__(self, symbol_table, aliases, build_file_imports_behavior):
"""
:param symbol_table: A SymbolTable for this parser, which will be overlaid with the given
additional aliases.
:type symbol_table: :class:`pants.engine.parser.SymbolTable`
:param aliases: Additional BuildFileAliases to register.
:type aliases: :class:`pants.build_graph.build_file_aliases.BuildFileAliases`
:param build_file_imports_behavior: How to behave if a BUILD file being parsed tries to use
import statements. Valid values: "allow", "warn", "error". Must be "error".
:type build_file_imports_behavior: string
"""
super(StarlarkParser, self).__init__(symbol_table, aliases)
if build_file_imports_behavior != "error":
raise ValueError(
"Starlark parse doesn't support imports; --build-file-imports must be error but was {}".format(
build_file_imports_behavior
)
)


def parse(self, filepath, filecontent, parsed_objects):
# Mutate the parse context for the new path, then exec, and copy the resulting objects.
# We execute with a (shallow) clone of the symbols as a defense against accidental
# pollution of the namespace via imports or variable definitions. Defending against
# _intentional_ mutation would require a deep clone, which doesn't seem worth the cost at
# this juncture.
self._parse_context._storage.clear(os.path.dirname(filepath))
for obj in parsed_objects:
self.evaluate(obj, parsed_objects, self._symbols)
return list(self._parse_context._storage.objects)


def evaluate(self, v, parsed_objects, symbols):
if isinstance(v, Call):
kwargs = ({k: self.evaluate(v, parsed_objects, symbols) for k, v in v.kwargs})
args = [self.evaluate(arg, parsed_objects, symbols) for arg in v.args]
func = symbols[v.function_name]
return func(*args, **kwargs)
elif isinstance(v, CallIndex):
return self.evaluate(parsed_objects.dependencies[v.i], parsed_objects, symbols)
elif isinstance(v, ParseFunction):
return symbols[v.function_name]
elif isinstance(v, tuple):
return [self.evaluate(item, parsed_objects, symbols) for item in v]
else:
return v
14 changes: 13 additions & 1 deletion src/python/pants/engine/native.py
Original file line number Diff line number Diff line change
Expand Up @@ -687,6 +687,10 @@ def new_scheduler(self,
construct_file,
construct_link,
construct_process_result,
construct_parse_call,
construct_parse_call_index,
construct_parse_output,
construct_parse_function,
constraint_address,
constraint_path_globs,
constraint_directory_digest,
Expand All @@ -699,7 +703,9 @@ def new_scheduler(self,
constraint_process_request,
constraint_process_result,
constraint_generator,
constraint_url_to_fetch):
constraint_url_to_fetch,
constraint_parse_input,
constraint_parse_output):
"""Create and return an ExternContext and native Scheduler."""

def func(constraint):
Expand All @@ -719,6 +725,10 @@ def tc(constraint):
func(construct_file),
func(construct_link),
func(construct_process_result),
func(construct_parse_call),
func(construct_parse_call_index),
func(construct_parse_output),
func(construct_parse_function),
# TypeConstraints.
tc(constraint_address),
tc(constraint_path_globs),
Expand All @@ -733,6 +743,8 @@ def tc(constraint):
tc(constraint_process_result),
tc(constraint_generator),
tc(constraint_url_to_fetch),
tc(constraint_parse_input),
tc(constraint_parse_output),
# Types.
TypeId(self.context.to_id(text_type)),
TypeId(self.context.to_id(binary_type)),
Expand Down
7 changes: 7 additions & 0 deletions src/python/pants/engine/scheduler.py
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
MergedDirectories, Path, PathGlobs, PathGlobsAndRoot, Snapshot,
UrlToFetch)
from pants.engine.isolated_process import ExecuteProcessRequest, FallibleExecuteProcessResult
from pants.engine.legacy.starlark_parser import Call, CallIndex, ParseOutput, ParseFunction, ParseInput
from pants.engine.native import Function, TypeConstraint, TypeId
from pants.engine.nodes import Return, Throw
from pants.engine.rules import RuleIndex, SingletonRule, TaskRule
Expand Down Expand Up @@ -105,6 +106,10 @@ def __init__(
construct_file=File,
construct_link=Link,
construct_process_result=FallibleExecuteProcessResult,
construct_parse_call=Call,
construct_parse_call_index=CallIndex,
construct_parse_output=ParseOutput,
construct_parse_function=ParseFunction,
constraint_address=constraint_for(Address),
constraint_path_globs=constraint_for(PathGlobs),
constraint_directory_digest=constraint_for(Digest),
Expand All @@ -118,6 +123,8 @@ def __init__(
constraint_process_result=constraint_for(FallibleExecuteProcessResult),
constraint_generator=constraint_for(GeneratorType),
constraint_url_to_fetch=constraint_for(UrlToFetch),
constraint_parse_input=constraint_for(ParseInput),
constraint_parse_output=constraint_for(ParseOutput),
)


Expand Down
4 changes: 2 additions & 2 deletions src/python/pants/init/engine_initializer.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@
from pants.engine.legacy.graph import (LegacyBuildGraph, TransitiveHydratedTargets,
create_legacy_graph_tasks)
from pants.engine.legacy.options_parsing import create_options_parsing_rules
from pants.engine.legacy.parser import LegacyPythonCallbacksParser
from pants.engine.legacy.starlark_parser import StarlarkParser
from pants.engine.legacy.structs import (AppAdaptor, JvmBinaryAdaptor, PageAdaptor,
PantsPluginAdaptor, PythonBinaryAdaptor,
PythonTargetAdaptor, PythonTestsAdaptor,
Expand Down Expand Up @@ -330,7 +330,7 @@ def setup_legacy_graph_extended(
execution_options = execution_options or DEFAULT_EXECUTION_OPTIONS

# Register "literal" subjects required for these rules.
parser = LegacyPythonCallbacksParser(
parser = StarlarkParser(
symbol_table,
build_file_aliases,
build_file_imports_behavior
Expand Down
Loading

0 comments on commit 039de70

Please sign in to comment.