Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow tags to be applied to targets from a single source #7302

Closed
codealchemy opened this issue Mar 2, 2019 · 12 comments · Fixed by #7315
Closed

Allow tags to be applied to targets from a single source #7302

codealchemy opened this issue Mar 2, 2019 · 12 comments · Fixed by #7315
Assignees

Comments

@codealchemy
Copy link
Contributor

codealchemy commented Mar 2, 2019

Now that task targets can be filtered by tags (prior work from #6902), having a way of defining a single source for which tags to apply to which targets would be helpful to avoid needing to set the same tag across multiple BUILD files.

There are other use cases as well, such as grouping all targets with a particular suffix (or prefix) under a single tag (as noted for integration targets in pants being tagged with integration below), or others (fsqio, for instance, uses tags for rule enforcement on the build graph)

@codealchemy
Copy link
Contributor Author

codealchemy commented Mar 2, 2019

Proposed Solution

Using #6902 (comment) as a starting point, a (global) subsystem would fit well for this purpose - taking a single option (the path to the source file) and providing a helper function for identifying which tags to apply to which targets. This would act as a supplement to any tags defined in the BUILD files themselves, and would not be coupled to tag filtering so that it could apply to a broader selection of use cases.

Plugging this into the LegacyPythonCallbacksParser would allow us to apply tags to targets during target initialization, though injecting an instance of a subsystem there has proven to be a challenge due to the problem of calling a global instance of a subsystem before the Scheduler has been created (i.e. it fails on an UninitializedSubsystemError), which led me to the following TODO in global options:

TODO: These options should move to a Subsystem once we add support for "bootstrap" Subsystems (ie,
allowing Subsystems to be consumed before the Scheduler has been created).

Not sure if there's a related issue for that, or if that may be resolved through anything currently in flight.

Which leads me here - looking for feedback on the problem / approach defined ☝️- in the meantime, I'm looking into other ways of (lazily) initializing the subsystem. There may be alternative approaches that would make sense for this problem as well (from what I can tell, possibly exposing a global option for the source file / path that could be fed to a service object).

For reference, WIP branch for the general approach I've taken so far here

@stuhood
Copy link
Member

stuhood commented Mar 2, 2019

Is it the case that these options need to be applied "across" scopes (ie, to multiple Tasks at once)? The example that you referenced here: #6902 (comment) ... looks like the configuration is still "per Task" or per scope.

If this is per scope, rather than using a global subsystem, it feels like you should be able to use the Subsystem that you added in #7275 to add regex support. That would invert the model from the linked comment... so the config (as it would be specified in pants.ini) would look more like:

[target-filter]
target_restrictions: {
    'blacklist': [<target regexes>], # Targets ignored by all tasks that support this subsystem
    ..,
  }

[target-filter.lint.scalastyle]
target_restrictions: {
    'blacklist': [<target regexes>], # Targets ignored by just the scalastyle task
    'whitelist': # etc
  }

@benjyw
Copy link
Contributor

benjyw commented Mar 2, 2019

@stuhood I think this question is orthogonal to the specific use-case of #7275 .

That change's subsystem provides the answer to "which target tags should cause me to skip the target". Whereas this change provides a way of tagging targets in a centralized way, without having to laboriously edit every relevant BUILD file.

True, #7275 is the motivating use-case, but there could easily be others. For example, a single mapping that says "all targets whose name ends with _integration should have the tag integration" could replace all our tags = {'integration'} boilerplate.

@stuhood
Copy link
Member

stuhood commented Mar 2, 2019

@benjyw : Ok, I think I see what you're getting at. Expanding the description with more of what we're trying to accomplish would be helpful then I think, because the goal was not clear to me. It sounds like it's approximately: "there is a global subsystem that can add tags to targets based on regexes (?)".

Describing one or two usecases for this would be good too, because the _integration usecase can be satisfied in multiple ways.

@benjyw
Copy link
Contributor

benjyw commented Mar 2, 2019

True, specific use-cases can be satisfied in multiple ways. But I think it's better if we don't have to solve each case ad-hoc.

The general design here is: there can be K ways of adding tags to a target (e.g., inline, or in a central .json file, or ), and M uses of target tags (e.g., blacklisting unlintable targets, not running integration tests, or ).

Then we do K+M work instead of K*M work, yet get the added benefit of uniformity.

Kubernetes does something similar. You can apply arbitrary key-value labels to pods, and then select on those labels for various actions. https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/

@benjyw
Copy link
Contributor

benjyw commented Mar 2, 2019

@codealchemy can you edit your posts on this issue to clarify?

@benjyw
Copy link
Contributor

benjyw commented Mar 2, 2019

But back to the technical question: We'd like to have a subsystem that provides a .json file mapping tags to target addresses (or target address regexes). So we're trying to figure out how to get that into LegacyPythonCallbacksParser's ctor, because that's created before we have full options.

I'm hoping that we can just create it lazily, since it's not needed in the bootstrap sequence itself.

@stuhood
Copy link
Member

stuhood commented Mar 3, 2019

So, including these in the parser is one approach. I think that if you make lookup of the Subsystem instance lazy, that would work.

A few approaches

  1. Make lookup of the global subsystem lazy (as you mentioned), and then inject at parse time.
  2. Inject tags one level up in the @rules that hydrate "structs": https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/build_files.py#L138-L187 .. as I said, the naming is terrible. But: a struct "always" represents the un-instantiated placeholder for a target, and so you could include additional kwargs there.
  3. Inject tags when we're actually instantiating v1 Target objects here: https://github.com/pantsbuild/pants/blob/master/src/python/pants/engine/legacy/graph.py#L147-L172

Of these options, the first and second ones are cleanest, and more forwards compatible (they work across both v1 and v2). The third option (injecting in BuildGraph) will only work for v1, because v2 does not create a BuildGraph.

Unfortunately, it looks like there are some minor blockers to taking the second approach (related to #6880: we don't actually inject the OptionsBootstrapper in the codepath that hydrates targets for v1). So I think either one or three would be paths forward for now (possibly with a TODO mentioning consuming the tags via optionable_rule).

@codealchemy
Copy link
Contributor Author

@stuhood it looks like making the lookup lazy runs into the same issue in option 1 and 3 (seen running ./pants2 test tests/python/pants_test/build_graph:build_graph):

Ex. test error from option 1
self = <pants_test.build_graph.test_build_graph.BuildGraphTest testMethod=test_transitive_subgraph_of_addresses_bfs_predicate>

    def test_transitive_subgraph_of_addresses_bfs_predicate(self):
      root = self.inject_graph('a', {
        'a': ['b', 'c'],
        'b': ['d', 'e'],
>       'c': [], 'd': [], 'e': [],
      })

.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants_test/build_graph/test_build_graph.py:257:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants_test/build_graph/test_build_graph.py:44: in inject_graph
    self.build_graph.inject_address_closure(root_address)
.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants/engine/legacy/graph.py:206: in inject_address_closure
    self.inject_addresses_closure([address])
.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants/engine/legacy/graph.py:214: in inject_addresses_closure
    for _ in self._inject_specs(specs):
.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants/engine/legacy/graph.py:271: in _inject_specs
    subjects)
/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/contextlib.py:35: in __exit__
    self.gen.throw(type, value, traceback)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pants.engine.legacy.graph.LegacyBuildGraph object at 0x1042aba90>

    @contextmanager
    def _resolve_context(self):
      try:
        yield
      except Exception as e:
        raise AddressLookupError(
>         'Build graph construction failed: {} {}'.format(type(e).__name__, str(e))
        )
E       AddressLookupError: Build graph construction failed: ExecutionError 1 Exception encountered:
E       Computing Select(Specs(dependencies<Exactly(tuple)>=(SingleAddress(directory=u'a', name=u'a'),), matcher<Exactly(SpecsMatcher)>=SpecsMatcher(tags<Exactly(tuple)>=(), exclude_patterns<Exactly(tuple)>=())), Exactly(TransitiveHydratedTargets))
E         Computing Task(transitive_hydrated_targets, Specs(dependencies<Exactly(tuple)>=(SingleAddress(directory=u'a', name=u'a'),), matcher<Exactly(SpecsMatcher)>=SpecsMatcher(tags<Exactly(tuple)>=(), exclude_patterns<Exactly(tuple)>=())), Exactly(TransitiveHydratedTargets), true)
E           Computing Task(addresses_from_address_families, Specs(dependencies<Exactly(tuple)>=(SingleAddress(directory=u'a', name=u'a'),), matcher<Exactly(SpecsMatcher)>=SpecsMatcher(tags<Exactly(tuple)>=(), exclude_patterns<Exactly(tuple)>=())), Exactly(BuildFileAddresses), true)
E             Computing Task(parse_address_family, Dir(path=a), Exactly(AddressFamily), true)
E               Throw(Failed to parse a/BUILD:
E       Subsystem "TargetTagDefinitions" not initialized for scope "target-tag-definitions". Is subsystem missing from subsystem_dependencies() in a task? )
E                 Traceback (most recent call last):
E                   File "/Users/schmitt/Workspace/pants/.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants/engine/native.py", line 422, in extern_generator_send
E                     res = c.from_value(func[0]).send(c.from_value(arg[0]))
E                   File "/Users/schmitt/Workspace/pants/.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants/engine/build_files.py", line 61, in parse_address_family
E                     address_mapper.parser))
E                   File "/Users/schmitt/Workspace/pants/.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants/engine/mapper.py", line 55, in parse
E                     raise MappingError('Failed to parse {}:\n{}'.format(filepath, e))
E                 MappingError: Failed to parse a/BUILD:
E                 Subsystem "TargetTagDefinitions" not initialized for scope "target-tag-definitions". Is subsystem missing from subsystem_dependencies() in a task?

.pants.d/pyprep/sources/3842a10025edd42c67b413781e754c8e6620a39d/pants/engine/legacy/graph.py:238: AddressLookupError
Ex. test error from option 3
self = <pants_test.build_graph.test_build_graph.BuildGraphTest testMethod=test_transitive_subgraph_of_addresses_bfs_predicate>

    def test_transitive_subgraph_of_addresses_bfs_predicate(self):
      root = self.inject_graph('a', {
        'a': ['b', 'c'],
        'b': ['d', 'e'],
>       'c': [], 'd': [], 'e': [],
      })

.pants.d/pyprep/sources/405972c62321192a42f786a306a3b8089243e7b4/pants_test/build_graph/test_build_graph.py:257:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
.pants.d/pyprep/sources/405972c62321192a42f786a306a3b8089243e7b4/pants_test/build_graph/test_build_graph.py:44: in inject_graph
    self.build_graph.inject_address_closure(root_address)
.pants.d/pyprep/sources/405972c62321192a42f786a306a3b8089243e7b4/pants/engine/legacy/graph.py:211: in inject_address_closure
    self.inject_addresses_closure([address])
.pants.d/pyprep/sources/405972c62321192a42f786a306a3b8089243e7b4/pants/engine/legacy/graph.py:219: in inject_addresses_closure
    for _ in self._inject_specs(specs):
.pants.d/pyprep/sources/405972c62321192a42f786a306a3b8089243e7b4/pants/engine/legacy/graph.py:278: in _inject_specs
    self._index(thts.closure)
.pants.d/pyprep/sources/405972c62321192a42f786a306a3b8089243e7b4/pants/engine/legacy/graph.py:100: in _index
    new_targets.append(self._index_target(target_adaptor))
.pants.d/pyprep/sources/405972c62321192a42f786a306a3b8089243e7b4/pants/engine/legacy/graph.py:133: in _index_target
    target = self._instantiate_target(target_adaptor)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <pants.engine.legacy.graph.LegacyBuildGraph object at 0x112407b10>
target_adaptor = TargetAdaptor(address=a)

    def _instantiate_target(self, target_adaptor):
      """Given a TargetAdaptor struct previously parsed from a BUILD file, instantiate a Target.

        TODO: This assumes that the SymbolTable used for parsing matches the SymbolTable passed
        to this graph. Would be good to make that more explicit, but it might be better to nuke
        the Target subclassing pattern instead, and lean further into the "configuration composition"
        model explored in the `exp` package.
        """
      target_cls = self._target_types[target_adaptor.type_alias]
      try:
        # Pop dependencies, which were already consumed during construction.
        kwargs = target_adaptor.kwargs()
        kwargs.pop('dependencies')

        # Append any globally defined tags to the target
        kwargs.setdefault('tags', set())
        kwargs['tags'].update(self._target_tag_definitions().tags_for(target_adaptor.address.spec))

        # Instantiate.
        if issubclass(target_cls, AppBase):
          return self._instantiate_app(target_cls, kwargs)
        elif target_cls is RemoteSources:
          return self._instantiate_remote_sources(kwargs)
        return target_cls(build_graph=self, **kwargs)
      except TargetDefinitionException:
        raise
      except Exception as e:
        raise TargetDefinitionException(
            target_adaptor.address,
>           'Failed to instantiate Target with type {}: {}'.format(target_cls, e))
E       TargetDefinitionException: Invalid target BuildFileAddress(a/BUILD, a): Failed to instantiate Target with type <class 'pants.build_graph.target.Target'>: Subsystem "TargetTagDefinitions" not initialized for scope "target-tag-definitions". Is subsystem missing from subsystem_dependencies() in a task?

.pants.d/pyprep/sources/405972c62321192a42f786a306a3b8089243e7b4/pants/engine/legacy/graph.py:177: TargetDefinitionException

For reference, same working branch (codealchemy@3f7c108 and codealchemy@495b2a5, respectively)

IIUC it looks like the build graph is parsed there before we have subsystems as well, is that right?

@stuhood
Copy link
Member

stuhood commented Mar 5, 2019

@codealchemy : It occurs to me that there is an existing subsystem that you can either:

  1. model after
  2. just use!

class Arguments(Subsystem):
"""Options relating to handling target arguments."""
class UnknownArgumentError(TargetDefinitionException):
"""An unknown keyword argument was supplied to Target."""
options_scope = 'target-arguments'
@classmethod
def register_options(cls, register):
register('--ignored', advanced=True, type=dict,
help='Map of target name to a list of keyword arguments that should be ignored if a '
'target receives them unexpectedly. Typically used to allow usage of arguments '
'in BUILD files that are not yet available in the current version of pants.')

@stuhood
Copy link
Member

stuhood commented Mar 5, 2019

But actually, it's possible that that subsystem is only consumed within BuildGraph parsing... I think that to guarantee that a Subsystem is loaded, as early as possible, you'd need to include it here:

# Gather the optionables that are not scoped to any other. All known scopes are reachable
# via these optionables' known_scope_infos() methods.
top_level_optionables = (
{GlobalOptionsRegistrar} |
GlobalSubsystems.get() |
build_configuration.optionables() |
set(Goal.get_optionables())
)

@benjyw
Copy link
Contributor

benjyw commented Mar 5, 2019

The new subsystem was added to GlobalSubsystems but still wasn't available in the parser. Are we sure Arguments actually works...? :)

benjyw pushed a commit that referenced this issue Mar 7, 2019
Problem:
It can be cumbersome to apply the same tags to many targets as it requires editing every associated BUILD file. Having a single source of tags would make it more straightforward to configure tags by which to filter targets (#6902), and can be applied to other use cases such as grouping targets with a particular suffix (ex. _integration in pants) under a single tag.

Solution:
Add a subsystem within Target responsible for fetching, parsing, and matching tags configured in a JSON file to targets. The subsystem matches against the target's address (though could be expanded to a regex to address additional use cases).

Closes #7302
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants