Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add arguments parameter to RunEnvironmentInfo #16430

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

fmeum
Copy link
Collaborator

@fmeum fmeum commented Oct 8, 2022

Executable Starlark rules can use the new arguments parameter on RunEnvironmentInfo to specify the arguments that Bazel should pass on the command line with test or run.

If set to a non-None value, this parameter overrides the value of the args attribute that is implicitly defined for all rules. This allows Starlark rules to implement their own version of this attribute which isn't bound to its proprietary processing (data label expansion and tokenization).

Along the way, this commit adds test coverage and documentation for the interplay between RunEnvironmentInfo's environment and --test_env.

The value of the arguments field of RunEnvironmentInfo is intentionally not exposed to Starlark yet: It is not clear how these arguments should be represented and whether rules relying on the magic args attribute should also provide this field.

RELNOTES: Executable starlark rules can use the arguments parameter of RunEnvironmentInfo to specify their command-line arguments with bazel run and bazel test.

Fixes #16076
Work towards #12313

@fmeum fmeum force-pushed the 16076-arguments branch 2 times, most recently from d747d2e to f8bd861 Compare October 8, 2022 12:32
Executable Starlark rules can use the new `arguments` parameter on
`RunEnvironmentInfo` to specify the arguments that Bazel should pass
on the command line with `test` or `run`.

If set to a non-`None` value, this parameter overrides the value of
the `args` attribute that is implicitly defined for all rules. This
allows Starlark rules to implement their own version of this attribute
which isn't bound to its proprietary processing (data label expansion
and tokenization).

Along the way, this commit adds test coverage and documentation for the
interplay between `RunEnvironmentInfo`'s `environment` and `--test_env`.

The value of the `arguments` field of `RunEnvironmentInfo` is
intentionally not exposed to Starlark yet: It is not clear how these
arguments should be represented and whether rules relying on the magic
`args` attribute should also provide this field.

RELNOTES: Executable starlark rules can use the `arguments` parameter of
`RunEnvironmentInfo` to specify their command-line arguments with `bazel
run` and `bazel test`.
@fmeum fmeum marked this pull request as ready for review October 8, 2022 13:08
@fmeum fmeum requested a review from lberki as a code owner October 8, 2022 13:08
@fmeum
Copy link
Collaborator Author

fmeum commented Oct 8, 2022

@comius Would you be available to review this PR?

@fmeum fmeum changed the title WIP: Add arguments parameter to RunEnvironmentInfo Add arguments parameter to RunEnvironmentInfo Oct 8, 2022
@sgowroji sgowroji added team-Build-Language awaiting-review PR is awaiting review from an assigned reviewer labels Oct 10, 2022
@lberki lberki requested review from comius and removed request for lberki October 10, 2022 11:24
@lberki
Copy link
Contributor

lberki commented Oct 10, 2022

Adding @comius since he reviewed #15232 .

@comius : do I understand correctly that RunEnvironmentInfo is intended to be only consumed within the current rule?

Then it's kind of odd that it's added to the set of providers the rule makes accessible to rules that depend on it; it seems unnecessary and it's Yet Another Little Thing on the interface of Bazel. I suppose we can't remove it anymore without an incompatible flag? :(

@fmeum
Copy link
Collaborator Author

fmeum commented Oct 10, 2022

@lberki I should add that advertising RunEnvironmentInfo was a conscious choice back then: #15224, #15225, and my own experience working with wrapper rules that apply transitions lead me to the opinion that Starlark-constructible providers should also be Starlark-retrievable. Otherwise, they can't be tested properly and essentially make any rules that use them "terminal" as wrappers can't read them.

I do agree that this extends the interface, but maybe that extension isn't unnecessary :-)

@lberki
Copy link
Contributor

lberki commented Oct 10, 2022

If it was a conscious choice (as opposed to just a side effect of whatever the most convenient choice was at that time), all the power to y'all! I didn't do a lot of archeology, just noticed this odd provider and thought I'd flag it.

@UebelAndre
Copy link
Contributor

@comius sorry for the ping. Would you happen to have bandwidth this week to review this PR?

@comius
Copy link
Contributor

comius commented Nov 2, 2022

This allows Starlark rules to implement their own version of this [args] attribute which isn't bound to its proprietary processing (data label expansion and tokenization).

I don't think this is the direction we should be following. Adding a way to customise args attribute will result in "proprietary processing" invented by each rule author for their rules - or copy-pasting the code around that does it.

I'd rather pursue the direction where the args attribute is parsed in a general / straight forward non-proprietary way.

Can you close this PR and rather introduce a flag and optional allow list, that fixes the args parsing? Unless I'm missing something a list of strings should be straightforward to convert to a list of args.

Are there other arguments for customization? Otherwise, I really don't want users to figure out on which rules args work and on which rules they don't.

@UebelAndre
Copy link
Contributor

Can you close this PR and rather introduce a flag and optional allow list, that fixes the args parsing?

Can you expand on what you mean by "flag and optional allow list"?

Are there other arguments for customization? Otherwise, I really don't want users to figure out on which rules args work and on which rules they don't.

My use case for this functionality is to be able to write tests rules with unique attributes that are translated to arguments to a test binary. args is intentionally not exposed in these test rules as it invites users to do things that break the determinism of the targets and introduces cache poisoning. This feature would be very valuable in helping me cleanup code specifically for translating attributes into arguments for a test binary, which is unreasonably difficult to do today.

@comius
Copy link
Contributor

comius commented Nov 2, 2022

Can you close this PR and rather introduce a flag and optional allow list, that fixes the args parsing?
Can you expand on what you mean by "flag and optional allow list"?

Flag like --incompatible_sane_rule_args_parsing, that would stop doing the legacy things described in #12313. That is if args = ["a", "b c", "e \"f"] the program or test should be passed exactly three arguments with exactly the content the strings there have.

Optional allow list, something in the spirit of 67d3bca
Google would need an allowlist to do a cleanup, because we simply cannot fix the whole repository in one commit.

Are there other arguments for customization? Otherwise, I really don't want users to figure out on which rules args work and on which rules they don't.

My use case for this functionality is to be able to write tests rules with unique attributes that are translated to arguments to a test binary. args is intentionally not exposed in these test rules as it invites users to do things that break the determinism of the targets and introduces cache poisoning. This feature would be very valuable in helping me cleanup code specifically for translating attributes into arguments for a test binary, which is unreasonably difficult to do today.

Why can't this be done with a macro?

If I understand you correctly, you want to translate "unique attributes to arguments". Isn't that:

def my_test_macro(arg1 = 1, arg2 = "str", arg3 = [3,4,5], **kwargs):
  my_test(args = [str(arg1), arg2] + [str(a) for a in arg3], **kwargs)

?

@UebelAndre
Copy link
Contributor

Why can't this be done with a macro?

If I understand you correctly, you want to translate "unique attributes to arguments". Isn't that:

def my_test_macro(arg1 = 1, arg2 = "str", arg3 = [3,4,5], **kwargs):
  my_test(args = [str(arg1), arg2] + [str(a) for a in arg3], **kwargs)

Some arguments accept paths to files which some users would expect to be able to pass a label to. The macro would not work with select statements which is something I need on a number of my rules.

@comius
Copy link
Contributor

comius commented Nov 2, 2022

Some arguments accept paths to files which some users would expect to be able to pass a label to.
The macro would not work with select statements which is something I need on a number of my rules.

You mean something like this?

def my_test_macro(file, **kwargs):
  my_test(args = ["$(location %s)" % file], **kwargs)
my_test_macro(name = "a", file = select({"...": "//file1", "...": "//file2"})

@UebelAndre
Copy link
Contributor

Some arguments accept paths to files which some users would expect to be able to pass a label to.
The macro would not work with select statements which is something I need on a number of my rules.

You mean something like this?

def my_test_macro(file, **kwargs):
  my_test(args = ["$(location %s)" % file], **kwargs)
my_test_macro(name = "a", file = select({"...": "//file1", "...": "//file2"})

I don't think you can pass selects to expansions like that. Is there evidence of that working?

Additionally, there may be things only knowable within a rule (like whether or not coverage is enabled) that would also need to be translated into an arugment.

@UebelAndre
Copy link
Contributor

You mean something like this?

def my_test_macro(file, **kwargs):
  my_test(args = ["$(location %s)" % file], **kwargs)
my_test_macro(name = "a", file = select({"...": "//file1", "...": "//file2"})

This also doesn't seem like it'd support having lists of files expanded into --flag=$(file) arguments. Is that correct?

@UebelAndre
Copy link
Contributor

I also have cases where I have a custom set of foo_library, foo_binary, and foo_test targets where the tests take information from FooInfo providers in the library and binary targets and consume them in the tests. This would certainly not be doable via macros.

@comius
Copy link
Contributor

comius commented Nov 2, 2022

I don't think you can pass selects to expansions like that. Is there evidence of that working?

No, it's not working. What you can do if you need a case like this, is pass dict, and then wrap it into a select within a macro.

This also doesn't seem like it'd support having lists of files expanded into --flag=$(file) arguments. Is that correct?

You can implement expansion like that in a macro.

I also have cases where I have a custom set of foo_library, foo_binary, and foo_test targets where the tests take information from FooInfo providers in the library and binary targets and consume them in the tests. This would certainly not be doable via macros.

But it's completely doable with a second rule. You can pass foo_* to that rule as a label. Access the providers. And also execute it using ctx.actions.run. It seems to me the only difference is having 1 vs. 2 rules.

If the usability is problematic, you can wrap 2 rules together into a macro, to achieve the appearance of a single rule.

@UebelAndre
Copy link
Contributor

Something being “doable” I don’t think is reason enough to reject improvements. I’ve proven what I want is doable since have the functionality I want. But I continue to see other developers attempt to solve for the same gap in functionality in different ways which leads to very hard to maintain projects. I have other examples of this but feel sharing them won’t change the outcome of the PR so I respectfully express my disagreement with the decision to reject this PR and leave the rest to you and @fmeum

@brandjon brandjon added team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts and removed team-Build-Language labels Nov 5, 2022
@cameron-martin
Copy link
Contributor

@comius Doesn't the argument of it being achievable via a wrapper macro also apply to environment variables? In this case why was #15232 merged but not this?

@comius
Copy link
Contributor

comius commented Sep 6, 2023

I disagree this is a gap in functionality. Args tokenization is a tech-debt loaded code. This PR seems to provide users to implement workarounds over it. The solution should be a more principle one and accepting the PR could make that harder if not even impossible.

@comius comius added awaiting-user-response Awaiting a response from the author and removed awaiting-review PR is awaiting review from an assigned reviewer labels Sep 6, 2023
@cameron-martin
Copy link
Contributor

I don't think the main purpose of this PR is to fix arguments tokenisation. At least the reasons I want it are not for this, but rather mirror the use cases that @UebelAndre provided.

@meteorcloudy meteorcloudy requested a review from a team as a code owner November 13, 2023 10:19
@meteorcloudy meteorcloudy requested review from katre and removed request for a team November 13, 2023 10:19
@illicitonion
Copy link
Contributor

Can I introduce another motivation for something like this PR, and run it past you @comius?

I currently have a private ruleset which generates a shell script per test target, string-formatting in some paths and other strings from its attributes.

There are two major problems with this:

  1. I need to string-format the contents of each of these shell scripts, which means I have many more actions than otherwise needed, and consumes more memory (by having to do this string formatting at analysis time). I've seen this pattern repeated in numerous rulesets, where they generate a wrapper shell script only to string-format in a file path or two.
  2. The actions are actually shared with a _library-style rule (i.e. both the _library and the _test rules need to do the same string formatting into the same script). Because the outputs of these actions are platform-independent, we want to add support for --experimental_output_paths=strip to the _library targets, which means we need to use Args objects instead of string formatting. But because we can't use Args objects for the _test rule, we need to effectively keep two parallel collections - an Args object for the _library target, and a Sequence for the _test target (which is both inefficient and error-prone, as it introduces the possibility of adding a flag to one but not the other).

I think if this PR were modified to accept Args objects instead-of/in-addition-to Sequences (which should be ~trivial), it could improve this situation significantly: improving expressiveness, and decreasing both memory and runtime overhead.

WDYT @comius?

@illicitonion
Copy link
Contributor

@comius can I nudge you at ^^?

RunfilesSupport.withExecutable(ruleContext, computedDefaultRunfiles, executable);
if (starlarkArgsProvided) {
runfilesSupport = RunfilesSupport.withExecutableNoArgs(
ruleContext, computedDefaultRunfiles, executable);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the argument against creating a RunfilesSupport.withExecutableAndArgs() method that overrides the argv computed from the attributes of the rule? That looks simpler than plumbing another set of arguments to RunCommand / TestActionBuilder and carefully making sure that the one in RunEnvironmentInfo overrides the one in RunfilesSupport.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fmeum ^^

@lberki
Copy link
Contributor

lberki commented Apr 19, 2024

In the absence of @comius I did my best.

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 7, 2024

I think if this PR were modified to accept Args objects instead-of/in-addition-to Sequences (which should be ~trivial), it could improve this situation significantly: improving expressiveness, and decreasing both memory and runtime overhead.

@illicitonion I wonder whether your use case would be better served by a Starlark Args "implementations" that you can feed to your _test rule as a drop-in replacement for ctx.actions.args(). Something like this (missing add_all):

def string_args():
    args = []
    def add(o):
        if type(o) == "File":
            args.append(o.path)
        else:
            args.append(str(o))

    def build():
        return list(args)

    return struct(
        add = add,
        build = build,
    )

Keeping Args in providers doesn't feel ideal: There are some features for which it's not clear how to support them (tree artifact expansion). Actions also don't really store Args, but StarlarkCustomCommandLine objects, which would require some special constructor behavior for the provider.

@brentleyjones
Copy link
Contributor

@fmeum
Copy link
Collaborator Author

fmeum commented Jun 7, 2024

I proposed a plan on #12313 and want to at least dedicate some time to making progress on it before pursuing making arguments configurable from rule implementation functions.

I do share @comius concern that my PR could end up making args parsing less predictable and fragmented across the ecosystem ("how many levels of quotation do I need in this context?"). One way to address this could be to have RunEnvironmentInfo#arguments prepend to rather than outright replace args. I need to put more thought into this though as a key motivation for me is to ensure proper interaction with wrapping rules.

@illicitonion
Copy link
Contributor

I think if this PR were modified to accept Args objects instead-of/in-addition-to Sequences (which should be ~trivial), it could improve this situation significantly: improving expressiveness, and decreasing both memory and runtime overhead.

@illicitonion I wonder whether your use case would be better served by a Starlark Args "implementations" that you can feed to your _test rule as a drop-in replacement for ctx.actions.args().

If I'm understanding this right, I think this is basically equivalent to giving Args a .to_string_sequence() function, and requiring it be called before populating the value on RunEnvironmentInfo (but possibly doing so in a way which requires up-front opting in to this, rather than just adding a general to_string_sequence() to Args which may be otherwise problematic)?

If so, then yeah, the two pieces of:

  • Adding an arguments field on RunEnvironmentInfo which accepts a sequence of strings
  • Some way of constructing or converting an Args so that it can produce a sequence of strings usable in this context

works for my use-case.

@comius comius removed their request for review June 11, 2024 08:59
@comius
Copy link
Contributor

comius commented Jun 11, 2024

I'll let @lberki handle this.

@comius comius removed the team-Rules-API API for writing rules/aspects: providers, runfiles, actions, artifacts label Jun 11, 2024
@fmeum
Copy link
Collaborator Author

fmeum commented Jan 8, 2025

Another use case for this came up in the Bazel slack (https://bazelbuild.slack.com/archives/CA31HN1T3/p1736328720399229): A user wants to expose a bazel runnable target that flashes a certain binary onto some external hardware. They have a custom rule that depends on the flasher tool (transitioned to @platform//host) and the binary to flash (using the target configuration so that --platforms influences what the binary is built for). But writing that custom rule is pretty hard without RunEnvironmentInfo#arguments: On Windows, they need to emit a bash script that passes the runfiles path of the binary to the flasher tool, but that requires discovering and parsing the runfiles manifest in batch.

With RunEnvironmentInfo#arguments, they would only need to append the short_path of the binary to the flasher args and could have the actual flasher binary handle runfiles discovery (e.g. in Bash if the flasher is an sh_binary, for which there is a runfiles library).

@comius and my main concern about obscuring how args end up being tokenized can be alleviated by making it so that args are still appended to the arguments passed in via RunEnvironmentInfo#arguments in the same way as before. This effectively makes the current PR orthogonal to #12313 (ignoring that someone could add a custom untokenized_args attribute to their rule to work around that issue, but that's as explicit as it gets).

@lberki What do you think of this? Should I go ahead and update this PR to use this precedence behavior?

@jfirebaugh
Copy link

I want to write a wrapper for oci_image so that you can write the following:

go_binary(
    name = "binary",
    srcs = ["main.go"],
    args = [
        "arg1",
        "arg2",
    ],
    data = ["data.txt"],
    env = {"FOO": "bar"},
)

oci_image_wrapper(
  name = "image",
  binary = ":binary"
)

The wrapper should:

  • Include the binary output and all its runfiles in an image layer (possible)
  • Include the binary env values in the image env (possible via RunEnvironmentInfo)
  • Set the entrypoint array of the image to the appropriate binary path, followed by the binary args (not currently possible)

Would this PR support the last requirement?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-response Awaiting a response from the author
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add arguments parameter to testing.TestEnvironment
10 participants