-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Environment variable sandboxing #2794
base: master
Are you sure you want to change the base?
Conversation
|
||
|
||
# Rationale and alternatives | ||
[rationale-and-alternatives]: #rationale-and-alternatives |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
env(1)
seems like a plausible alternative to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'll add it, and explicitly using the shell.
|
@lxrec:
A shell-based mechanism only applies if you're invoking rustc from a shell. If its some other build tool doing the invocation, then it may not have a mechanism to control the environment in that way.
There seems to be a few use-cases of using things like Or you may just want to block anything reading PATH entirely.
That's just for completeness because its hard to do negative matching in a regex. One could imagine that you might want to blacklist security sensitive variables, such as
Yes, it is primarily intended for tooling to generate these options (esp since rustc is rarely invoked directly anyway). I suggest a Cargo-based idea in the RFC, though my primary use-case is a non-Cargo environment. It doesn't make any real sense to set a variable then blacklist it, but it is important to define semantics for this case. I propose these because they're easy to describe and implement, but I'm open to other proposals if they make more sense. |
text/0000-sandbox-environment.md
Outdated
initialized from the process environment. Then each each `--env` option is processed in turn, as it appears, to update the logical | ||
environment. Specifically: | ||
|
||
- `--env-whitelist REGEX` - Any name which doesn't match the REGEX is removed from the logical environment, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the regular expression needed? Explicitly listing all individual variables seems not unreasonable.
Another question: does TEST
match NOT_TEST
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I chose regex for things like allowing all CARGO_ vars without having to enumerate all the features or whatever else cargo adds.
I mentioned that the regex is anchored so it implicitly has ^$ around it - so no TEST wouldn't match NOT_TEST.
text/0000-sandbox-environment.md
Outdated
|
||
These options are: | ||
- `--env-whitelist REGEX` - match the REGEX against all existing process environment | ||
variables and allow them to be seen. This overrides `--env-remove-all`. The regex is matched against the entire variable name |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems that --env-remove-all
is not defined in the RFC?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
--env-blacklist .*
rust-analyzer will have to do something along this lines, once we get to supporting |
Yes I was thinking about the benefits of setting the env explicitly for persistent/embedded compiler instances. |
text/0000-sandbox-environment.md
Outdated
|
||
By default all environment variables are available with their value taken from the environment. There are several | ||
additional controls to control the logical environment accessed by `env!()`/`option_env!()`: | ||
- only allow access to a specific whitelist of variables |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this idea in general, and am broadly in favor of this RFC. I'll review it again later. As an initial review pass, though, could you please avoid using "blacklist" and "whitelist" (both in the RFC and in the option names), in favor of (for instance) "blocklist" and "safelist", or perhaps "block" and "allow"? See numerous references such as https://twitter.com/dhh/status/1032050325513940992 and https://twitter.com/andybohm/status/1038491107829530625 for rationale. Thank you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The two arguments take regexes, not lists, so calling them "lists" are misnomer anyway 🙃.
(If we call them --env-allow
and --env-deny
, I wonder if --env-warn
makes sense 😛)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allow/deny is consistent with other rustc terminology.
text/0000-sandbox-environment.md
Outdated
[drawbacks]: #drawbacks | ||
|
||
The primary cost is additional complexity in invoking `rustc` itself, and additional complexity in documenting | ||
`env!`/`option_env!`. Procedual macros would need to be changed to access the logical environment, either by |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
appears as though you forgot to write the rest of the sentence.
On Thu, Oct 31, 2019 at 02:13:33PM -0700, Jeremy Fitzhardinge wrote:
jsgf commented on this pull request.
> + likely needs to be set so that the compiler can execute its various components, but there's no way to override this
+ so that `env!("PATH")` returns something else. This would be necessary where the compilation environment differs from the
+ deployment environment (such as when cross-compiling).
+
+This RFC proposes a way to precisely control the environment visible to the compile-time macros, while defaulting to the
+current behaviour.
+
+# Guide-level explanation
+[guide-level-explanation]: #guide-level-explanation
+
+Rust implements the `env!()` and `option_env!()` macros to access the process environment variables at compilation time.
+`rustc` supports a number of command-line options to control the environment visible to the compiling code.
+
+By default all environment variables are available with their value taken from the environment. There are several
+additional controls to control the logical environment accessed by `env!()`/`option_env!()`:
+- only allow access to a specific whitelist of variables
Allow/deny is consistent with other rustc terminology.
Sounds good.
|
Much simplified, removing regexs
I've massively simplified this proposal. It removes all regexes from the command-line options, and has much simpler processing. Instead it proposes:
These options are processed in the order given. This allows for full control of the environment available within the crate being compiled. https://github.com/jsgf/rust/tree/env-sandbox implements this as proposed. |
ping @joshtriplett @matklad since you'd shown interest in this. At this point, I'm not sure there's enough here to need an RFC. Guidance would be appreciated. |
Do you think it makes sense to combine |
@jasonwhite The main advantage I see is that it's shorter to type, at the cost of being a bit less explicit. Since nobody directly types rustc command lines in practice, I don't think brevity is all that important. I don't feel too strongly about it either way though. The path extension would presumably make this |
POC implementation updated to current upstream master https://github.com/jsgf/rust/tree/env-sandbox |
cc @JakobDegen |
text/0000-sandbox-environment.md
Outdated
with no reference to the process environment. | ||
|
||
Note that this can't affect other environment accesses. For example, if a | ||
procedural macro uses the `std::env::var()` function as part of its |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the design questions around proc macros deserve their own section, with a proper discussion of the alternatives. I think we need to consider at least all of:
- We do magic to modify the behavior of
env::var()
in proc macros - We leave the behavior of
env::var()
as-is, but modify the behavior ofproc_macro::tracked_env
or any similar APIs that exist in the future - What bjorn suggested here
- Something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are semantically two types of getenv at compile time:
- intended to get the environment of the actual running rustc process, for something that's needed at runtime - eg getting
$PATH
to find an auxillary executable ("process environment") - environment logically substituted into the source being compiled - ie
env!()
etc. This is the only environment affected by this RFC ("logical environment")
A proc macro could be doing either, so rather than making env::var()
behave differently we'd need a new API to enable 2 in proc macros, and an audit of proc macros which fetch from the environment to see which ones they should call.
What bjorn suggested here
We should definitely discourage/block the use of env::set_var
regardless of anything else.
Looks like proc_macro_tracked_env
is pretty much the API I'm thinking of here.
Another point to consider is the environment of build scripts, both consumed and produced.
- for scripts doing probing (eg pkgconfig) then they definitely want the real process environment
- but I think
cargo:rustc-env=VAR=VALUE
is very likely to be intended to affect the logical environment
I'll add this discussion to the RFC.
- explicitly discuss the distinction between *operational* and *directive* environment fetches - more discussion about proc-macros, esp with respect to the proposed tracked environment API
Updated to include:
|
Thanks for the RFC @jsgf! We reviewed it during this week's compiler team steering meeting and had a few thoughts. The RFC suggests creating a distinction between the actual env vars for the rustc process (the "process environment") and the ones configured via the cli (the "logical environment"). Our feeling is that mutating the process environment after arguments have been parsed would be better from an implementation perspective as only the very early startup code in rustc will need to be aware of these options and the rest of the compiler does not need to be changed. It also seems that this approach would likely have fewer bugs where various parts of the compiler forgot to use the logical environment instead of the process environment. This option doesn't seem to be explored in the stated alternatives. Have you considered this approach? |
When `rustc` is embedded or long-running, such as in `rls` or `rust-analyzer`, | ||
then its necessary to explicitly set the logical environment for each crate, | ||
rather than just inheriting the process environment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@wesleywiser doesn't this section here cover that?
I did, but I rejected it pretty early so I didn't even consider it as a plausible alternative. There's a few reasons why that would be far from my favoured approach: Most importantly, from a semantic point of view, is that it doesn't allow setting the logical variable, as seen by While these may not actually be required by Concretely, Buck2 attempts to limit the number of "ambiently" set environment variables to a "very limited initial environment" which has since grown to 20 environment variables we've found to be required in our environment (and I have no reason to believe it will stop there). Many of these are ones we would like to prevent random code from including with Secondly, I'm also not sure to what extent rustc is usable as a library which can be instantiated multiple times within a process. Even if its not possible today, relying on process wide global state would make it more difficult in future. And other tools which want to reuse the same command line syntax, like rust-analyzer, would not be able to if the command line is filled with toolchain invocation detail. Furthermore, if we're planning on introducing the notion of a "tracked environment" anyway, then we need to do a careful audit of all the sites which read the environment and determine whether they should be tracked. I think "tracked environment" and the "logical environment" should be synonymous (though I haven't done deep analysis here). We very definitely don't want to be tracking environment variables which are implementation details of running the toolchain which don't have a material effect on the generated artifacts. |
Implement `--env` compiler flag (without `tracked_env` support) Part of rust-lang#80792. Implementation of rust-lang/compiler-team#653. Not an implementation of rust-lang/rfcs#2794. It adds the `--env` compiler flag option which allows to set environment values used by `env!` and `option_env!`. Important to note: When trying to retrieve an environment variable value, it will first look into the ones defined with `--env`, and if there isn't one, then only it will look into the environment variables. So if you use `--env PATH=a`, then `env!("PATH")` will return `"a"` and not the actual `PATH` value. As mentioned in the title, `tracked_env` support is not added here. I'll do it in a follow-up PR. r? rust-lang/compiler
Implement `--env` compiler flag (without `tracked_env` support) Part of rust-lang/rust#80792. Implementation of rust-lang/compiler-team#653. Not an implementation of rust-lang/rfcs#2794. It adds the `--env` compiler flag option which allows to set environment values used by `env!` and `option_env!`. Important to note: When trying to retrieve an environment variable value, it will first look into the ones defined with `--env`, and if there isn't one, then only it will look into the environment variables. So if you use `--env PATH=a`, then `env!("PATH")` will return `"a"` and not the actual `PATH` value. As mentioned in the title, `tracked_env` support is not added here. I'll do it in a follow-up PR. r? rust-lang/compiler
…avidtwco Rename `--env` option flag to `--env-set` As discussed [on zulip](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Stabilizing.20.60--env.60.20option.20flag.3F). We rename `--env` to not conflicting names with the [RFC](rust-lang/rfcs#2794). r? `@davidtwco`
Rollup merge of rust-lang#119884 - GuillaumeGomez:rename-env-opt, r=davidtwco Rename `--env` option flag to `--env-set` As discussed [on zulip](https://rust-lang.zulipchat.com/#narrow/stream/131828-t-compiler/topic/Stabilizing.20.60--env.60.20option.20flag.3F). We rename `--env` to not conflicting names with the [RFC](rust-lang/rfcs#2794). r? `@davidtwco`
The |
A possible extension would be to allow regular expressions to select which | ||
variable names should be removed from the logical environment or passed through | ||
from the process environment. For example, `--env-remove-re` or `--env-pass-re`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since special characters in environment variable names is so rare, maybe it could be OK to say --env-remove
and --env-pass
take regex rather than having separate options. I don't think it is a big deal to escape the patterns if you have a very weird env name.
Implicit ^...$
anchors of course, as mentioned elsewhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd still prefer to make it explicit just to avoid any edge cases.
Also if these parameters can match multiple env vars, then you need to worry about how the options interact (eg, if you want to remove all the env vars matching a pattern except one, or conversely pass all except one). That will require some more specific design and not something that just a simple or obvious extension of naming a specific variable.
These options are processed in order: | ||
1. `--env-clear` - remove all variables from the logical environment. | ||
1. `--env-remove VAR` - remove a specific variable from the logical environment. | ||
May be specified multiple times. | ||
1. `--env-pass VAR`- set a variable in the logical environment from the process | ||
environment. Ignored if the variable is not set, or is not UTF-8 encoded. | ||
1. `--env-set VAR=VALUE` - multiple `--env-set` options affecting the same variable are | ||
processed in command-line order, so later ones override earlier ones. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see it mentioned anywhere but a single-argument form with =
would be nice, --env-set=FOO=bar
.
Possibly also a short alias, -e
for --env-set
.
|
||
The options are processed in the order listed above (ie, clear, then remove, | ||
then pass, then set). `--env-clear`, `--env-remove` and `--env-pass` are | ||
idempotent, so multiple instances of each are the same as one.. Multiple |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
idempotent, so multiple instances of each are the same as one.. Multiple | |
idempotent, so multiple instances of each are the same as one. Multiple |
There are several command-line options to control this environment and change it | ||
from the default: | ||
- `--env-clear` - remove all process environment variables from the logical | ||
environment, leaving it completely empty. | ||
- `--env-remove VARIABLE` - Remove a specific variable from the logical | ||
environment. This is an exact match, and it does nothing if that variable is | ||
not set. | ||
- `--env-pass VARIABLE` - Pass a variable from the process environment to the | ||
logical one, even if it had previously been removed. This lets specific | ||
variables to be allow-listed without having to explicitly set their value. The | ||
variable is ignored if it is not set or not utf8 encoded. | ||
- `--env-set VARIABLE=VALUE` - Set a variable in the logical environment. This will | ||
either create a new variable, or override an existing value. | ||
|
||
The options are processed in the order listed above (ie, clear, then remove, | ||
then pass, then set). `--env-clear`, `--env-remove` and `--env-pass` are |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be better if these were always processed in the order they were passed. For example:
FOO=global rustc ... --env-set=FOO=override --env-remove=FOO --env-pass=FOO
My expectation is that env!("FOO")
here should be global
, but I think as described it would be override
. Similarly for clear, it looks like env!("FOO")
should fail with the below but I think it would be bar
:
rustc ... --env-set=FOO=bar --env-clear
Reasoning:
-
Much easier to understand, you just read
--env-*
arguments left to right -
More composable for build scripts since you can override your earlier settings. Roughly:
# default args args="--env-clear --env-set=PATH=/bin --env-set=DYLIB=true" # a long build script... [ "$use_static_linking" = "true" ] && args="$args --env-remove=DYLIB" [ "$cross_compiling" = "false" ] && args="$args --env-pass=PATH" # ...
-
Slightly more straightforward implementation - just a
BTreeMap
with the system env, then each--env-*
arg in order is an insert/remove/clear operation. No ordering of arguments needed
In any case, could you add some examples of CLI arguments and the resulting env!
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah I thought about that but it seemed like it would need some custom option parsing code to implement. And while I'm not adverse to the "read from left to right" argument, my pedantic side would actually prefer failing on multiple conflicting treatments of a specific variable rather than overriding. But I know pragmatically that the "keep adding args" pattern is very common for build rules, so maybe that's just too pedantic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the parser is already order-preserving, though can't say for certain.
If the decision is against making these composable, it could be reasonable to enforce the order of the args. That is, error if args aren't presented in --env-clear
, --env-remove
--env-pass
--env-set
order, even if it makes no difference, to enforce that they are always left to right readable.
Clearing of extra variables is an improvement, but I'm worried that it doesn't help catch untracked uses of optional env variables. This could lead to a confusing situation where the program doesn't seem to "see" its configuration variables. Could it instead fail hard on any attempt to read a variable that isn't explicitly allowed? |
So you mean you want
? That seems like a pretty large change in behaviour for
|
The RFC doesn't specifically say how Cargo would use it. What I'd like to see is Cargo crates listing all env vars they use, in some declarative machine-readable way. This would help outer build systems and containers track, configure, and cache the variables accordingly. It would also allow docs.rs to list possible configuration options. So my suggestion was from that perspective of forcing crates to declare everything they use. If a crate used an
So what I'm proposing is breaking builds on purpose, and of course would need an opt in, and likely some cargo fix tool that gathers all uses of env vars for the initial allow list. One problem I see with explicit allow list is names that are computed dynamically, e.g. cross-build vars containing the target triple. Or pkg-config crate looking up env var overrides for specific libraries it only gets at build/run time. That may need support for wildcard patterns. |
Yeah, I'm sympathetic to what you're saying, but I was really hoping to avoid changing the semantics of |
Implement `--env` compiler flag (without `tracked_env` support) Part of rust-lang/rust#80792. Implementation of rust-lang/compiler-team#653. Not an implementation of rust-lang/rfcs#2794. It adds the `--env` compiler flag option which allows to set environment values used by `env!` and `option_env!`. Important to note: When trying to retrieve an environment variable value, it will first look into the ones defined with `--env`, and if there isn't one, then only it will look into the environment variables. So if you use `--env PATH=a`, then `env!("PATH")` will return `"a"` and not the actual `PATH` value. As mentioned in the title, `tracked_env` support is not added here. I'll do it in a follow-up PR. r? rust-lang/compiler
Implement `--env` compiler flag (without `tracked_env` support) Part of rust-lang/rust#80792. Implementation of rust-lang/compiler-team#653. Not an implementation of rust-lang/rfcs#2794. It adds the `--env` compiler flag option which allows to set environment values used by `env!` and `option_env!`. Important to note: When trying to retrieve an environment variable value, it will first look into the ones defined with `--env`, and if there isn't one, then only it will look into the environment variables. So if you use `--env PATH=a`, then `env!("PATH")` will return `"a"` and not the actual `PATH` value. As mentioned in the title, `tracked_env` support is not added here. I'll do it in a follow-up PR. r? rust-lang/compiler
What remains to be done here? |
Rendered
First draft proposal to add mechanism to precisely control environment available to the
env!
/option_env!
macros.Implementation at https://github.com/jsgf/rust/tree/env-sandbox