Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Environ shim #1147

Closed
wants to merge 4 commits into from
Closed

Environ shim #1147

wants to merge 4 commits into from

Conversation

pvdrz
Copy link
Contributor

@pvdrz pvdrz commented Jan 15, 2020

The Rust side of this is at rust-lang/rust#68259

Makes progress towards #756

r? @RalfJung

if ecx.machine.communicate {
// Put each environment variable pointer in `EnvVars`, collect pointers.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Put each environment variable pointer in `EnvVars`, collect pointers.
// Put each environment variable pointer in `EnvVars`, collect pointers.

src/machine.rs Outdated
@@ -280,6 +287,9 @@ impl<'mir, 'tcx> Machine<'mir, 'tcx> for Evaluator<'tcx> {
let data = vec![0; size.bytes() as usize];
Allocation::from_bytes(&data, tcx.data_layout.pointer_align.abi)
}
"environ" => {
memory_extra.environ.as_ref().unwrap().clone()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this should not be called twice, you can use

Suggested change
memory_extra.environ.as_ref().unwrap().clone()
memory_extra.environ.take().unwrap()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm but then what should happen if you set another variable? I haven't thought about that part yet

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

^^ that is handled by miri. You only need to provide the initial value here, afterwards you have a perfectly viable mutable allocation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what do you mean? shouldn't we need to update environ every time (un)setenv is called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I gotta run, so just an excercise and no explanation: go up the call stack from here (you will have to go up a few call frames into the miri engine in rustc) and check out what happens with your allocation and why this code is even triggered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also see https://doc.rust-lang.org/nightly/nightly-rustc/rustc_mir/interpret/trait.Machine.html#tymethod.find_foreign_static: this function is only called the first time a foreign static is accessed.

@oli-obk I wonder... would it make sense to expand/change the find_foreign_static type so that it can/must return an AllocId? Having to actually return the allocation here is a bit annoying for environ.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would mean we'd end up with two AllocIds for the same allocation. I guess that happens anyway for immutable statics and the initial value of mutable statics, unsure. Maybe we should rather pass in the AllocId and require find_foreign_static to put the allocation into Memory at that alloc id. We'd need an allocate_with(AllocId, ...) function, but then everything else could just work as intended without creating confusing situations.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm... we could pull the same trick as with "normal" statics, and treat one AllocId as "hidden"? We have two AllocId there and it's working...

I don't think allocate_with(alloc_id) will work; we cannot change what that ID maps to in the global tcx store as that's immutable -- right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All non interned allocations need to be in the memory's alloc_map, which is the first thing that is always queried. If there's an entry there, no further code is looked at. Theoretically we could proactively insert that allocation at memory creation time instead of the current "lazy" (it's just lazily inserting, the creation is eager) scheme.

Copy link
Contributor Author

@pvdrz pvdrz Jan 31, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem with this (using Option::take) is that we would need &mut MemoryExtra as a parameter and that would require changing a lot of code. Maybe we can add interior mutability to that field instead?

let place = ecx.mplace_field(environ_place, idx as u64).unwrap();
ecx.write_scalar(var, place.into()).unwrap();
}
ecx.memory.mark_immutable(environ_place.ptr.assert_ptr().alloc_id).unwrap();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Won't this prevent later uses of set_env?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess the question is, do we try to be smart and recycle those lists, or do we just allocate a new one on each change?

But either way, we'll have to construct such a list multiple times -- so it probably should have its own function.

ecx.memory.mark_immutable(environ_place.ptr.assert_ptr().alloc_id).unwrap();
// A pointer to that place corresponds to the `environ` static.
let environ_ptr = ecx.force_ptr(environ_place.ptr).unwrap();
let environ_alloc = ecx.memory.get_raw(environ_ptr.alloc_id).unwrap().clone();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is suboptimal as the allocation will be left in Memory forever even if it's never going to get used. I'm not sure if this is feasible, but you may be able to create the alloc with Allocation::undef(size, align) and then write to it directly instead of going through all the extra accessors above.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having the allocation last forever makes sense to me, it's a static after all.

We'll also need to update this allocation when the environment changes (whenever environ_place above gets reallocated).

}

impl MemoryExtra {
pub fn new(rng: StdRng, validate: bool, tracked_pointer_tag: Option<PtrId>) -> Self {
pub fn new(
rng: StdRng, validate: bool,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's some odd formatting... either all arguments should be on one line, or they should all have their own line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I have rustfmt disabled to avoid huge diffs. I'll do the formatting when I have a more concise PR

@RalfJung
Copy link
Member

There are two allocations whose lifecycle we'll have to manage:

  • One for the list of pointers to individual environment variables. Let's call this environ_list. I propose the EnvVars struct keeps track of the latest allocation, and it should get a method to update that list which will kill the old allocation and create a new one.
  • And one for the pointer to the latest list -- this is the static itself. Let's call this environ_ptr. find_foreign_static will be called exactly once to allocate this. Can we make the environ_list update function construct a suitable DefId so that it can just do Memory::get_mut for this allocation, and implicitly benefit from the fact that only the first call will end up in find_foreign_static? That would be the most elegant way, I think.

@pvdrz
Copy link
Contributor Author

pvdrz commented Jan 31, 2020

There are two allocations whose lifecycle we'll have to manage:

* One for the list of pointers to individual environment variables. Let's call this `environ_list`. I propose the `EnvVars` struct keeps track of the latest allocation, and it should get a method to update that list which will kill the old allocation and create a new one.

I can implement that quickly. I'll add an update_list method to regenerate the whole environ_list and I'll call it every time we add/remove a variable.

* And one for the pointer to the latest list -- this is the static itself. Let's call this `environ_ptr`. `find_foreign_static` will be called exactly once to allocate this. Can we make the `environ_list` update function construct a suitable `DefId` so that it can just do `Memory::get_mut` for this allocation, and implicitly benefit from the fact that only the first call will end up in `find_foreign_static`? That would be the most elegant way, I think.

I got a little lost so let me see if I understood correctly: you want to preserve the same AllocId and just change its allocation when calling update_list?

On the other hand what's blocking this from compiling is that the resulting allocation will have tags and extra but find_foreign_static should return an untagged allocation, and just trying to fix Memory to use the tagged allocation fails at runtime (I did a rebase so I cannot show you the error right now :( ). Maybe I could add a method to untag an allocation?

@RalfJung
Copy link
Member

Maybe I could add a method to untag an allocation?

That sounds very wrong. find_foreign_static is supposed to return the initial content of that allocation, it is never called more than once per allocation. Maybe init_foreign_static would be a better name.

I got a little lost so let me see if I understood correctly: you want to preserve the same AllocId and just change its allocation when calling update_list?

The environ_ptr AllocId stays the same -- it has to. The environ_list AllocId does not. Each time update_list gets called, a new allocation is created with the new list, and a ptr to that allocation has to be written to some well-known location so that anyone can find it -- that location is environ_ptr (the storage backing the environ static).

The annoying and tricky part is that with the current structure, we have a phase separation: there's "before find_foreign_static gets called for environ" and "after find_foreign_static got called". That's what makes things messy. So the best thing would be if we could get rid of that phase separation, by calling it ourselves when initializing the env stuff.

@oli-obk is there any way we can determine the AllocId of the extern static named environ, somehow? Then we could just access that in update_list.

@oli-obk
Copy link
Contributor

oli-obk commented Jan 31, 2020

@oli-obk is there any way we can determine the AllocId of the extern static named environ, somehow?

yes, you call tcx.alloc_map.lock().create_static_alloc(def_id) (https://doc.rust-lang.org/nightly/nightly-rustc/rustc/mir/interpret/struct.AllocMap.html#method.create_static_alloc) which gives you the AllocId (and creates it if it didn't exist yet).

@oli-obk
Copy link
Contributor

oli-obk commented Jan 31, 2020

oh wait, that is naming

sorry I misread.. No this is something miri has to do on its own.

@pvdrz
Copy link
Contributor Author

pvdrz commented Jan 31, 2020

Maybe I could add a method to untag an allocation?

That sounds very wrong. find_foreign_static is supposed to return the initial content of that allocation, it is never called more than once per allocation. Maybe init_foreign_static would be a better name.

I rewrote find_foreign_static to do the initialization for __cxa_thread_atexit_impl, that required adding the AllocId as a parameter. Now miri is panicking running every single test, idk if this is my fault or not

stacktrace

@RalfJung
Copy link
Member

RalfJung commented Feb 1, 2020

Note: when inserting long text, please make it foldable with something like

<details><summary>TITLE HERE</summary>

long text here

</details>

"long text" can be wrapped in ``` if you make sure to leave the empty lines before and after it.


I rewrote find_foreign_static to do the initialization for __cxa_thread_atexit_impl

I don't follow, what is there to initialize? It's all NULL, isn't it?


@oli-obk I don't think our current scheme can work unless we have a way to access an extern static by link_name (and all alternatvies I can come up with are awful). So it'd be worth finding this out for sure.

To use create_static_alloc, we need to create a DefId for an extern static by name. It seems like that's only possible if the extern static is actually imported anywhere in the program, and then we have to reference it by its Rust path? For environ that could actually work, but in general this is a serious limitation.

Also what happens when there are multiple extern static with the same link_name, wouldn't we treat those as distinct globals in Miri which seems wrong?

@pvdrz
Copy link
Contributor Author

pvdrz commented Feb 1, 2020

I don't follow, what is there to initialize? It's all NULL, isn't it?

Yes but the static needs to be tagged properly.

@oli-obk
Copy link
Contributor

oli-obk commented Feb 1, 2020

We need some sort of name to alloc ID map in the memory extradata, I see no way around that. find_foreign_static could fill that in lazily

@RalfJung
Copy link
Member

RalfJung commented Feb 2, 2020

But even with something like that, Miri would treat two different imports of the same extern static as having different addresses...
Well, address equality is a mess anyway, so maybe that's okay for a start.

Okay so the plan then would be for find_foreign_static, instead of returning an Allocation, to return an AllocId -- right? But, looking at get_static_alloc and the surrounding code, I still don't see how we can make multiple different DefId all refer to the same allocation...

@oli-obk
Copy link
Contributor

oli-obk commented Feb 3, 2020

Hmmm... ok now I see what you mean. I guess we could make create_static_alloc check for foreign items and add a second hash map that is based on names, but I'm not sure how that would work cross crate.

While we could create a scheme similar to lang items, that feels a lot like overkill

@RalfJung
Copy link
Member

RalfJung commented Feb 4, 2020

I don't see how that would even help. We need multiple entries in memory's MemoryMap to point to the same Allocation, which is just not representible currently.

Or we say that we just don't support programs that import the same extern static at multiple places. But even then we need to rework some things -- as @christianpoveda learned the hard way, the current setup means that the initial value of an "extern static" cannot contain pointers, as it is untagged and gets retrofitted with default tags outside of the Machine's control.

Or we need an indirection that "canonicalizes" the AllocId before doing the lookup... I first thought this was crazy but it might not be so bad after all; in particular this would make a great machine hook! So I think that would be my proposal, actually:

  • Add a canonical_alloc_id function to the machine hook (that roughly takes an AllocId and returns an AllocId, with the default impl being the identity function), and remove find_foreign_static.
  • In Miri, implement canonicalization as follows: if the AllocId is an extern static, do a lookup it a HashMap that we keep to figure out the canonical AllocId for this static. We can fill that HashMap either eagerly on machine startup or lazily, I am not sure what is best.

@RalfJung
Copy link
Member

RalfJung commented Feb 5, 2020

@oli-obk gave 👍, so I guess we can go ahead with this plan.

@christianpoveda do you want to do that? If you have questions about the proposal just ask. :)

@pvdrz
Copy link
Contributor Author

pvdrz commented Feb 6, 2020

I have a lot of questions but I'll do them when I have played with this a little bit. Sounds good to me :P

@RalfJung
Copy link
Member

@christianpoveda I am going to implement what I proposed above; I got some time this week-end.

@RalfJung
Copy link
Member

Once rust-lang/rust#69408 lands, this should be unblocked. :)

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this pull request Feb 28, 2020
Miri: let machine canonicalize AllocIDs

This implements the rustc side of the plan I laid out [here](rust-lang/miri#1147 (comment)).

Miri PR: rust-lang/miri#1190
bors added a commit to rust-lang/rust that referenced this pull request Mar 1, 2020
Miri: let machine canonicalize AllocIDs

This implements the rustc side of the plan I laid out [here](rust-lang/miri#1147 (comment)).

Miri PR: rust-lang/miri#1190
@RalfJung
Copy link
Member

RalfJung commented Mar 2, 2020

@christianpoveda okay, the groundwork is laid. You should now be able to allocate environ_ptr in env.rs initialization and then in init_extern_statics add it to the extern_static map.

@pvdrz
Copy link
Contributor Author

pvdrz commented Mar 2, 2020

Ok let me rebase this thing to get it working

Edit: Actually it might be easier to do a new PR, I'll be back soon.

@pvdrz pvdrz closed this Mar 2, 2020
@pvdrz pvdrz deleted the environ-shim branch March 2, 2020 15:21
@pvdrz pvdrz mentioned this pull request Mar 4, 2020
bors added a commit that referenced this pull request Mar 8, 2020
Environ shim

Remake of #1147. There are three main problems with this:

1. For some reason `update_environ` is not updating `environ` when `setenv` or `unsetenv` are called. Even then it works during initialization.
2. I am not deallocating the old array with the variables in `update_environ`.
3. I had to store the `environ` place into `MemoryExtra` as a field to update it. I was thinking about changing `extern_statics` to store places instead of `AllocID`s to avoid this.

@RalfJung
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants