-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement get_disjoint_mut (previously get_many_mut) #238
base: main
Are you sure you want to change the base?
Conversation
0ef0e90
to
d3f1968
Compare
I think ideally we would want two things for this, especially the first:
I suspect part of this is because you should be using a raw pointer to offset/index each of the elements, otherwise you have the main |
Yeah, this makes a lot of sense. Is it okay to just leave the PR open until then?
Haven't seen this before but it would indeed fit pretty well. It seems to have been open for quite a while without resolution though. Also, if they opt for a result-based return value, that'd not match the hashmap api (as it is now) so we'd have to do conversions in the implementation.
Yeah, exactly, the lambda captures the whole array lifetime mutably and can't be invoked multiple times. I did not consider pointer arithmetic though, I'll experiment a bit and see how it looks. Probably similar to the |
src/map.rs
Outdated
// SAFETY: OK to discard mutable borrow lifetime as each index is unique | ||
#[allow(unsafe_code)] | ||
unsafe { | ||
&mut *(&mut entries[i].value as *mut V) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This still creates aliasing &mut
between entries
and the previous iterations, which is not allowed. I suggest using self.as_entries_mut().as_mut_ptr()
to start with a pointer, then &mut (*entries_ptr.add(i)).value
for each index.
We probably need a manual bounds-check too, but that can be part of the loop scanning for duplicates. In theory, get_index_of
should always be in-bounds, but better safe than sorry...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how it can create aliasing &mut
for entries
. I only get a &mut
to a specific element of entries
(only the value member of that element) and while that references has a lifetime of entries
, it is not mutably borrowing all of entries
, no? Maybe I'm misunderstand reference/lifetime semantics and what is allowed. Using the pointer doesn't affect the guarantee of the function: each &mut
reference that comes out of this function is unique so what difference will it make?
We probably need a manual bounds-check too...
Yeah that makes sense if pointers are the way to go but it should still panic rather than return None, I assume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, I did run miri on the tests like:
# rustup +nightly component add miri
cargo clean
cargo +nightly miri test
but I know miri doesn't catch all cases of undefined behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, maybe I understand. The variable entries
is &mut [Bucket<K, V>]
so it is a mutable reference the whole slice but the array out
also holds mutable references to individual elements, which alias with entries
. Is this what you are referring to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's it. I'm a little surprised that miri doesn't see any problem, but I think it's because slice indexing is an intrinsic operation in MIR, so miri doesn't really see any use of the entire entries
. If you force it to go through the IndexMut
trait, entries.index_mut(i)
, then it does complain:
test map::tests::many_mut_multi_success ... error: Undefined Behavior: trying to reborrow <390898> for SharedReadOnly permission at alloc159164[0xc], but that tag does not exist in the borrow stack for this location
--> /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/cmp.rs:1531:27
|
1531 | PartialEq::eq(*self, *other)
| ^^^^^
| |
| trying to reborrow <390898> for SharedReadOnly permission at alloc159164[0xc], but that tag does not exist in the borrow stack for this location
| this error occurs as part of a reborrow at alloc159164[0xc..0x10]
|
= help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
= help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <390898> was created by a retag at offsets [0xc..0x10]
--> src/map.rs:502:19
|
502 | let out = indices.map(|i| {
| ___________________^
503 | | // SAFETY: OK to discard mutable borrow lifetime as each index is unique
504 | | #[allow(unsafe_code)]
505 | | unsafe {
506 | | &mut *(&mut entries.index_mut(i).value as *mut V)
507 | | }
508 | | });
| |__________^
help: <390898> was later invalidated at offsets [0x0..0x40]
--> src/map.rs:506:29
|
506 | &mut *(&mut entries.index_mut(i).value as *mut V)
| ^^^^^^^^^^^^^^^^^^^^
= note: backtrace: [...]
--> src/map/tests.rs:449:5
|
449 | assert_eq!(map.get_many_mut([&1, &1123]), Some([&mut 10, &mut 100]));
| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We probably need a manual bounds-check too...
Yeah that makes sense if pointers are the way to go but it should still panic rather than return None, I assume?
Yeah, if get_index_of
returns a bad index, then we have internal errors, so a panic is fine.
Yeah, that's fine. I should probably make a label for this... |
Can IndexMap add it under a nightly or unstable opt-in cargo feature? Then people will have to opt in to breaking changes, and you can document that changes to this API won't result in a new major version. (Or if you don't feel strongly, you can just make a new major release if the HashMap API changes for some reason.) |
I don't like using nightly/unstable features, in part because it's an extra maintenance burden, but also because the feature may be opted-in on your behalf, arbitrarily deep in your dependency tree. I'd rather not set us up for needing a semver bump either. If this is something that folks really need, let's apply that pressure on |
Rust 1.86 (now in beta) will stabilize this as For pub fn get_disjoint_mut<I, const N: usize>(
&mut self,
indices: [I; N],
) -> Result<[&mut <I as SliceIndex<[T]>>::Output; N], GetDisjointMutError>
where
I: GetDisjointMutIndex + SliceIndex<[T]>, For pub fn get_disjoint_mut<Q, const N: usize>(
&mut self,
ks: [&Q; N],
) -> [Option<&mut V>; N]
where
K: Borrow<Q>,
Q: Hash + Eq + ?Sized, We can decide for ourselves whether it makes sense to build on the standard library, or (unsafely) implement our own. @NiklasJonsson it's been some time -- are you still interested in working on this? |
Either way, I'd be fine with supporting just |
Yeah I've been following the stabilization and I'd like to finish this up.
I'll have some time this weekend to have a deeper look but I think I'll
favor supporting less to start out and then I'll see if I have time to do
more.
…On Tue, Feb 25, 2025, 02:29 Josh Stone ***@***.***> wrote:
Rust 1.86 (now in beta) will stabilize this as get_disjoint_mut with two
different styles:
For [T]:
https://doc.rust-lang.org/beta/std/primitive.slice.html#method.get_disjoint_mut
pub fn get_disjoint_mut<I, const N: usize>(
&mut self,
indices: [I; N],) -> Result<[&mut <I as SliceIndex<[T]>>::Output; N], GetDisjointMutError>where
I: GetDisjointMutIndex + SliceIndex<[T]>,
For HashMap<K, V>:
https://doc.rust-lang.org/beta/std/collections/struct.HashMap.html#method.get_disjoint_mut
pub fn get_disjoint_mut<Q, const N: usize>(
&mut self,
ks: [&Q; N],) -> [Option<&mut V>; N]where
K: Borrow<Q>,
Q: Hash + Eq + ?Sized,
We can decide for ourselves whether it makes sense to build on the
standard library, or (unsafely) implement our own. GetDisjointMutIndex
won't be stable though, nor the old SliceIndex, so if we want to
replicate that support for both usize and ranges, then I think we'll be
on our own anyway.
@NiklasJonsson <https://github.com/NiklasJonsson> it's been some time --
are you still interested in working on this?
—
Reply to this email directly, view it on GitHub
<#238 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADNRXP6VM3HKI4SGOGBRNOD2RPBONAVCNFSM6AAAAABXZNCFMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBQGEZDQMRTGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
[image: cuviper]*cuviper* left a comment (indexmap-rs/indexmap#238)
<#238 (comment)>
Rust 1.86 (now in beta) will stabilize this as get_disjoint_mut with two
different styles:
For [T]:
https://doc.rust-lang.org/beta/std/primitive.slice.html#method.get_disjoint_mut
pub fn get_disjoint_mut<I, const N: usize>(
&mut self,
indices: [I; N],) -> Result<[&mut <I as SliceIndex<[T]>>::Output; N], GetDisjointMutError>where
I: GetDisjointMutIndex + SliceIndex<[T]>,
For HashMap<K, V>:
https://doc.rust-lang.org/beta/std/collections/struct.HashMap.html#method.get_disjoint_mut
pub fn get_disjoint_mut<Q, const N: usize>(
&mut self,
ks: [&Q; N],) -> [Option<&mut V>; N]where
K: Borrow<Q>,
Q: Hash + Eq + ?Sized,
We can decide for ourselves whether it makes sense to build on the
standard library, or (unsafely) implement our own. GetDisjointMutIndex
won't be stable though, nor the old SliceIndex, so if we want to
replicate that support for both usize and ranges, then I think we'll be
on our own anyway.
@NiklasJonsson <https://github.com/NiklasJonsson> it's been some time --
are you still interested in working on this?
—
Reply to this email directly, view it on GitHub
<#238 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ADNRXP6VM3HKI4SGOGBRNOD2RPBONAVCNFSM6AAAAABXZNCFMWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDMOBQGEZDQMRTGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
Sounds great, thanks! And please don't take that as any pressure to do this quickly -- we can take our time and get it right. |
Correcting myself -- this should be plural, so now it would be |
@cuviper Just to verify that I understand what you mean. An alright initial implementation would be:
and then we can leave the implementation of |
9f800a0
to
c6f35fd
Compare
I went ahead and implemented what I understood you to mean 🙂 Also added some extra tests. |
src/map.rs
Outdated
/// let mut map = indexmap::IndexMap::from([(1, 'a'), (3, 'b'), (2, 'c')]); | ||
/// assert_eq!(map.get_disjoint_mut([&2, &1]), Some([&mut 'c', &mut 'a'])); | ||
/// ``` | ||
pub fn get_disjoint_mut<Q, const N: usize>(&mut self, keys: [&Q; N]) -> Option<[&mut V; N]> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To match HashMap
, this should move the Option
inward, returning [Option<&mut V>; N]
. Unfortunately, the slice version is all-or-nothing though, so we can't directly rely on that, but we can probably have some internal method doing the unsafe
part for either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, HashMap
panics if there's any overlap. The bounds-check panic you have is fine too, since that shouldn't happen normally, but possibly could if we have a logic error somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aah right, I totally missed this part of the signature. I'll study the API in more detail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I noticed now that indexmap
uses the Equivalent<K>
bound rather than Eq
+ Borrow
like the stdlib hashmap. I assume that I should keep using Equivalent
for this function as well? As far as I understand, that only makes indexmap
more flexible than the stdlib hashmap API. Everything the Eq
+ Borrow
combo supports should also be supported with Equivalence
. Am I correct?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, that's right, keep using the Equivalent
bound.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so we can't directly rely on that, but we can probably have some internal method doing the unsafe part for either way.
The internal helper ended up like this:
pub(crate) fn get_disjoint_opt_mut<const N: usize>(
&mut self,
indices: [Option<usize>; N],
) -> Result<[Option<(&K, &mut V)>; N], GetDisjointMutError> {
and does most of the heavy-lifting for both of the methods. Does it seem alright to you? I don't feel great about wrapping indices in Some and then unwrapping after calling the helper in Slice::get_disjoint_mut
but it does make the outer functions share most of the tricky code.
src/map.rs
Outdated
pub fn get_disjoint_indices_mut<const N: usize>( | ||
&mut self, | ||
indices: [usize; N], | ||
) -> Option<[(&K, &mut V); N]> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To mimic the slice version, but without the generic usize
/range output, this should return Result<[(&K, &mut V); N], GetDisjointMutError>
. It will have to be our own version of that error for now, but maybe in the future we'll have indexmap v3
where we can switch to std
's.
It would be nice to have this method on map::Slice
too -- perhaps primarily implemented there and then forwarded from here with as_mut_slice()
.
src/map.rs
Outdated
let out = indices.map(|i| { | ||
// SAFETY: The base pointer is valid as it comes from a slice and the deref is always | ||
// in-bounds as we've already checked the indices above. | ||
#[allow(unsafe_code)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: I think this allow
should go around the entire method. While the actual unsafe
operation is narrow, the SAFETY requirements need care in the whole thing, and the allowed lint is a good signal for that.
3ec4a1b
to
11017c9
Compare
std::collections::HashMap
is addingget_many_mut
in rust-lang/rust#97601 and I'm working on replacing some uses of that hash map with this one where some of the code usesget_many_mut
(currently only a single location in rustc) so I would appreciate it ifindexmap::IndexMap
could provide the same interface.This code uses a decent amount of unsafe compared to the rest of the file and I saw the comment in lib.rs on having almost all unsafe code in raw.rs but I couldn't figure out how to move parts of my implementation there as it is mostly dealing with initializing the array. I tried to use
array::map
to avoid all unsafe but couldn't quite get it to work with the lifetimes and the lambda.If you want this change, I'll also write some docs.