Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement get_disjoint_mut (previously get_many_mut) #238

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

NiklasJonsson
Copy link

std::collections::HashMap is adding get_many_mut in rust-lang/rust#97601 and I'm working on replacing some uses of that hash map with this one where some of the code uses get_many_mut (currently only a single location in rustc) so I would appreciate it if indexmap::IndexMap could provide the same interface.

This code uses a decent amount of unsafe compared to the rest of the file and I saw the comment in lib.rs on having almost all unsafe code in raw.rs but I couldn't figure out how to move parts of my implementation there as it is mostly dealing with initializing the array. I tried to use array::map to avoid all unsafe but couldn't quite get it to work with the lifetimes and the lambda.

If you want this change, I'll also write some docs.

@NiklasJonsson NiklasJonsson force-pushed the get_many_mut branch 2 times, most recently from 0ef0e90 to d3f1968 Compare July 14, 2022 19:53
@cuviper
Copy link
Member

cuviper commented Jul 15, 2022

I think ideally we would want two things for this, especially the first:

  1. Stabilizing HashMap::get_many_mut so we can be sure to match that API as much as possible.
  2. Stabilizing <[T]>::get_many_mut (still new in Add slice methods for indexing via an array of indices. rust-lang/rust#83608) to deal with all the unsafe aspects. We might also want to wrap that into our own get_many_index_mut, just like we have get_index and get_index_mut for slice-like indexing.

I tried to use array::map to avoid all unsafe but couldn't quite get it to work with the lifetimes and the lambda.

I suspect part of this is because you should be using a raw pointer to offset/index each of the elements, otherwise you have the main &mut [T] aliasing each of the &mut T you pull out while you're working. And of course, using raw pointers will still need unsafe to form the final references.

@NiklasJonsson
Copy link
Author

Stabilizing HashMap::get_many_mut so we can be sure to match that API as much as possible.

Yeah, this makes a lot of sense. Is it okay to just leave the PR open until then?

Stabilizing <[T]>::get_many_mut

Haven't seen this before but it would indeed fit pretty well. It seems to have been open for quite a while without resolution though. Also, if they opt for a result-based return value, that'd not match the hashmap api (as it is now) so we'd have to do conversions in the implementation.

I suspect part of this is because you should be using a raw pointer to offset/index each of the elements, otherwise you have the main &mut [T] aliasing each of the &mut T you pull out while you're working. And of course, using raw pointers will still need unsafe to form the final references.

Yeah, exactly, the lambda captures the whole array lifetime mutably and can't be invoked multiple times. I did not consider pointer arithmetic though, I'll experiment a bit and see how it looks. Probably similar to the <[T]>::get_many_mut examples 🙂

src/map.rs Outdated
// SAFETY: OK to discard mutable borrow lifetime as each index is unique
#[allow(unsafe_code)]
unsafe {
&mut *(&mut entries[i].value as *mut V)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still creates aliasing &mut between entries and the previous iterations, which is not allowed. I suggest using self.as_entries_mut().as_mut_ptr() to start with a pointer, then &mut (*entries_ptr.add(i)).value for each index.

We probably need a manual bounds-check too, but that can be part of the loop scanning for duplicates. In theory, get_index_of should always be in-bounds, but better safe than sorry...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand how it can create aliasing &mut for entries. I only get a &mut to a specific element of entries (only the value member of that element) and while that references has a lifetime of entries, it is not mutably borrowing all of entries, no? Maybe I'm misunderstand reference/lifetime semantics and what is allowed. Using the pointer doesn't affect the guarantee of the function: each &mut reference that comes out of this function is unique so what difference will it make?

We probably need a manual bounds-check too...

Yeah that makes sense if pointers are the way to go but it should still panic rather than return None, I assume?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, I did run miri on the tests like:

# rustup +nightly component add miri
cargo clean
cargo +nightly miri test

but I know miri doesn't catch all cases of undefined behaviour.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, maybe I understand. The variable entries is &mut [Bucket<K, V>] so it is a mutable reference the whole slice but the array out also holds mutable references to individual elements, which alias with entries. Is this what you are referring to?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's it. I'm a little surprised that miri doesn't see any problem, but I think it's because slice indexing is an intrinsic operation in MIR, so miri doesn't really see any use of the entire entries. If you force it to go through the IndexMut trait, entries.index_mut(i), then it does complain:

test map::tests::many_mut_multi_success ... error: Undefined Behavior: trying to reborrow <390898> for SharedReadOnly permission at alloc159164[0xc], but that tag does not exist in the borrow stack for this location
    --> /home/jistone/.rustup/toolchains/nightly-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/cmp.rs:1531:27
     |
1531 |             PartialEq::eq(*self, *other)
     |                           ^^^^^
     |                           |
     |                           trying to reborrow <390898> for SharedReadOnly permission at alloc159164[0xc], but that tag does not exist in the borrow stack for this location
     |                           this error occurs as part of a reborrow at alloc159164[0xc..0x10]
     |
     = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
     = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <390898> was created by a retag at offsets [0xc..0x10]
    --> src/map.rs:502:19
     |
502  |           let out = indices.map(|i| {
     |  ___________________^
503  | |             // SAFETY: OK to discard mutable borrow lifetime as each index is unique
504  | |             #[allow(unsafe_code)]
505  | |             unsafe {
506  | |                 &mut *(&mut entries.index_mut(i).value as *mut V)
507  | |             }
508  | |         });
     | |__________^
help: <390898> was later invalidated at offsets [0x0..0x40]
    --> src/map.rs:506:29
     |
506  |                 &mut *(&mut entries.index_mut(i).value as *mut V)
     |                             ^^^^^^^^^^^^^^^^^^^^
     = note: backtrace: [...]
    --> src/map/tests.rs:449:5
     |
449  |     assert_eq!(map.get_many_mut([&1, &1123]), Some([&mut 10, &mut 100]));
     |     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We probably need a manual bounds-check too...

Yeah that makes sense if pointers are the way to go but it should still panic rather than return None, I assume?

Yeah, if get_index_of returns a bad index, then we have internal errors, so a panic is fine.

@cuviper
Copy link
Member

cuviper commented Jul 15, 2022

Is it okay to just leave the PR open until then?

Yeah, that's fine. I should probably make a label for this...

@cuviper cuviper added the waiting-for-std Changes waiting for stabilization in the standard library, so we can match the API. label Jul 15, 2022
@jyn514
Copy link

jyn514 commented Jul 16, 2022

I think ideally we would want two things for this, especially the first: Stabilizing HashMap::get_many_mut so we can be sure to match that API as much as possible.

Can IndexMap add it under a nightly or unstable opt-in cargo feature? Then people will have to opt in to breaking changes, and you can document that changes to this API won't result in a new major version.

(Or if you don't feel strongly, you can just make a new major release if the HashMap API changes for some reason.)

@cuviper
Copy link
Member

cuviper commented Jul 19, 2022

I don't like using nightly/unstable features, in part because it's an extra maintenance burden, but also because the feature may be opted-in on your behalf, arbitrarily deep in your dependency tree.

I'd rather not set us up for needing a semver bump either. If this is something that folks really need, let's apply that pressure on std stabilization. :)

@cuviper
Copy link
Member

cuviper commented Feb 25, 2025

Rust 1.86 (now in beta) will stabilize this as get_disjoint_mut with two different styles:

For [T]: https://doc.rust-lang.org/beta/std/primitive.slice.html#method.get_disjoint_mut

pub fn get_disjoint_mut<I, const N: usize>(
    &mut self,
    indices: [I; N],
) -> Result<[&mut <I as SliceIndex<[T]>>::Output; N], GetDisjointMutError>
where
    I: GetDisjointMutIndex + SliceIndex<[T]>,

For HashMap<K, V>: https://doc.rust-lang.org/beta/std/collections/struct.HashMap.html#method.get_disjoint_mut

pub fn get_disjoint_mut<Q, const N: usize>(
    &mut self,
    ks: [&Q; N],
) -> [Option<&mut V>; N]
where
    K: Borrow<Q>,
    Q: Hash + Eq + ?Sized,

We can decide for ourselves whether it makes sense to build on the standard library, or (unsafely) implement our own. GetDisjointMutIndex won't be stable though, nor the old SliceIndex, so if we want to replicate that support for both usize and ranges, then I think we'll be on our own anyway.

@NiklasJonsson it's been some time -- are you still interested in working on this?

@cuviper
Copy link
Member

cuviper commented Feb 25, 2025

We can decide for ourselves whether it makes sense to build on the standard library, or (unsafely) implement our own. GetDisjointMutIndex won't be stable though, nor the old SliceIndex, so if we want to replicate that support for both usize and ranges, then I think we'll be on our own anyway.

Either way, I'd be fine with supporting just usize to start with, as long as there's a possibility to add ranges later. But even that could be separate, like how we have get_index_mut and get_ranges_mut covering the same as slice get_mut (and our get_mut matches HashMap). In other words, we could leave that extension to a future get_disjoint_ranges_mut.

@NiklasJonsson
Copy link
Author

NiklasJonsson commented Feb 25, 2025 via email

@cuviper
Copy link
Member

cuviper commented Feb 25, 2025

Sounds great, thanks! And please don't take that as any pressure to do this quickly -- we can take our time and get it right.

@cuviper
Copy link
Member

cuviper commented Feb 25, 2025

2. We might also want to wrap that into our own get_many_index_mut,

Correcting myself -- this should be plural, so now it would be get_disjoint_indices_mut.

@NiklasJonsson
Copy link
Author

NiklasJonsson commented Mar 1, 2025

@cuviper Just to verify that I understand what you mean. An alright initial implementation would be:

  1. The key-based get_disjoint_mut:
    pub fn get_disjoint_mut<Q, const N: usize>(
        &mut self,
        ks: [&Q; N],
    ) -> [Option<&mut V>; N]
    where
        K: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
  2. A get_disjoint_indices_mut that takes a usize array (pretty much the current get_many_index_mut)

and then we can leave the implementation of get_disjoint_mut with ranges/for a future PR? If so, then it seems to be mostly about renaming the functions.

@NiklasJonsson NiklasJonsson changed the title Implement get_many_mut Implement get_disjoint_mut (previously get_many_mut) Mar 1, 2025
@NiklasJonsson
Copy link
Author

NiklasJonsson commented Mar 1, 2025

Just to verify that I understand what you mean. An alright initial implementation would be:
...

I went ahead and implemented what I understood you to mean 🙂 Also added some extra tests.

src/map.rs Outdated
/// let mut map = indexmap::IndexMap::from([(1, 'a'), (3, 'b'), (2, 'c')]);
/// assert_eq!(map.get_disjoint_mut([&2, &1]), Some([&mut 'c', &mut 'a']));
/// ```
pub fn get_disjoint_mut<Q, const N: usize>(&mut self, keys: [&Q; N]) -> Option<[&mut V; N]>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To match HashMap, this should move the Option inward, returning [Option<&mut V>; N]. Unfortunately, the slice version is all-or-nothing though, so we can't directly rely on that, but we can probably have some internal method doing the unsafe part for either way.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, HashMap panics if there's any overlap. The bounds-check panic you have is fine too, since that shouldn't happen normally, but possibly could if we have a logic error somewhere.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah right, I totally missed this part of the signature. I'll study the API in more detail.

Copy link
Author

@NiklasJonsson NiklasJonsson Mar 1, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed now that indexmap uses the Equivalent<K> bound rather than Eq + Borrow like the stdlib hashmap. I assume that I should keep using Equivalent for this function as well? As far as I understand, that only makes indexmap more flexible than the stdlib hashmap API. Everything the Eq + Borrow combo supports should also be supported with Equivalence. Am I correct?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, that's right, keep using the Equivalent bound.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so we can't directly rely on that, but we can probably have some internal method doing the unsafe part for either way.

The internal helper ended up like this:

pub(crate) fn get_disjoint_opt_mut<const N: usize>(
        &mut self,
        indices: [Option<usize>; N],
) -> Result<[Option<(&K, &mut V)>; N], GetDisjointMutError> {

and does most of the heavy-lifting for both of the methods. Does it seem alright to you? I don't feel great about wrapping indices in Some and then unwrapping after calling the helper in Slice::get_disjoint_mut but it does make the outer functions share most of the tricky code.

src/map.rs Outdated
pub fn get_disjoint_indices_mut<const N: usize>(
&mut self,
indices: [usize; N],
) -> Option<[(&K, &mut V); N]> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To mimic the slice version, but without the generic usize/range output, this should return Result<[(&K, &mut V); N], GetDisjointMutError>. It will have to be our own version of that error for now, but maybe in the future we'll have indexmap v3 where we can switch to std's.

It would be nice to have this method on map::Slice too -- perhaps primarily implemented there and then forwarded from here with as_mut_slice().

src/map.rs Outdated
let out = indices.map(|i| {
// SAFETY: The base pointer is valid as it comes from a slice and the deref is always
// in-bounds as we've already checked the indices above.
#[allow(unsafe_code)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: I think this allow should go around the entire method. While the actual unsafe operation is narrow, the SAFETY requirements need care in the whole thing, and the allowed lint is a good signal for that.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
waiting-for-std Changes waiting for stabilization in the standard library, so we can match the API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants