Parameterize the index type of maps and sets #147

cuviper · 2020-08-08T04:27:12Z

The index type is a new generic parameter defaulting to usize:

pub struct IndexSet<T, S = RandomState, Idx = usize>
pub struct IndexMap<K, V, S = RandomState, Idx = usize>

In no_std builds, S has no default, but still Idx = usize.

This may be used to optimized the storage size in the raw table, for
example using a u32 index if you know that u32::MAX capacity is
sufficient for you. It may also be useful to define a newtype index
custom to a particular domain, which only needs to implement the new
Indexable trait to be useful here.

bluss · 2021-01-01T13:10:33Z

Nice idea and pretty smooth implementation.

With the into/from usize conversion, we insert a lot more panic sites in the code. And I'd like to ensure we can't panic in places that leave the map inconsistent. (Is this a too onerous goal?)

The first usize conversion in .pop() looks like a problem. But it is actually guarded (global invariant) if we just ensure the map length never grows larger than what the index size can handle. However, it is not apparent if/how the insert method ensures this.
(This assumes a faithful fancy index to usize conversion - an unfaithful one can of course panic anyway, because invariants like that are useless.)

Then we document that insert panics if the index type's limits are exceeded etc. (And .extend() and .collect() et.c. 🙁 I'm just sad that it gets ugly in the details.)

malthe · 2021-01-23T08:00:38Z

@bluss it's not clear to me that a smaller index type needs special treatment – after all, the current logic also panics if an allocation can't be met (for whatever reason, possibly capacity > usize).

bluss · 2021-01-23T09:58:41Z

@malthe That's too hand-wavy, the exact line and place where we can panic matters for which inconsistencies can show up in the data structure.

malthe · 2021-01-23T20:26:16Z

@bluss but if we panic, what's left of the program? Is it not just going to halt?

Another issue I found was that it feels inconsistent that len() should return a usize when the index parameter is something different.

bluss · 2021-01-23T23:49:51Z

Panic is recoverable, so that's why it matters.

malthe · 2021-02-17T07:23:50Z

@cuviper I have begun some rebasing against the current master on this one, but let me know if you already have something cooking.

bluss · 2021-02-17T08:55:48Z

Len should probably continue to return usize, just to note - without looking deeper.

cuviper · 2021-03-03T20:33:33Z

@malthe Sorry, I don't know how far you got, but I just pushed a rebase myself. I was responsible for a lot of the conflicting changes anyway, so I had a good idea how to solve things.

I have mixed feelings whether the length and capacity should be usize or Idx. As usize, there's a small bonus that the maximum like 255u8 is a usable index, since we can still report length 256. As Idx, we would at least need Indexable to tell us its maximum value, so we can saturate when RawTable over-allocates. (e.g. RawTable::with_capacity(224) will allocate 256 and report capacity 224 exactly, but with_capacity(225) allocates 512 and reports capacity 448 -- too high for Idx = u8.) It might make some of those Idx <-> usize panics easier to deal with if the maximum is a sentinel though.

malthe · 2021-03-03T20:38:34Z

@cuviper yeah I think it has to be usize – but perhaps there could be a function that returns a range, i.e. 0..length - 1 which then would be type-safe on the index type. Thanks for the rebasing, there were so many conflicts to resolve.

The index type is a new generic parameter defaulting to `usize`: pub struct IndexSet<T, S = RandomState, Idx = usize> pub struct IndexMap<K, V, S = RandomState, Idx = usize> In `no_std` builds, `S` has no default, but still `Idx = usize`. This may be used to optimized the storage size in the raw table, for example using a `u32` index if you know that `u32::MAX` capacity is sufficient for you. It may also be useful to define a newtype index custom to a particular domain, which only needs to implement the new `Indexable` trait to be useful here.

Looking for real-world results on indexmap-rs/indexmap#147.

cuviper · 2022-05-07T00:32:46Z

To get some real data, I tried to use this branch in rust-lang/rust#96751 and its perf infrastructure. For instructions, there were a few small wins, but nothing better than 1% improvement, and one case regressed 1.68%. Maybe that's a good thing that it was mostly neutral, since this has added usize conversions in a lot of places. But max-rss didn't improve either, just looks like noise.

It's possible that I just didn't make good use of it, but those were all the places I could find that clearly dealt with smaller indexes. Maybe there are more significant maps that would be happy limited to u32::MAX items, just as the compiler often does with its IndexVec.

Overall, I'm not inclined to merge this if there's no clear winner, as the new parameter is pretty invasive.

cuviper force-pushed the indexable branch 2 times, most recently from b61fdd4 to 33d9b31 Compare August 9, 2020 18:08

cuviper mentioned this pull request Aug 9, 2020

Upgrade indexmap and use it more rust-lang/rust#75278

Merged

cuviper force-pushed the indexable branch from 33d9b31 to 9450d36 Compare March 3, 2021 19:58

cuviper mentioned this pull request Sep 21, 2021

IndexMap / IndexSet comparison isn't order-aware. #153

Closed

cuviper mentioned this pull request Feb 9, 2022

Prepare for indexmap 2.0 #219

Merged

cuviper force-pushed the indexable branch from 9450d36 to 9fab75e Compare March 30, 2022 00:28

cuviper added 3 commits March 29, 2022 17:55

make Indexable: TryFrom<usize> + TryInto<usize>

f8bc314

Fix MutableKeys::get_index_mut2 for Idx

c57e46a

Guard some index conversions that aren't supposed to panic out-of-bounds

3b6d807

cuviper mentioned this pull request Apr 15, 2022

Updating to hashbrown 0.12.0 #217

Closed

cuviper added a commit to cuviper/rust that referenced this pull request May 6, 2022

WIP: Try custom index types for indexmap

570bbc5

Looking for real-world results on indexmap-rs/indexmap#147.

cuviper mentioned this pull request May 6, 2022

WIP: Try custom index types for indexmap rust-lang/rust#96751

Closed

cuviper closed this Jul 17, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Parameterize the index type of maps and sets #147

Parameterize the index type of maps and sets #147

cuviper commented Aug 8, 2020

bluss commented Jan 1, 2021 •

edited

Loading

malthe commented Jan 23, 2021

bluss commented Jan 23, 2021

malthe commented Jan 23, 2021

bluss commented Jan 23, 2021

malthe commented Feb 17, 2021

bluss commented Feb 17, 2021

cuviper commented Mar 3, 2021

malthe commented Mar 3, 2021 •

edited

Loading

cuviper commented May 7, 2022

Parameterize the index type of maps and sets #147

Parameterize the index type of maps and sets #147

Conversation

cuviper commented Aug 8, 2020

bluss commented Jan 1, 2021 • edited Loading

malthe commented Jan 23, 2021

bluss commented Jan 23, 2021

malthe commented Jan 23, 2021

bluss commented Jan 23, 2021

malthe commented Feb 17, 2021

bluss commented Feb 17, 2021

cuviper commented Mar 3, 2021

malthe commented Mar 3, 2021 • edited Loading

cuviper commented May 7, 2022

bluss commented Jan 1, 2021 •

edited

Loading

malthe commented Mar 3, 2021 •

edited

Loading