Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parameterize the index type of maps and sets #147

Closed
wants to merge 4 commits into from

Conversation

cuviper
Copy link
Member

@cuviper cuviper commented Aug 8, 2020

The index type is a new generic parameter defaulting to usize:

pub struct IndexSet<T, S = RandomState, Idx = usize>
pub struct IndexMap<K, V, S = RandomState, Idx = usize>

In no_std builds, S has no default, but still Idx = usize.

This may be used to optimized the storage size in the raw table, for
example using a u32 index if you know that u32::MAX capacity is
sufficient for you. It may also be useful to define a newtype index
custom to a particular domain, which only needs to implement the new
Indexable trait to be useful here.

@bluss
Copy link
Member

bluss commented Jan 1, 2021

Nice idea and pretty smooth implementation.

With the into/from usize conversion, we insert a lot more panic sites in the code. And I'd like to ensure we can't panic in places that leave the map inconsistent. (Is this a too onerous goal?)

The first usize conversion in .pop() looks like a problem. But it is actually guarded (global invariant) if we just ensure the map length never grows larger than what the index size can handle. However, it is not apparent if/how the insert method ensures this.
(This assumes a faithful fancy index to usize conversion - an unfaithful one can of course panic anyway, because invariants like that are useless.)

Then we document that insert panics if the index type's limits are exceeded etc. (And .extend() and .collect() et.c. 🙁 I'm just sad that it gets ugly in the details.)

@malthe
Copy link

malthe commented Jan 23, 2021

@bluss it's not clear to me that a smaller index type needs special treatment – after all, the current logic also panics if an allocation can't be met (for whatever reason, possibly capacity > usize).

@bluss
Copy link
Member

bluss commented Jan 23, 2021

@malthe That's too hand-wavy, the exact line and place where we can panic matters for which inconsistencies can show up in the data structure.

@malthe
Copy link

malthe commented Jan 23, 2021

@bluss but if we panic, what's left of the program? Is it not just going to halt?

Another issue I found was that it feels inconsistent that len() should return a usize when the index parameter is something different.

@bluss
Copy link
Member

bluss commented Jan 23, 2021

Panic is recoverable, so that's why it matters.

@malthe
Copy link

malthe commented Feb 17, 2021

@cuviper I have begun some rebasing against the current master on this one, but let me know if you already have something cooking.

@bluss
Copy link
Member

bluss commented Feb 17, 2021

Len should probably continue to return usize, just to note - without looking deeper.

@cuviper
Copy link
Member Author

cuviper commented Mar 3, 2021

@malthe Sorry, I don't know how far you got, but I just pushed a rebase myself. I was responsible for a lot of the conflicting changes anyway, so I had a good idea how to solve things.

I have mixed feelings whether the length and capacity should be usize or Idx. As usize, there's a small bonus that the maximum like 255u8 is a usable index, since we can still report length 256. As Idx, we would at least need Indexable to tell us its maximum value, so we can saturate when RawTable over-allocates. (e.g. RawTable::with_capacity(224) will allocate 256 and report capacity 224 exactly, but with_capacity(225) allocates 512 and reports capacity 448 -- too high for Idx = u8.) It might make some of those Idx <-> usize panics easier to deal with if the maximum is a sentinel though.

@malthe
Copy link

malthe commented Mar 3, 2021

@cuviper yeah I think it has to be usize – but perhaps there could be a function that returns a range, i.e. 0..length - 1 which then would be type-safe on the index type. Thanks for the rebasing, there were so many conflicts to resolve.

The index type is a new generic parameter defaulting to `usize`:

    pub struct IndexSet<T, S = RandomState, Idx = usize>
    pub struct IndexMap<K, V, S = RandomState, Idx = usize>

In `no_std` builds, `S` has no default, but still `Idx = usize`.

This may be used to optimized the storage size in the raw table, for
example using a `u32` index if you know that `u32::MAX` capacity is
sufficient for you. It may also be useful to define a newtype index
custom to a particular domain, which only needs to implement the new
`Indexable` trait to be useful here.
cuviper added a commit to cuviper/rust that referenced this pull request May 6, 2022
@cuviper
Copy link
Member Author

cuviper commented May 7, 2022

To get some real data, I tried to use this branch in rust-lang/rust#96751 and its perf infrastructure. For instructions, there were a few small wins, but nothing better than 1% improvement, and one case regressed 1.68%. Maybe that's a good thing that it was mostly neutral, since this has added usize conversions in a lot of places. But max-rss didn't improve either, just looks like noise.

It's possible that I just didn't make good use of it, but those were all the places I could find that clearly dealt with smaller indexes. Maybe there are more significant maps that would be happy limited to u32::MAX items, just as the compiler often does with its IndexVec.

Overall, I'm not inclined to merge this if there's no clear winner, as the new parameter is pretty invasive.

@cuviper cuviper closed this Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants