Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deterministic FxHashSet wrapper #94188

Closed
wants to merge 4 commits into from

Conversation

hameerabbasi
Copy link
Contributor

@hameerabbasi hameerabbasi commented Feb 20, 2022

@rust-highfive
Copy link
Collaborator

r? @nagisa

(rust-highfive has picked a reviewer for you, use r? to override)

@rustbot rustbot added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Feb 20, 2022
@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Feb 20, 2022
@hameerabbasi
Copy link
Contributor Author

r? @michaelwoerister

@hameerabbasi
Copy link
Contributor Author

Right now it seems that StableSet doesn't implement Decodable. I wonder if it might be worth it to implement HashStable for FxHashSet directly, and remove the implementation for HashSet.

@hameerabbasi
Copy link
Contributor Author

I had a couple of questions:

  1. Decodable also seems to require Hash and not HashStable. Is that an issue here?
  2. I suspect I might come across this issue in multiple places, how do I decide for each one?

@rust-log-analyzer

This comment has been minimized.

@michaelwoerister
Copy link
Member

I wonder if it might be worth it to implement HashStable for FxHashSet directly, and remove the implementation for HashSet.

FxHashSet is just an alias for the regular HashSet (with a different hash algorithm), so it should not implement HashStable either.

Decodable also seems to require Hash and not HashStable. Is that an issue here?

It's not a problem for a type to implement both Hash and HashStable. You should be able just #[derive(Encodable, Decodable)] for StableSet. We'll definitely need to be able encode and decode these for incremental compilation.

@hameerabbasi
Copy link
Contributor Author

It's not a problem for a type to implement both Hash and HashStable.

Ah, since it was a wrapper I assumed we wanted to drop some methods and traits, but wasn't sure which ones.

@rust-log-analyzer

This comment has been minimized.

@hameerabbasi
Copy link
Contributor Author

The current problem is that some places require .iter(), I recognize that as one of the things we'd like to avoid in StableSet. I have a few suggestions, maybe you have better ones:

  1. Add an unsafe function iter(), or call it iter_unstable() to avoid performance loss.
  2. Use the sorted_vector()'s into_iter().

For point 2, I have a question: Where does one create or acquire HCX (I assume that's hashing context).

@bjorn3
Copy link
Member

bjorn3 commented Feb 21, 2022

Where does one create or acquire HCX (I assume that's hashing context).

Using either tcx.create_stable_hashing_context or tcx.create_no_span_stable_hashing_context. The TyCtxt type is not available in rustc_data_structure, so the hashing context will have to be passed in as argument.

@rust-log-analyzer

This comment has been minimized.

@hameerabbasi
Copy link
Contributor Author

I have tried in vain to grok through the layers of macros here, I'm not entirely sure where the last compilation error is coming from. I've tried to grep for HashSet and FxHashSet in all of the files in that traceback, but nothing was found.

@bjorn3
Copy link
Member

bjorn3 commented Feb 22, 2022

You are probably looking for https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/query/mod.rs

@michaelwoerister
Copy link
Member

Yeah, those errors are really hard to decipher. @bjorn3 is right: https://github.com/rust-lang/rust/blob/master/compiler/rustc_middle/src/query/mod.rs defines all queries in the compiler. All query keys and results must implement HashStable -- and HashSet (which includes FxHashSet) doesn't anymore. So all occurrences of those must be replaced with StableSet in that file. Likewise, the "query prodivers" (i.e. the actual implementations of the queries defined in that file) must be update accordingly.

@michaelwoerister
Copy link
Member

The current problem is that some places require .iter(), I recognize that as one of the things we'd like to avoid in StableSet. I have a few suggestions, maybe you have better ones:

In some cases I've seen, iter() is only used in a way that does not actually expose the order of the items in the set, e.g. my_set.iter().any(|x| x > 0). We could add those methods directly to StableSet. any() and all() come to mind.

Also, adding extend() to StableSet should be fine.

@rust-log-analyzer

This comment has been minimized.

@hameerabbasi
Copy link
Contributor Author

The current compilation problem is that we need an implementation of ArenaAllocatable for StableSet. One is already provided for all types that impl IntoIterator, but we don't on purpose. We could either provide a bunch of implementations manually, or better, have a trait that all of our eventual types will implement. I like the second idea better, and I believe it'll work well even for HashMaps.

@rust-log-analyzer

This comment has been minimized.

@hameerabbasi
Copy link
Contributor Author

Where does one create or acquire HCX (I assume that's hashing context).

Using either tcx.create_stable_hashing_context or tcx.create_no_span_stable_hashing_context. The TyCtxt type is not available in rustc_data_structure, so the hashing context will have to be passed in as argument.

Another open question: Which one to use where? I could avoid compilation errors with either but I don't want to code in something that won't pass tests, or worse, cause silent bugs.

@bors
Copy link
Contributor

bors commented Feb 24, 2022

☔ The latest upstream changes (presumably #94129) made this pull request unmergeable. Please resolve the merge conflicts.

@michaelwoerister
Copy link
Member

Aren't we using pointers as keys in a number of places?

@bjorn3
Copy link
Member

bjorn3 commented Mar 30, 2022

Looks like we do for Interned.

@hameerabbasi
Copy link
Contributor Author

I think the safe way to compare them would be to simply compare the contents of the pointers. Maybe what we need isn't really a wrapper -- But instead just to:

  • Implement a custom version of HashStable for pointer-like objects such as Interned, as it's recursive, it might fix the problems we have instead of a wrapper.
  • Fix the serialization of FxHashSet and FxHashMap.

@bjorn3
Copy link
Member

bjorn3 commented Mar 30, 2022

Implement a custom version of HashStable for pointer-like objects such as Interned, as it's recursive, it might fix the problems we have instead of a wrapper.

HashStable is completely irrelevant for hashmap/hashset iteration order. They use Hash instead. Interned's HashStable implementation is already completely deterministic. Hash is not for performance reasons.

Fix the serialization of FxHashSet and FxHashMap.

That doesn't fix the iteration order problem.

@michaelwoerister
Copy link
Member

Let me clarify: The argument for making the serialized version independent of iteration order is only about deterministic builds, not about incremental compilation. That is, if we have two compilation sessions that both serialize a set {a, b, c} into crate metadata, we could end up with any permutation of that set in the binary stream if we do serialization via just encoding in the order that .iter() gives us, because {a, b, c} could be in any order internally when using pointers as keys.

Iteration order for FxHashSet is only dependent on the exact set of operations performed on it. There is no randomness introduced. Serializing an FxHashSet in iteration order into the crate metadata is perfectly deterministic.

I agree. There's an interesting (but not immediately relevant) subtlety there: As far as I know, FxHashSet, and SwissTable-based hash maps in general, do not give a guarantee that some_set has the same iteration order as some_set.into_iter().collect::<FxHashSet>(), so serializing to a sequence and then collecting the sequence might reshuffle things, even when no pointers are involved. But, yes, I don't think that that is necessarily a problem for deterministic builds. It's mostly pointer-like hash keys that I'm worried about.

@michaelwoerister
Copy link
Member

I think it would be good to come up with an API design that allows us to safely iterate over items without exposing the order and without causing additional runtime overhead. I don't know yet, what that would look like exactly. I'll need to do some experimentation in order to get a feel for it.

I did some experimentation, and this is what I came up with:
https://gist.github.com/michaelwoerister/ca40793a8d68c8ee60ce9af49bea89d3

It basically adds an unordered version of Iterator that only has methods that don't depend on the order of items (like map). This way we can merge data from one StableSet into another StableSet (possibly doing transformations on the data in between), as in the example:

let mut src: StableMap<u32, u32> = Default::default();
src.insert(1, 2);

let mut dst: StableMap<u64, u64> = Default::default();
dst.extend(src.items().map(|(&k, &v)| (k as u64 + 1, v as u64 + 1)))

The hope is that this allows us to avoid unnecessarily sorting things if they end up in an unsorted context afterwards anyway. Does that make sense?

@bjorn3
Copy link
Member

bjorn3 commented Apr 4, 2022

Does that make sense?

Yes!

@hameerabbasi
Copy link
Contributor Author

Okay, I've rebased and added the suggested API. I'm attempting to mirror parts of the API as I need them, but I have to change over some occurrences first.

@rust-log-analyzer
Copy link
Collaborator

The job mingw-check failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
    Checking rustc_ast_lowering v0.0.0 (/checkout/compiler/rustc_ast_lowering)
    Checking rustc_ast_passes v0.0.0 (/checkout/compiler/rustc_ast_passes)
    Checking rustc_expand v0.0.0 (/checkout/compiler/rustc_expand)
    Checking rustc_builtin_macros v0.0.0 (/checkout/compiler/rustc_builtin_macros)
error[E0277]: the trait bound `rustc_data_structures::stable_set::StableSet<_>: From<&mut HashSet<rustc_span::Symbol, BuildHasherDefault<FxHasher>>>` is not satisfied
    --> compiler/rustc_middle/src/ty/context.rs:2955:10
2955 |         &StableSet::from(
2955 |         &StableSet::from(
     |          ^^^^^^^^^^^^^^^ the trait `From<&mut HashSet<rustc_span::Symbol, BuildHasherDefault<FxHasher>>>` is not implemented for `rustc_data_structures::stable_set::StableSet<_>`
     = help: the following implementations were found:
     = help: the following implementations were found:
               <rustc_data_structures::stable_set::StableSet<T> as From<HashSet<T, BuildHasherDefault<FxHasher>>>>
For more information about this error, try `rustc --explain E0277`.
error: could not compile `rustc_middle` due to previous error
warning: build failed, waiting for other jobs to finish...
error: could not compile `rustc_middle` due to previous error

@@ -910,17 +910,14 @@ pub fn provide(providers: &mut Providers) {
};

let (defids, _) = tcx.collect_and_partition_mono_items(cratenum);
for id in &*defids {
if defids.any(|x| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if defids.any(|x| {
if defids.any(|id| {

@bors
Copy link
Contributor

bors commented Apr 9, 2022

☔ The latest upstream changes (presumably #95524) made this pull request unmergeable. Please resolve the merge conflicts.

@apiraino
Copy link
Contributor

apiraino commented Apr 28, 2022

Based on this comment, I'll switch to waiting on author to implement suggested changes. Please feel free to request a review when ready, thanks!

@rustbot -S-waiting-on-review +S-waiting-on-author

@apiraino apiraino added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 28, 2022
@Dylan-DPC
Copy link
Member

@hameerabbasi any updates on this?

@hameerabbasi
Copy link
Contributor Author

I think it's okay if someone else takes over after this weekend. I'll try to resolve the comments already posted.

@hameerabbasi
Copy link
Contributor Author

It seems StableSet and StableMap were removed, any alternatives I should be aware of? Or should I add them back?

@michaelwoerister
Copy link
Member

Hi, after we recently ran into a critical issue that could have been prevented by this, I decided to try and make some progress more quickly. I opened rust-lang/compiler-team#533, which suggests a replacement for StableMap and StableSet, very much like I proposed above (#94188 (comment)). This is mostly implemented in michaelwoerister@694136d (although I'm going to rename UnorderedMap to UnordMap, as eddyb suggested elsewhere). I'll try to make a PR this week.

If you are still interested in this topic, @hameerabbasi, I can ping you when the new collection types have been merged. Then we can start using them in more cases. But I would suggest doing it in smaller pieces at a time.

I suggest closing this particular PR -- although I do think the work that has been done here was very important in exploring the design space!

@hameerabbasi
Copy link
Contributor Author

@michaelwoerister Feel free to cc me when the PR is ready! I'll follow along from the sidelines, and probably leave a question/comment here and there.

Just so you're aware, my time to work on FOSS in general is limited to about 4-6 hours a week, due to a day job, a move to another country and the procedures surrounding that. If you feel slow progress is still useful progress, I'd like to spend time on this.

@michaelwoerister
Copy link
Member

Sounds good, @hameerabbasi!

@Dylan-DPC
Copy link
Member

Closing this based on the comments above

@Dylan-DPC Dylan-DPC closed this Oct 28, 2022
@michaelwoerister
Copy link
Member

FYI, the replacement PR is up at #102698.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.