-
Notifications
You must be signed in to change notification settings - Fork 143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shard the STRINGS in the intern cache for better parallel scalability #163
Conversation
(Another issue, perhaps, is the use of the same hash function both to choose a shard and then for inserting strings to the hash table. That could presumably potentially lead to an unexpectedly high number of hash collisions in a worst case scenario.) |
I agree, I'd rather not increase the size of |
Just leaving an idea here for now: You could "inline" < 3 byte strings into symbol pretty easily, which would require no synchronization for most variables (since most are very short). Redefine EDIT: ah nevermind, this makes it impossible to implement |
Ok nvm I figured it out. Could you try this branch instead on your workload? If your symbols are < 3 bytes, then they should be super fast! https://github.com/egraphs-good/egg/tree/tiny-symbols |
Nice! I just got around to trying this with my workload; to my surprise it doesn't seem to give a performance benefit, FWIW. |
Thanks! Interesting that there is no speedup. I’ll try to merge something like this approach then. Are your variable names all long or something? |
Ah, yes-for the example I was testing with, it happened that they were all >4 characters. Trying another one that has a mix of short ones and a few longer ones, I see a 10% reduction in overall wallclock time, so this is a definite win for that case, since it does a bunch of other stuff besides call into egg. |
Great! The latest commit on tiny-symbols now implements both of these techniques, and it keeps the Symbol size to 4 bytes :). Barring any crazy results from you, I'll merge that and close this PR. |
Hi Matt! I've taken this idea and combined it with something inspired by this post and made the Commit ed1faf1 (and the upcoming 0.8 release) will use this for string interning, which should solve your perf issues once and for all! |
(This is an improvement to the approach in #162 that is, however, probably not yet ready to merge. Posting it for discussion.)
The 20% of runtime in
egg::util::intern
with #162 is still pretty high. This PR instead shards the cache to reduce mutex contention in the presence of multithreading. With it, in the context of my application, it further reduces the percentage of time inintern()
by a factor of 4 with a corresponding benefit to overall performance:As it stands currently, this PR unfortunately increases the size of
Symbol
to 8 bytes, which is probably not desirable. One option might be to hijack the high bits of the value inSymbol
to store the shard index, if that would be acceptable.