Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shard the STRINGS in the intern cache for better parallel scalability #163

Closed
wants to merge 1 commit into from

Conversation

mmp
Copy link
Contributor

@mmp mmp commented Jan 20, 2022

(This is an improvement to the approach in #162 that is, however, probably not yet ready to merge. Posting it for discussion.)

The 20% of runtime in egg::util::intern with #162 is still pretty high. This PR instead shards the cache to reduce mutex contention in the presence of multithreading. With it, in the context of my application, it further reduces the percentage of time in intern() by a factor of 4 with a corresponding benefit to overall performance:

Screen Shot 2022-01-20 at 12 18 16 PM

As it stands currently, this PR unfortunately increases the size of Symbol to 8 bytes, which is probably not desirable. One option might be to hijack the high bits of the value in Symbol to store the shard index, if that would be acceptable.

@mmp
Copy link
Contributor Author

mmp commented Jan 20, 2022

(Another issue, perhaps, is the use of the same hash function both to choose a shard and then for inserting strings to the hash table. That could presumably potentially lead to an unexpectedly high number of hash collisions in a worst case scenario.)

@mwillsey
Copy link
Member

I agree, I'd rather not increase the size of Symbol. If this is a continued perf bottleneck, we could consider switching to a purpose-built interning solution (many exist in Rust, but I'm not familiar with all of them).

@mwillsey
Copy link
Member

mwillsey commented Feb 1, 2022

Just leaving an idea here for now: You could "inline" < 3 byte strings into symbol pretty easily, which would require no synchronization for most variables (since most are very short).

Redefine Symbol as a i32 newtype. If the sign bit is set, then just read the bottom three bits as the string. I guess the string would be 0 terminated, or you could stash the length in the top byte as well.

EDIT: ah nevermind, this makes it impossible to implement Symbol::as_str(). You'd have to convert to a full String, which would be a more invasive change.

@mwillsey
Copy link
Member

mwillsey commented Feb 1, 2022

Ok nvm I figured it out. Could you try this branch instead on your workload? If your symbols are < 3 bytes, then they should be super fast! https://github.com/egraphs-good/egg/tree/tiny-symbols

@mmp
Copy link
Contributor Author

mmp commented Feb 15, 2022

Nice! I just got around to trying this with my workload; to my surprise it doesn't seem to give a performance benefit, FWIW.

@mwillsey
Copy link
Member

Thanks! Interesting that there is no speedup. I’ll try to merge something like this approach then. Are your variable names all long or something?

@mmp
Copy link
Contributor Author

mmp commented Feb 15, 2022

Ah, yes-for the example I was testing with, it happened that they were all >4 characters. Trying another one that has a mix of short ones and a few longer ones, I see a 10% reduction in overall wallclock time, so this is a definite win for that case, since it does a bunch of other stuff besides call into egg.

@mmp
Copy link
Contributor Author

mmp commented Feb 15, 2022

And now with actual profiling.
Before:
Screen Shot 2022-02-15 at 8 32 47 AM

After:
Screen Shot 2022-02-15 at 8 32 52 AM

So that's about 10x and pushes intern() down into the noise. (The above doesn't exactly line up with the 10% overall reduction reported above, but close enough.)

mwillsey added a commit that referenced this pull request Feb 15, 2022
@mwillsey
Copy link
Member

Great! The latest commit on tiny-symbols now implements both of these techniques, and it keeps the Symbol size to 4 bytes :). Barring any crazy results from you, I'll merge that and close this PR.

@mwillsey
Copy link
Member

Hi Matt! I've taken this idea and combined it with something inspired by this post and made the symbol_table crate.

Commit ed1faf1 (and the upcoming 0.8 release) will use this for string interning, which should solve your perf issues once and for all!

@mwillsey mwillsey closed this Apr 28, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants