Move garbage collection to seize #102
Conversation
Codecov Report
This is so cool, thanks for taking it on!
The "Disallowed attributes" CI job seems to be failing with a weird error; any chance you know why that is?
#103 should take care of that 👍
There — a rebase should do it.
Can you approve the workflow run?
Looks like you need to approve after every push...
Hmm, I didn't think that should be necessary... Let's try that again.
Everything is passing except Miri (need to disable isolation because seize calls `num_cpus::get`) and MSRV (which I can fix) 🎉 Looks like you need to approve again. I think you can disable the approval requirement here in the settings, although GitHub should recognize that you already approved my branch...
Can you approve the workflow again? I believe I fixed the Miri issue.
This looks great! I had a few comments, but I don't think any of them should be very hard to address 👍
```rust
pub(crate) fn boxed(value: T, collector: &Collector) -> Self {
    Shared::from(collector.link_boxed(value))
}
```
The fact that a ref to the `Collector` is needed to construct a `Shared` has me a little concerned — does every seize-protected value hold a pointer to the collector? If so, that'll pretty significantly increase the size of a map where the key and value types are small. Say they're `u32`, but now we're sticking in a full `usize` pointer twice in every entry as well!
The map holds a pointer to the `Linked<T>`, not the value directly, so it shouldn't increase the size?
It still increases the overall heap memory use of the map significantly. Previously we'd store (for each item):

`sizeof(usize) + sizeof(K) + sizeof(usize) + sizeof(V)`

whereas if each `Linked` also holds a pointer to the collector, and every `K` and `V` is a `Linked<_>`, then we're now storing

`2 * sizeof(usize) + sizeof(K) + 2 * sizeof(usize) + sizeof(V)`

which is a fairly significant increase in heap memory for a map when `K` and `V` are small and there are many elements, such as maps to or from integers (which are very common in practice).
`Linked` actually stores a lot more than that. `Linked<()>` is 32 bytes, and packs a ton of metadata in there (2 unions). It stores a link to the head of the batch, the recorded epoch value when it was allocated, a link to the next node in the reservation list, the reference count and minimum recorded epoch of the batch, a link to the next node in the batch, and the reservation function 😅 All of this is pretty fundamental to the reclamation algorithm. Some of the data is inlined from what would instead be a thread-local list in crossbeam-epoch/hazard pointers.

How big of an issue do you think this is?
Oooooof, that's pretty painful. For every value? That's definitely worrying. Consider a map with, say, 100k entries, all of which are `usize`s. Its heap memory use would increase three-fold (from 100k * (2 * (8 + 8)) = 3.2 MB to 100k * (2 * (8 + 32 + 8)) = 9.6 MB). Now, whether it's a deal-breaker, that's harder to say... In its current state, flurry isn't going to be used since its performance is so poor, and at least with this its performance is pretty good, even if its memory footprint is kind of lousy. That said, it would be in Java too, so maybe that's just what we signed up for :p

I think let's still go ahead with this, but maybe add a note about performance and memory characteristics to the docs, and highlight the expected memory use of the map given `N`, `sizeof(K)`, and `sizeof(V)`.
Hmmm... it might be possible to initially only store the epoch value, and then only when a node is retired, allocate the 32 bytes needed to keep track of its reservation status. That would mean we incur an extra pointer indirection any time reference counts need to be decremented.

Do you think that would be worthwhile?
Does that mean an extra pointer indirection on every read? If so, I'd be curious to see the performance impact. But could very well be worth it, yeah!
No, not on every read, just in `retire` and `Guard::drop`.
That definitely sounds promising then!
All `'static` bounds have been removed. Now that the collector is owned by the map, all retired values will be reclaimed when the map is dropped.
I think the only blocker now is documentation. I'll open an issue about memory usage optimizations.
Alright, I updated the public documentation, but there are still 100+ safety comments that talk about "thread X must be pinned to a lesser epoch" 😅 Do you think it's important that all these comments are updated? They are all correct of course, the terminology is just wrong (s/pinned to an epoch/marked as active).
Unfortunately, yes, I do think it's important to update them, even if the terminology is just slightly off. Contributors will read the safety comments and try to reason from them as to whether their modifications are correct, and if a comment doesn't match the semantics of the protection, it will be very confusing. Hopefully it'll be mostly search/replace though!
Updated all the safety comments, I think this can be merged now, provided I didn't make any typos :)
Mostly minor nits for the updated comments. I think we're almost good to go!
I think I've addressed all your comments.
Thanks for sticking with it!
Also, I would love to have you as a co-maintainer of this crate if you're up for it! You certainly know the codebase well enough at this point 😅
Sure! 😄
Just sent you a collaborator invite!
## Background

This PR moves garbage collection from `crossbeam-epoch` to `seize`. As shown in #50, crossbeam-epoch has serious performance issues that make flurry much slower than other concurrent hashmaps. #95 attempted to switch to `flize`, but unfortunately both the pull request and the library have stalled.

## Design

`seize` uses a special type of reference counting instead of tracking global epochs. When an object is retired, it is first appended to a thread-local batch; this operation does not allocate. When the batch size hits a configurable limit, the batch is added to a separate list for each thread that is currently active (holds a guard), and the reference count of the batch is set to the number of active threads. When a thread drops its guard, it decrements the reference count, and the thread that drops the last reference frees the entire batch.
## Implementation

The API provided by seize differs quite a bit from that of crossbeam-epoch. I tried to make migration easier by adding a `reclaim` module that mimics some of the previously used APIs, but here are some of the major changes:

- `seize` requires metadata to be stored with every allocated object, so `T -> Linked<T>` has propagated around the codebase.
- `seize` has no concept of a global collector, which means that:
    - `pin` has been removed. `with_collector` is now public, as the footgun of using `epoch::pin()` no longer exists.
    - `Guard` has a lifetime. Public functions now take `&'g Guard<'_>` instead of `&'g Guard`.
    - A `&Collector` is needed in order to allocate objects.
## Performance

The algorithm used by `seize` has been shown to have performance similar to or better than optimized epoch-based schemes (like flize). It is also much more efficient in terms of memory usage, as reclamation is more targeted. Reclamation is naturally parallelized and balanced across threads due to the nature of reference counting. Using `seize` significantly improves flurry's performance. Here's a read-heavy benchmark that was run on an 8-core machine, pinning before every operation:
## Remaining Items

- The `&'g Guard<'g>` stored by iterators is problematic.
- Remove the `K/V: 'static` bounds now that there is no global collector.
- `check_guard`: this is difficult because `seize::Collector` contains no allocations that can be easily accessed to check pointer equality. Perhaps flurry should expose its own guard type that also contains an `&HashMap`, or a dummy allocation?

cc @jonhoo @domenicquirl