-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve performance of allocation profiling #287
Conversation
- Adjust the pseudo random generator to be thread local - Change the generator to be smaller - Remove the lock from the allocation profiling path
@@ -194,6 +190,10 @@ void AllocationTracker::track_allocation(uintptr_t addr, size_t size, | |||
free_on_consecutive_failures(success); | |||
|
|||
if (success && _state.track_deallocations) { | |||
// \fixme{r1viollet} adjust set to be lock free |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could probably be replaced with a rather simple sparse bitset over the least significant 48 bits of the pointers, which would allow CAS operations on pages of the bitset rather than a lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like this, especially as the impact of collisions are easy to manage. I'll probably make it smaller than that.
@@ -39,7 +44,7 @@ void perform_memory_operations(bool track_allocations, | |||
std::mt19937 gen(rd()); | |||
|
|||
for (auto _ : state) { | |||
state.PauseTiming(); | |||
// state.PauseTiming(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why does this need to be commented out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Initially I was measuring the cost of deallocation. In this PR I was optimizing the allocation code path.
I think the correct thing to do is have different benchmarks. I am continuing this work and I'll revisit this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The change makes sense to me
What does this PR do?
Credit for this idea goes to @nsavoire
Motivation
Improve performance of allocation profiling. This really brings down the allocation profiling to numbers that are comparable.
The numbers were 4x worse prior to this change.
After this change I will need to focus on the deallocation code path.
Additional Notes
This needs extensive testing.
Risks are:
How to test the change?
The benchmark only tests parts of this. We will need to be careful when deploying this change.