-
Notifications
You must be signed in to change notification settings - Fork 6.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add missing mutex when reading from shared variable bg_bottom_compaction_scheduled_, bg_compaction_scheduled_ #10610
Conversation
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
db/compaction/compaction_job.cc
Outdated
@@ -321,6 +321,7 @@ void CompactionJob::AcquireSubcompactionResources( | |||
->write_controller() | |||
->NeedSpeedupCompaction()) | |||
.max_compactions; | |||
db_mutex_->Lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use InstrumentedMutexLock
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed.
db/compaction/compaction_job.cc
Outdated
@@ -380,6 +380,7 @@ void CompactionJob::ReleaseSubcompactionResources() { | |||
if (extra_num_subcompaction_threads_reserved_ == 0) { | |||
return; | |||
} | |||
db_mutex_->Lock(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately I can't easily do InstrumentedMutexLock for this one cuz the function right after its unlock "ShrinkSubcompactionResources(extra_num_subcompaction_threads_reserved_);" but before the scope ends need to acquire lock again. So I won't be fixing this one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about
{
InstrumentedMutexLock (db_mutex_);
assert(...)
}
ShrinkSubcompactionresrouce();
Furthermore, mutex lock-unlock is not needed if in opt mode, but if we differentiate between debug and opt mode, there may be a gap in test coverage, thus I am fine with current.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got you - yeah I can change to your suggested version! This is the easy way that I didn't think of!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed
@hx235 has updated the pull request. You must reimport the pull request before landing. |
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
@hx235 has updated the pull request. You must reimport the pull request before landing. |
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM as long as it fixes the data race.
Not related to this PR, but I find the design of having one class keeping a pointer to a non-public data member difficult to track when trying to reason about data access safety. We should think of a better way of doing it.
@hx235 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
Thanks. Yeah this was one of those difficult design encountered during the internship that was decided to pursue the current way. Need to think harder about your suggestion ... |
Tracked this internally as a tech debt. |
Context/Summary:
According to https://github.com/facebook/rocksdb/blob/7.6.fb/db/compaction/compaction_job.h#L328-L332, any reading in the form of
*bg_compaction_scheduled_
,*bg_bottom_compaction_scheduled_
should be protected by mutex, which isn't the case for some assert statement. This leads to a data race that can be repro-ed by the following command (command coming soon)This PR added all the missing mutex that should've been in place
Test: