chore(processor): support multiple jobsdb writers when source isolation is enabled #3428
Conversation
wouldn't adding the sourceID to the barrier's
Codecov Report
Patch coverage:
Additional details and impacted files

@@           Coverage Diff            @@
##           master    #3428    +/-   ##
==========================================
- Coverage   68.86%   68.48%   -0.39%
==========================================
  Files         330      330
  Lines       52973    52991     +18
==========================================
- Hits        36478    36289    -189
- Misses      14140    14337    +197
- Partials     2355     2365     +10

☔ View full report in Codecov by Sentry.
Thought about it here as well, but I believe there could still be a slight possibility for things to go wrong with the job iterator, which is using
// Only one goroutine can store to a router destination at a time, otherwise we may have different transactions
// committing at different timestamps which can cause events with lower jobIDs to appear after events with higher ones.
// For that purpose, before storing, we lock the relevant destination IDs (in sorted order to avoid deadlocks).
if len(in.routerDestIDs) > 0 {
	destIDs := lo.Uniq(in.routerDestIDs)
	slices.Sort(destIDs)
	for _, destID := range destIDs {
		proc.storePlocker.Lock(destID)
		defer proc.storePlocker.Unlock(destID)
	}
}
Would it be simpler if we had just one lock?
I don't have any benchmarks, but concurrent writes to the same table are not beneficial from a performance standpoint.
We have observed major improvements in wait times after increasing the number of writers in some scenarios.
Description
Since multiple goroutines might try to store jobs in the router's jobsdb for the same destinationID, and to avoid job-ordering-guarantee panics (the router encountering a job with an ID lower than one it has previously encountered), we are introducing a partition locker to serialise write access.
Now, before writing a batch of jobs to the router's jobsdb, the processor goroutine acquires all required partition locks (in ascending order, to avoid deadlocks) and releases them once the store has completed.
Notion Ticket
Link
Security