Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(processor): support multiple jobsdb writers when source isolation is enabled #3428

Merged
merged 1 commit into from
Jun 6, 2023

Conversation

atzoum
Copy link
Contributor

@atzoum atzoum commented May 31, 2023

Description

Since multiple goroutines might be trying to store jobs in router's jobsdb for the same destinationID and to avoid job ordering guarantee panics (router encountering a job with id less than a job it has previously encountered), we are introducing a partition locker to serialise write access.
Now, before writing a batch of jobs in router, the goroutine in processor will acquire all required partition locks (using an ascending lock order for avoiding deadlocks) and release them after store is completed.

Notion Ticket

Link

Security

  • The code changed/added as part of this pull request won't create any security issues with how the software is being used.

@Sidddddarth
Copy link
Member

wouldn't adding the sourceID to the barrier's orderKey have sufficed?

@codecov
Copy link

codecov bot commented May 31, 2023

Codecov Report

Patch coverage: 82.60% and project coverage change: -0.39 ⚠️

Comparison is base (8d35882) 68.86% compared to head (7c4dae7) 68.48%.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3428      +/-   ##
==========================================
- Coverage   68.86%   68.48%   -0.39%     
==========================================
  Files         330      330              
  Lines       52973    52991      +18     
==========================================
- Hits        36478    36289     -189     
- Misses      14140    14337     +197     
- Partials     2355     2365      +10     
Impacted Files Coverage Δ
processor/processor.go 87.74% <82.60%> (+0.10%) ⬆️

... and 14 files with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

@atzoum
Copy link
Contributor Author

atzoum commented May 31, 2023

wouldn't adding the sourceID to the barrier's orderKey have sufficed?

Thought about it here as well, but I believe that still there could be a slight possibility for things to go wrong with job iterator, which is using AfterJobID filter for fetching more pages.

Comment on lines +1906 to +1916
// Only one goroutine can store to a router destination at a time, otherwise we may have different transactions
// committing at different timestamps which can cause events with lower jobIDs to appear after events with higher ones.
// For that purpose, before storing, we lock the relevant destination IDs (in sorted order to avoid deadlocks).
if len(in.routerDestIDs) > 0 {
destIDs := lo.Uniq(in.routerDestIDs)
slices.Sort(destIDs)
for _, destID := range destIDs {
proc.storePlocker.Lock(destID)
defer proc.storePlocker.Unlock(destID)
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it be simpler if we had just one lock?

I haven't any benchmarks, but concurrent writes on the same table are not beneficial from a performance standpoint.

Copy link
Contributor Author

@atzoum atzoum Jun 2, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have observed major improvements in wait times after increasing the number of writers, for some scenarios

@atzoum atzoum force-pushed the chore.multipleWorkers branch 3 times, most recently from bd1bf72 to d360246 Compare June 6, 2023 08:36
@atzoum atzoum merged commit b25003d into master Jun 6, 2023
@atzoum atzoum deleted the chore.multipleWorkers branch June 6, 2023 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants