workload/ycsb: give each worker a separate sql.Conn #86841
Conversation
This commit gives ycsb the same treatment we gave kv in cockroachdb#30811 (4 years ago!). It removes a throughput bottleneck in the workload client by giving each ycsb worker goroutine its own `sql.Conn`, instead of having them all pull from a sufficiently sized `sql.DB` pool. Doing so avoids mutex contention.

This commit also goes a step further than cockroachdb#30811 by creating multiple `sql.DB` objects and pulling only a bounded number of connections from each. This further avoids mutex contention and removes the next bottleneck.

Without these changes, a ycsb driver on a 64 vCPU machine could push about 200k qps before hitting a ceiling. With it, I've seen the driver push up to about 500k qps before CRDB itself became the bottleneck.

Release justification: workload only.
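For illustration, here is a minimal sketch of the pattern the commit describes: several bounded `sql.DB` pools, with each worker goroutine pinning its own `sql.Conn` for its lifetime. This is not the actual workload code; the `numWorkers`/`connsPerDB` constants, the `openConns` helper, and the choice of the lib/pq driver are assumptions made for the sketch.

```go
// Sketch only: not the workload client itself. Each worker holds a
// dedicated *sql.Conn drawn from one of several bounded *sql.DB pools,
// so no single pool's mutex serializes all workers.
package main

import (
	"context"
	"database/sql"
	"log"
	"sync"

	_ "github.com/lib/pq" // assumed Postgres-wire driver; pgx would also work
)

const (
	numWorkers = 64 // hypothetical worker count
	connsPerDB = 8  // bound on connections pulled from each *sql.DB
)

// openConns opens one *sql.DB per batch of connsPerDB workers and checks a
// dedicated *sql.Conn out of it for each worker.
func openConns(ctx context.Context, dsn string) ([]*sql.Conn, error) {
	conns := make([]*sql.Conn, 0, numWorkers)
	for len(conns) < numWorkers {
		db, err := sql.Open("postgres", dsn)
		if err != nil {
			return nil, err
		}
		db.SetMaxOpenConns(connsPerDB)
		db.SetMaxIdleConns(connsPerDB)
		for i := 0; i < connsPerDB && len(conns) < numWorkers; i++ {
			// Conn removes a connection from the pool; the worker keeps it
			// for its lifetime instead of checking one in and out around
			// every operation, avoiding contention on the pool's mutex.
			c, err := db.Conn(ctx)
			if err != nil {
				return nil, err
			}
			conns = append(conns, c)
		}
	}
	return conns, nil
}

func main() {
	ctx := context.Background()
	conns, err := openConns(ctx, "postgresql://root@localhost:26257?sslmode=disable")
	if err != nil {
		log.Fatal(err)
	}
	var wg sync.WaitGroup
	for _, c := range conns {
		wg.Add(1)
		go func(conn *sql.Conn) { // one worker goroutine per pinned connection
			defer wg.Done()
			if err := conn.QueryRowContext(ctx, "SELECT 1").Err(); err != nil {
				log.Print(err)
			}
		}(c)
	}
	wg.Wait()
}
```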
Force-pushed from 18de6b3 to 566bb16
Thanks!
Do you know if this could explain why YCSB/E has remained essentially flat, even when other benchmarks have shown dramatic regressions? We hypothesized that it was due to network bandwidth saturation, but it seems like this might be a contributing factor as well.
Do you think it's worth re-running any benchmarks from #85993 with this patch?
TFTRs! bors r+
We only started to see a client-side bottleneck due to mutex contention at about 250k qps, so I don't think this would explain the YCSB-E results unfortunately.
It's possible that this had an impact on YCSB-C and YCSB-D, because those are starting to breach the 100k qps mark. Still, I don't think it's worth spending the time to re-run given that we could push up to 250k qps before being entirely bottlenecked on the client.
Build succeeded.