
workload/ycsb: give each worker a separate sql.Conn #86841

Merged · 1 commit · Aug 29, 2022

Conversation

nvanbenschoten
Member

This commit gives ycsb the same treatment we gave kv in #30811 (4 years
ago!). It removes a throughput bottleneck in the workload client by
giving each ycsb worker goroutine its own `sql.Conn`, instead of having
them all pull from a sufficiently sized `sql.DB` pool. Doing so avoids
mutex contention.
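
Roughly, the per-worker pattern looks like the sketch below (illustrative only, not the actual workload code; `runWorker` and `stmt` are made-up names):

```go
package sketch

import (
	"context"
	"database/sql"
)

// runWorker is a hypothetical helper showing the shape of the change:
// db.Conn pins a single connection to this goroutine, so each operation
// skips the sql.DB pool's shared free-list and its mutex.
func runWorker(ctx context.Context, db *sql.DB, stmt string) error {
	conn, err := db.Conn(ctx)
	if err != nil {
		return err
	}
	defer conn.Close()

	for {
		if _, err := conn.ExecContext(ctx, stmt); err != nil {
			return err // includes context cancellation when the run ends
		}
	}
}
```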

This commit also goes a step further than #30811 by creating multiple
`sql.DB` objects and pulling only a bounded number of connections from
each. This further avoids mutex contention and removes the next
bottleneck.
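
Continuing the sketch above, the multiple-pool part might look like this (the `openPools` helper, driver name, and sizing are illustrative assumptions, not the PR's actual code):

```go
// openPools opens several sql.DB handles against the same cluster and caps
// how many connections each may hand out, so no single pool's internal
// mutex becomes a hot spot. Worker i would then grab its dedicated
// sql.Conn from pools[i%len(pools)].
func openPools(numWorkers, connsPerPool int, url string) ([]*sql.DB, error) {
	numPools := (numWorkers + connsPerPool - 1) / connsPerPool
	pools := make([]*sql.DB, 0, numPools)
	for i := 0; i < numPools; i++ {
		db, err := sql.Open("postgres", url) // assumes a registered pq/pgx driver
		if err != nil {
			return nil, err
		}
		db.SetMaxOpenConns(connsPerPool)
		db.SetMaxIdleConns(connsPerPool)
		pools = append(pools, db)
	}
	return pools, nil
}
```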

Without these changes, a ycsb driver on a 64 vCPU machine could push
about 200k qps before hitting a ceiling. With them, I've seen the driver
push up to about 500k qps before CRDB itself became the bottleneck.

Release justification: workload only.

@nvanbenschoten nvanbenschoten requested review from aayushshah15 and a team August 25, 2022 02:49
@cockroach-teamcity
Member

This change is Reviewable

@nvanbenschoten nvanbenschoten force-pushed the nvanbenschoten/ycsbSqlConn branch from 18de6b3 to 566bb16 on August 26, 2022 19:35
@erikgrinaker erikgrinaker (Contributor) left a comment

Thanks!

Do you know if this could explain why YCSB/E has remained essentially flat, even when other benchmarks have shown dramatic regressions? We hypothesized that it was due to network bandwidth saturation, but it seems like this might be a contributing factor as well.

@aayushshah15 aayushshah15 (Contributor) left a comment

Do you think it's worth re-running any benchmarks from #85993 with this patch?

@nvanbenschoten
Member Author

TFTRs!

bors r+

> Do you know if this could explain why YCSB/E has remained essentially flat, even when other benchmarks have shown dramatic regressions?

We only started to see a client-side bottleneck due to mutex contention at about 250k qps, so I don't think this would explain the YCSB-E results unfortunately.

> Do you think it's worth re-running any benchmarks from #85993 with this patch?

It's possible that this had an impact on YCSB-C and YCSB-D, because those are starting to breach the 100k qps mark. Still, I don't think it's worth spending the time to re-run given that we could push up to 250k qps before being entirely bottlenecked on the client.

@craig
Contributor

craig bot commented Aug 29, 2022

Build succeeded:
