Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(bigquery/storage/managedwriter): more advanced routing #7587

Merged
merged 32 commits into from
Mar 30, 2023

Conversation

shollyman
Copy link
Contributor

@shollyman shollyman commented Mar 18, 2023

This PR introduces a more advanced sharedRouter suitable for handling both exclusive and multiplex stream traffic. Overall this improves memory slightly (we don't have a pool-per-exclusive stream). We switch to using the new router for all real traffic, but some unit testing still leverages the simple router where we're not exercising routing. This change means a client will have at most 1 connection pool per region, rather than 1 per exclusive connection using the older router.

The router maintains independent datastructures (and mutexes) for exclusive and multiplex connections. exclusive mappings are easy, they're always 1:1. For multiplex, we potentially adjust multiplex connection pool sizes when writers attach and detach, and we maintain mappings in both directions between connections and writers. There's also a watchdog goroutine that periodically curates the multiplex routing, looking for opportunities to grow and rebalance traffic, rather than trying to process this when trying to route an append to a connection.

Benchmarking exercises routing for both explicit stream and default stream writers, both using serial and concurrent strategies. We also benchmark watchdog timing to understand how expensive adjusting the pool may be, though it's inherently unstable so we force benchmarking to run with a fixed -benchtime.

Benchmark results here

Towards: #7103

This PR allows the flowcontroller to report bytes in flight for flow
controllers with a bounded byte definition.  The primary connection
load signals for a connection are the inserts/bytes in flight as
reported by the flow controller, and this makes the bytes in flight a
signal we can use.

Important note: an unbounded flow controller will not report any bytes
in flight.  This avoids introducing odd situations due to size
normalization where bytes tracked and the actual capacity of the
semaphore could get out of sync.

Towards: googleapis#7103
@product-auto-label product-auto-label bot added size: l Pull request size is large. api: bigquery Issues related to the BigQuery API. labels Mar 18, 2023
@product-auto-label product-auto-label bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Mar 22, 2023
@shollyman shollyman marked this pull request as ready for review March 22, 2023 20:29
@shollyman shollyman requested review from a team as code owners March 22, 2023 20:29
@shollyman shollyman requested review from tswast, alvarowolfx and yirutang and removed request for tswast March 22, 2023 20:29
@yirutang yirutang requested a review from GaoleMeng March 22, 2023 21:32
bigquery/storage/managedwriter/connection.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/client.go Show resolved Hide resolved
bigquery/storage/managedwriter/routers_test.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/routers_test.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/routers.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/routers.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/routers_test.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/routers_test.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/options.go Show resolved Hide resolved
Copy link

@GaoleMeng GaoleMeng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will continue the review tomorrow

bigquery/storage/managedwriter/connection.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/routers.go Outdated Show resolved Hide resolved
bigquery/storage/managedwriter/routers.go Show resolved Hide resolved
bigquery/storage/managedwriter/routers.go Show resolved Hide resolved
bigquery/storage/managedwriter/routers.go Outdated Show resolved Hide resolved
// only look for rebalance opportunies between different connections.
for mostIdleIdx != leastIdleIdx {
targetConn := sr.multiConns[leastIdleIdx]
if targetConn.curLoad() < mostIdleLoad*connLoadDeltaThreshold {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't the targetConn's curLoad easy to be changed while doing rebalancing since the messages are being digested? Thus we may not currently checking the busiest connection in the ordered list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the entire system is dynamic. We're essentially taking a snapshot of the connections on an interval and potentially making slight adjustments when there's a sufficient difference in load to warrant such things.

The goal is to get an initial version we can profile and get feedback on, we're not married to a specific implementation.

@shollyman
Copy link
Contributor Author

In the course of benchmarking the original impl, observed undesired growth in watchdog time so I did end up introducing an inverse map from connection back to writers.

@shollyman shollyman merged commit c24f0a1 into googleapis:main Mar 30, 2023
@shollyman shollyman deleted the loadreporting branch March 30, 2023 21:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: bigquery Issues related to the BigQuery API. size: xl Pull request size is extra large.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants