feat (dynamodb): Configurable parallelism in initial offset store query #1239
Conversation
Looks good 👍🏼
```
# Number of slices to read offsets simultaneously for. The underlying Dynamo
# client must be able to handle at least this number of concurrent requests
# (`http.max-concurrency` plus `http.max-pending-connection-acquires`).
# Defaults to 1024 (all slices simultaneously), but may be reduced.
offset-slice-read-parallelism = 1024
```
Shall we default to something lower? Maybe 32 or 64? Rather than leave it to be configured when there's only one or a small number of projection instances.
And maybe describe in the comment somewhere that it's the slice range for a projection instance that needs to be retrieved together (it's only all slices if it's a single projection instance).
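To make that point concrete, here is a small illustrative snippet (not project code; the instance count is just an example) of how the 1024 slices split into per-instance ranges, so each instance's offset store only queries its own range:

```scala
// Illustration only: how 1024 slices divide across projection instances.
val totalSlices = 1024
val numberOfInstances = 4 // example value
val rangeSize = totalSlices / numberOfInstances
val sliceRanges = (0 until numberOfInstances).map { i =>
  (i * rangeSize) to ((i + 1) * rangeSize - 1)
}
// sliceRanges == Vector(0 to 255, 256 to 511, 512 to 767, 768 to 1023)
// Only a single-instance projection (numberOfInstances = 1) reads offsets
// for all 1024 slices at once.
```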
I agree, let's have a lower default. Since it's asyncUnordered, 32 should be enough. I assume such a default would work well with the defaults of the client config?
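For context, a minimal sketch of the bounded-parallelism pattern being discussed (this is not the actual DynamoDBOffsetStore code; `readOffsetForSlice` is a hypothetical stand-in for the per-slice Dynamo query):

```scala
import akka.actor.typed.ActorSystem
import akka.stream.scaladsl.{ Sink, Source }
import scala.concurrent.Future

// Hypothetical per-slice query; in the real store this is a DynamoDB request.
def readOffsetForSlice(slice: Int): Future[Option[Long]] = ???

// Reads the offsets for a slice range with at most `parallelism` queries in
// flight at once; results may arrive in any order (mapAsyncUnordered).
def readOffsets(sliceRange: Range, parallelism: Int)(
    implicit system: ActorSystem[_]): Future[Seq[(Int, Option[Long])]] = {
  import system.executionContext
  Source(sliceRange)
    .mapAsyncUnordered(parallelism) { slice =>
      readOffsetForSlice(slice).map(offset => slice -> offset)
    }
    .runWith(Sink.seq)
}
```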
1024 would work with the defaults (which are 50 connections and 10k pending, and would likely allow for 1024 offset queries)... so the more likely scenario to need this involves tuning the client to a small connection pool and/or pending queue.
1024 matches the earlier default "all-at-once" behavior, but this is new enough that maybe the defaults can be expected to change from version to version?
yes, I see no problem with changing this behavior in the next patch
Further consideration (from spelling things out more in the comment): the adverse impact of a smaller default is far less than the impact of a projection being restarted with backoff (default is 3 seconds) because it was unable to query. For example, going from 1024 to 64 means the offset querying goes from ~10ms to ~200ms when an instance covers all 1024 slices, ~100ms for 512 slices, ~50ms for 256 slices, and no impact when it covers 64 slices.
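A rough back-of-the-envelope for those numbers (illustrative only; the ~12.5ms per batch is an assumption chosen to match the figures above, not a measurement):

```scala
// Estimate: the queries run in roughly ceil(slices / parallelism) sequential batches.
def estimatedOffsetQueryMillis(slices: Int, parallelism: Int, perBatchMillis: Double = 12.5): Double =
  math.ceil(slices.toDouble / parallelism) * perBatchMillis

// estimatedOffsetQueryMillis(1024, 64) ≈ 200 ms
// estimatedOffsetQueryMillis(512, 64)  ≈ 100 ms
// estimatedOffsetQueryMillis(256, 64)  ≈ 50 ms
// estimatedOffsetQueryMillis(64, 64)   ≈ 12.5 ms (a single batch, effectively no impact)
```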
looking good
References #1222 (comment)
The offset retrieval query performs a query for each slice in its range (up to 1024 in the case of a single-instance projection, which might be used if the projection is for an entity with an expected-to-be-low rate of events); these queries are performed simultaneously, and all queries in a given invocation must succeed.

The default number of HTTP connections to Dynamo is likely to be less than the number of queries, so these queries will be queued in the Dynamo client alongside other persistence operations (when projections are started early in the application's lifecycle, there may likewise be a surge of persistence operations as shards are rebalanced to the application instance). This may have undesirable effects on write-side latency, or in extreme cases (if `max-pending-connection-acquires` is limited) prevent the projection from running.

Using separate Dynamo clients for the write side and for projections isolates them, though at the cost of extra HTTP connections in the respective pools, as well as threads to manage those connections; this is especially exacerbated in the case of entities with low rates of event traffic.
This change allows the parallelism of the offset query to be bounded.
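A hedged configuration sketch of how this might be combined with a smaller client pool. Only `offset-slice-read-parallelism`, `http.max-concurrency` and `http.max-pending-connection-acquires` come from this PR; the enclosing config paths below are placeholders, so check the modules' reference.conf for the real ones:

```scala
import com.typesafe.config.ConfigFactory

// Sketch only: paths outside the three settings named above are placeholders.
val tuningOverrides = ConfigFactory.parseString("""
  # keep a deliberately small HTTP pool on the DynamoDB client (placeholder path)
  my-app.dynamodb-client.http.max-concurrency = 16
  my-app.dynamodb-client.http.max-pending-connection-acquires = 256

  # bound the initial offset query so it fits within that pool (placeholder path)
  my-app.dynamodb-projection.offset-store.offset-slice-read-parallelism = 32
  """)
```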