feat (dynamodb): Configurable parallelism in initial offset store query #1239

Merged (5 commits, Nov 4, 2024)

Conversation

leviramsey (Contributor)

References #1222 (comment)

The offset retrieval query performs one query per slice in its range (up to 1024 in the case of a single-instance projection, which might be used when the projected entity has an expected-to-be-low rate of events). These queries are all issued simultaneously, and every query in a given invocation must succeed. The default number of HTTP connections to DynamoDB is likely to be less than the number of queries, so the queries will be queued (by the DynamoDB client) alongside other persistence operations; when projections are started early in the application's lifecycle, there may likewise be a surge of persistence operations as shards are rebalanced to the instance. This can have undesirable effects on write-side latency, or in extreme cases (if max-pending-connection-acquires is limited) prevent the projection from running at all.

Using separate DynamoDB clients for the write side and for projections isolates them, though at the cost of extra HTTP connections in the respective pools, as well as threads to manage those connections; this cost is especially pronounced for entities with low rates of event traffic.

This change allows the parallelism of the offset query to be bounded.
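As a rough illustration of the approach (the actual change uses Akka Streams' asyncUnordered processing, per the review discussion below; the names and structure here are hypothetical, not the akka-projection-dynamodb internals), a semaphore can cap how many per-slice queries are in flight at once:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;

// Sketch: bound the number of concurrent per-slice offset queries.
// All names are illustrative; the stand-in "query" just fabricates an offset.
public class BoundedOffsetQuery {
  static CompletableFuture<Long> queryOffsetForSlice(int slice, ExecutorService pool) {
    // Stand-in for an asynchronous DynamoDB query for one slice's offset.
    return CompletableFuture.supplyAsync(() -> (long) slice * 10, pool);
  }

  static List<Long> readAllOffsets(int slices, int parallelism) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(parallelism);
    Semaphore permits = new Semaphore(parallelism); // bounds in-flight queries
    List<CompletableFuture<Long>> futures = new ArrayList<>();
    for (int slice = 0; slice < slices; slice++) {
      permits.acquire(); // blocks once `parallelism` queries are in flight
      final int s = slice;
      CompletableFuture<Long> f = queryOffsetForSlice(s, pool);
      f.whenComplete((value, error) -> permits.release());
      futures.add(f);
    }
    List<Long> results = new ArrayList<>();
    for (CompletableFuture<Long> f : futures) results.add(f.join());
    pool.shutdown();
    return results;
  }

  public static void main(String[] args) throws Exception {
    // With parallelism 32, at most 32 of the 1024 queries run concurrently.
    System.out.println(readAllOffsets(1024, 32).size());
  }
}
```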


@pvlugter pvlugter left a comment


Looks good 👍🏼

Comment on lines 26 to 30
# Number of slices to read offsets for simultaneously. The underlying DynamoDB
# client must be able to handle at least this number of concurrent requests
# (`http.max-concurrency` plus `http.max-pending-connection-acquires`).
# Defaults to 1024 (all slices simultaneously), but may be reduced.
offset-slice-read-parallelism = 1024

@pvlugter pvlugter Oct 30, 2024


Shall we default to something lower, maybe 32 or 64, rather than leave it to be configured when there's only one or a small number of projection instances?

And maybe describe in the comment somewhere that it's the slice range for a projection instance that needs to be retrieved together (it's only all slices when there's a single projection instance).

Member


I agree, let's have a lower default. Since it's asyncUnordered, 32 should be enough. I assume such a default would work well with the defaults of the client config?

Contributor Author


1024 would work with the defaults (50 connections and 10,000 pending connection acquires, which would likely allow for 1024 offset queries), so the more likely scenario to need this involves tuning the client to a small connection pool and/or pending queue.
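For reference, those client defaults correspond to the two settings quoted in the config comment; this fragment is illustrative only (the relative keys are from the comment above, but the enclosing config path depends on the plugin's reference.conf and is not shown here):

```hocon
# Illustrative fragment; full config path omitted
http {
  max-concurrency = 50                    # default connection pool size
  max-pending-connection-acquires = 10000 # default queue of pending acquires
}
```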

1024 matches the earlier default "all-at-once" behavior, but this is new enough that maybe the defaults can be expected to change from version to version?

Member


Yes, I see no problem with changing this behavior in the next patch.

Contributor Author

@leviramsey leviramsey Oct 30, 2024


Further consideration (from spelling things out more in the comment): the adverse impact of a smaller value is limited. Going from 1024 to 64, for example, takes querying for offsets from ~10ms to ~200ms in the 1024-slice case (~100ms in the 512-slice case, ~50ms in the 256-slice case, and no impact in the 64-slice case). That is far less than the impact of a projection being restarted with backoff (default is 3 seconds) due to being unable to query.
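The rough arithmetic behind those estimates: with bounded parallelism, the per-slice queries complete in about ceil(slices / parallelism) sequential rounds, each costing roughly one query latency. A sketch, assuming ~10ms per round as in the comment (the comment's totals run somewhat higher, presumably allowing for overhead):

```java
// Estimate sequential query rounds under bounded parallelism.
public class OffsetReadRounds {
  static int rounds(int slices, int parallelism) {
    return (slices + parallelism - 1) / parallelism; // ceiling division
  }

  public static void main(String[] args) {
    int queryLatencyMs = 10; // assumed single-round latency, from the comment
    for (int slices : new int[] {1024, 512, 256, 64}) {
      int r = rounds(slices, 64); // parallelism of 64
      System.out.println(slices + " slices: " + r + " rounds, roughly "
          + (r * queryLatencyMs) + " ms");
    }
  }
}
```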


@patriknw patriknw left a comment


looking good

@pvlugter pvlugter merged commit 5440cb8 into akka:main Nov 4, 2024
21 of 22 checks passed
@leviramsey leviramsey deleted the dynamodb-offset-parallelism branch November 5, 2024 11:37