Kinesis source improve management of shard ends #102
Merged
In the Kinesis Source, we terminate the inner stream of events whenever we reach the end of a Kinesis shard. Terminating the inner stream is important, because it forces the application to fully process and checkpoint any outstanding events, and this unblocks our KCL record processor from checkpointing the end of the shard.

Before this PR, we terminated the inner stream for _every_ shard end. But terminating the stream is quite inefficient, because each termination forces the application to process and checkpoint everything in flight. During a re-sharding we probably reach many shard ends at around the same time. This PR changes the source so that it tries to handle many shard ends together, with a single termination of the inner stream. During re-sharding, this should reduce the number of times we need to terminate the inner stream.
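The following minimal, stdlib-only Scala sketch makes the batching idea concrete. It is not the code from KinesisSource.scala. The Item, Record, ShardEnd and Segment types and all the functions are hypothetical. The sketch also assumes that shard ends reached "at the same time" means shard ends available in the same poll of the upstream queue. It counts how many inner-stream terminations each strategy needs for the same input.

```scala
// Illustrative sketch only: these types are hypothetical and are NOT the
// KinesisSource or KCL types. Each inner list is one "poll": the items that
// happen to be available from the upstream queue at the same moment.
sealed trait Item
final case class Record(shardId: String, seqNum: Long) extends Item
final case class ShardEnd(shardId: String) extends Item

// One run of the inner stream. Its records must be fully processed and
// checkpointed before the listed shard ends can be checkpointed, which is
// why the inner stream is terminated at the end of each segment.
final case class Segment(records: Vector[Record], shardEndsToCheckpoint: Vector[String])

// Old strategy: every shard end terminates the inner stream on its own.
def segmentPerShardEnd(polls: List[List[Item]]): Vector[Segment] = {
  val (done, pending) =
    polls.flatten.foldLeft((Vector.empty[Segment], Vector.empty[Record])) {
      case ((segments, buffered), r: Record) =>
        (segments, buffered :+ r)
      case ((segments, buffered), ShardEnd(id)) =>
        (segments :+ Segment(buffered, Vector(id)), Vector.empty)
    }
  if (pending.isEmpty) done else done :+ Segment(pending, Vector.empty)
}

// Sketched batching strategy (assumption: "at the same time" = same poll):
// all shard ends found in one poll share a single termination.
def segmentBatched(polls: List[List[Item]]): Vector[Segment] = {
  val (done, pending) =
    polls.foldLeft((Vector.empty[Segment], Vector.empty[Record])) {
      case ((segments, buffered), poll) =>
        val records = buffered ++ poll.collect { case r: Record => r }
        val ends = poll.collect { case ShardEnd(id) => id }.toVector
        if (ends.isEmpty) (segments, records)
        else (segments :+ Segment(records, ends), Vector.empty)
    }
  if (pending.isEmpty) done else done :+ Segment(pending, Vector.empty)
}

// Number of inner-stream terminations caused by shard ends.
def terminations(segments: Vector[Segment]): Int =
  segments.count(_.shardEndsToCheckpoint.nonEmpty)

// Example: two parent shards end in the same poll during a re-shard.
val polls: List[List[Item]] = List(
  List(Record("shard-0", 1L), Record("shard-1", 1L)),
  List(ShardEnd("shard-0"), ShardEnd("shard-1"), Record("shard-2", 1L)),
  List(Record("shard-2", 2L))
)

println(terminations(segmentPerShardEnd(polls))) // 2
println(terminations(segmentBatched(polls)))     // 1
```

In this sketch the per-shard-end strategy produces a second segment that contains no records at all. That segment is a full terminate-and-checkpoint cycle spent only to unblock shard-1. The batched segment also carries the shard-2 record that arrived in the same poll. As a result, its single termination waits for slightly more work, in exchange for one drain-and-checkpoint cycle instead of two.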
istreeter force-pushed the kinesis-shard-end-improvements branch from fe5bbf2 to 1231e8a (December 20, 2024 12:37)
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request (Dec 31, 2024):

Common streams 0.10.0 brings significant changes to the Kinesis and Pubsub sources:
- PubSub source completely re-written to be a wrapper around UnaryPull snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments snowplow-incubator/common-streams#99
pondzix reviewed on Jan 2, 2025, commenting on ...es/kinesis/src/main/scala/com/snowplowanalytics/snowplow/sources/kinesis/KinesisSource.scala (resolved)
pondzix approved these changes (Jan 3, 2025)
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request (Jan 3, 2025):

Common streams 0.10.0 brings significant changes to the Kinesis and Pubsub sources:
- PubSub source completely re-written to be a wrapper around UnaryPull snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events snowplow-incubator/common-streams#104
istreeter added further commits that referenced this pull request, each with the same message as the Jan 3, 2025 lake-loader commit: to snowplow-incubator/snowplow-bigquery-loader (Jan 3, 2025), to snowplow-incubator/snowplow-lake-loader (Jan 10, 2025), and to snowplow-incubator/snowplow-bigquery-loader (twice on Jan 14, 2025).