Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kinesis source improve management of shard ends #102

Merged
merged 1 commit into from
Jan 3, 2025

Conversation

istreeter
Copy link
Contributor

In the Kinesis Source, we terminate the inner stream of events whenever we reach the end of a Kinesis shard. Terminating the inner stream is important, because it forces the application to fully process and checkpoint any outstanding events, and this unblocks our KCL record processor from checkpointing the end of the shard.

Before this PR, we terminated the inner stream for every shard end. But terminating the stream is quite inefficient, and during a re-sharding we probably reach many shard ends at similar time. This PR changes things so we try to handle many shard ends at the same time. During re-sharding, this should reduce the number of times we need to terminate the inner stream.

In the Kinesis Source, we terminate the inner stream of events
whenever we reach the end of a Kinesis shard. Terminating the inner
stream is important, because it forces the application to fully process
and checkpoint any outstanding events, and this unblocks our KCL record
processor from checkpointing the end of the shard.

Before this PR, we terminated the inner stream for _every_ shard end. But
terminating the stream is quite inefficient, and during a re-sharding we
probably reach many shard ends at similar time. This PR changes things
so we try to handle many shard ends at the same time. During
re-sharding, this should reduce the number of times we need to terminate
the inner stream.
@istreeter istreeter force-pushed the kinesis-shard-end-improvements branch from fe5bbf2 to 1231e8a Compare December 20, 2024 12:37
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Dec 31, 2024
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99
@istreeter istreeter merged commit 82d3b8a into develop Jan 3, 2025
1 check passed
@istreeter istreeter deleted the kinesis-shard-end-improvements branch January 3, 2025 09:02
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Jan 3, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
istreeter added a commit to snowplow-incubator/snowplow-bigquery-loader that referenced this pull request Jan 3, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Jan 10, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
istreeter added a commit to snowplow-incubator/snowplow-bigquery-loader that referenced this pull request Jan 14, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
istreeter added a commit to snowplow-incubator/snowplow-bigquery-loader that referenced this pull request Jan 14, 2025
Common streams 0.10.0 brings significant to changes to the Kinesis and
Pubsub sources:

- PubSub source completely re-written to be a wrapper around UnaryPull
  snowplow-incubator/common-streams#101
- Kinesis source is more efficient when the stream is re-sharded
  snowplow-incubator/common-streams#102
- Kinesis source better tuned for larger deployments
  snowplow-incubator/common-streams#99

And improvements to latency metrics:
- Sources should report stream latency of stuck events
  snowplow-incubator/common-streams#104
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants