Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Re-implement Kinesis source without fs2-kinesis #84

Merged
merged 2 commits into from
Sep 13, 2024

Conversation

istreeter
Copy link
Contributor

The common-streams Kinesis source suffers from a problem where we don't quite achieve at-least-once processing semantics near the end of a shard. The problem was in the 3rd-party fs-kinesis library, and it is not easy to fix with any small code change to that library.

Sorry I cannot provide a link here to back that up -- it is documented internally at Snowplow.

This PR re-implements our Kinesis source from scratch, this time without a dependency on fs2-kinesis. The biggest difference is the way we block the shardEnded method of the KCL record processor, until all records from the shard have been written to the destination.

@istreeter istreeter force-pushed the re-implement-kinesis-source branch 3 times, most recently from 9154b10 to 931e292 Compare August 27, 2024 09:49
@istreeter istreeter force-pushed the re-implement-kinesis-source branch 2 times, most recently from e983e5a to 3838ac0 Compare September 9, 2024 22:57
The common-streams Kinesis source suffers from a problem where we don't
quite achieve at-least-once processing semantics near the end of a
shard. The problem was in the 3rd-party fs-kinesis library, and it is
not easy to fix with any small code change to that library.

Sorry I cannot provide a link here to back that up -- it is documented
internally at Snowplow.

This PR re-implements our Kinesis source from scratch, this time without
a dependency on fs2-kinesis.  The biggest difference is the way we block
the `shardEnded` method of the KCL record processor, until all records
from the shard have been written to the destination.
@istreeter istreeter force-pushed the re-implement-kinesis-source branch from 3838ac0 to a6b96d0 Compare September 9, 2024 23:07
@istreeter istreeter merged commit ae0e57e into develop Sep 13, 2024
1 check passed
@istreeter istreeter deleted the re-implement-kinesis-source branch September 13, 2024 08:48
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Sep 20, 2024
The following improvements are introduced via common-streams 0.8.0-M4:

- Fields starting with a digit are now prefixed with an underscore `_`.
  This is needed for Hudi, which does not allow fields starting with a
  digit (snowplow/schema-ddl#209)
- New kinesis source implementation without fs2-kinesis
  (snowplow-incubator/common-streams#84)
- Iglu schemas are resolved in parallel, for short pause times during
  event processing (snowplow-incubator/common-streams#85)
- Common http client configured with restricted max connections per
  server (snowplow-incubator/common-streams#87)
- Iglu scala client 3.2.0 no longer relies on the "list" schemas
  endpoint (snowplow/iglu-scala-client#255)
istreeter added a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Sep 20, 2024
The following improvements are introduced via common-streams 0.8.0-M4:

- Fields starting with a digit are now prefixed with an underscore `_`.
  This is needed for Hudi, which does not allow fields starting with a
  digit (snowplow/schema-ddl#209)
- New kinesis source implementation without fs2-kinesis
  (snowplow-incubator/common-streams#84)
- Iglu schemas are resolved in parallel, for short pause times during
  event processing (snowplow-incubator/common-streams#85)
- Common http client configured with restricted max connections per
  server (snowplow-incubator/common-streams#87)
- Iglu scala client 3.2.0 no longer relies on the "list" schemas
  endpoint (snowplow/iglu-scala-client#255)
oguzhanunlu pushed a commit to snowplow-incubator/snowplow-lake-loader that referenced this pull request Nov 1, 2024
The following improvements are introduced via common-streams 0.8.0-M4:

- Fields starting with a digit are now prefixed with an underscore `_`.
  This is needed for Hudi, which does not allow fields starting with a
  digit (snowplow/schema-ddl#209)
- New kinesis source implementation without fs2-kinesis
  (snowplow-incubator/common-streams#84)
- Iglu schemas are resolved in parallel, for short pause times during
  event processing (snowplow-incubator/common-streams#85)
- Common http client configured with restricted max connections per
  server (snowplow-incubator/common-streams#87)
- Iglu scala client 3.2.0 no longer relies on the "list" schemas
  endpoint (snowplow/iglu-scala-client#255)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants