
Release/0.7.0 #80

Merged
merged 5 commits into main from release/0.7.0 on May 29, 2024
Conversation

@spenes (Contributor) commented May 29, 2024

No description provided.

istreeter and others added 5 commits May 15, 2024 12:30
common-streams has a feature in which a timed window of events is allowed to start processing before the previous window has finalized. This is a great feature for making full use of the available CPU: it means we are always keeping the CPU busy, even if some slow I/O is required to finalize the window.

Until now, the eagerness only stretched to consecutive windows. E.g. if window 1 is still finalizing then window 2 is allowed to start processing, but window 3 is not allowed to start processing.

For the Lake Loader I found it is better to let the eagerness stretch further, e.g. window 3 is allowed to start processing even if windows 1 and 2 are both still finalizing. I also found it is better to allow consecutive windows to be finalizing at the same time, e.g. window 2 can start its finalization even if window 1 is still finishing its own finalization.

This PR makes it configurable how many windows may start eagerly ahead of a finalizing window.
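A minimal sketch of the idea, not the actual common-streams implementation: the configurable eagerness can be modelled as a semaphore whose permit count is one more than the allowed number of eagerly-started windows. The `WindowingConfig` class, the `maxEagerWindows` field and `runWindows` below are hypothetical names used only for illustration.

```scala
// Illustrative sketch only, not the common-streams API.
import cats.effect.IO
import cats.effect.std.Semaphore
import cats.syntax.all._

// Hypothetical config class and field name, for illustration only
final case class WindowingConfig(maxEagerWindows: Int)

def runWindows(config: WindowingConfig, windows: List[IO[Unit]]): IO[Unit] =
  Semaphore[IO](config.maxEagerWindows + 1L).flatMap { permits =>
    windows.traverse_ { window =>
      // A permit is held from the moment a window starts processing until its
      // finalization completes, so no more than `maxEagerWindows` earlier
      // windows may still be in flight when a new window starts.
      permits.acquire *> window.guarantee(permits.release).start.void
    }
  }
```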
common-streams has a feature where the first window is randomly adjusted to a different size. This helps in a deployment where a large number of pods might all start at the same time, but we want the pods to have mutually staggered windows, e.g. to avoid write conflicts in the Lake Loader.

Unfortunately I broke the feature in #64, so this fixes it again.

I also added a test so it won't get broken again.
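A rough sketch of the staggering idea, under the assumption that it is implemented as a simple scaling of the first window's duration; the function name and the scaling range below are invented for illustration only:

```scala
// Illustrative sketch only, not the actual common-streams code: the first
// window's duration is scaled by a random factor, so pods which all start at
// the same moment do not close their windows at the same instant.
import scala.concurrent.duration._
import scala.util.Random

def firstWindowDuration(configured: FiniteDuration): FiniteDuration = {
  // Pick anywhere between 50% and 100% of the configured window duration
  // (the exact range is made up for illustration)
  val factor = 0.5 + Random.nextDouble() * 0.5
  (configured.toMillis * factor).toLong.millis
}

// All subsequent windows use the configured duration, so each pod keeps its
// own randomly offset schedule after the first window closes.
```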
The PubSub Source parameter `parallelPullCount` was used to set the parallelism of the underlying Subscriber. With a higher pull count, the Subscriber can supply events more quickly to the rest of the application, but at the cost of more overhead.

For typical Snowplow apps, a pull count of 1 is sufficient on small instances. But when more CPU is available, the downstream parts of the app process events more quickly, and therefore a higher pull count is needed to keep them supplied with events.

With this PR, the pull count is picked dynamically based on the available CPU. Snowplow apps on bigger instances will automatically get the benefit of this change, without requiring the user to explicitly set the pull count.
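A hedged sketch of the approach, not the actual source; the helper name and the scaling ratio are invented for illustration, the point being only that the pull count is derived from the number of available processors:

```scala
// Illustrative sketch only: derive the Subscriber's parallel pull count from
// the number of available processors, so bigger instances automatically get
// more pullers without any explicit configuration.
def defaultParallelPullCount: Int = {
  val cores = Runtime.getRuntime.availableProcessors()
  // The ratio here is invented for illustration; the real choice is up to the
  // loader. The point is that the pull count grows with available CPU.
  math.max(1, cores / 2)
}
```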
The KCL is sadly not very good at crashing and exiting. If the underlying Kinesis client hits errors (e.g. permissions errors) then the KCL tends to stay alive and not propagate the exceptions to our application code. We want the app to crash under these circumstances, because a crash triggers an alert.

common-streams already has a health check feature, in which a health
probe becomes unhealthy if a single event gets stuck without making
progress.

This PR leans on the existing health check feature, so it also becomes
unhealthy if the Kinesis client is not regularly receiving healthy
responses.

I configured KCL to invoke our record processor every time it polls for
records, even if the batch is empty. This means the health check still
works even if there are no events in the stream.
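A rough sketch of how such a liveness signal can be wired up, assuming the health probe can consume a simple boolean check; the class and method names below are invented for illustration and are not the actual common-streams API:

```scala
// Illustrative sketch only, not the actual implementation. Every invocation of
// the record processor (including the empty-batch invocations described above)
// refreshes a timestamp; the health probe reports unhealthy when that
// timestamp becomes too stale, e.g. because the Kinesis client keeps failing.
import java.time.Instant
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.duration._

final class KinesisLiveness(staleAfter: FiniteDuration) {
  private val lastHealthyPoll = new AtomicReference(Instant.now())

  // Called from the record processor, which KCL now invokes on every poll,
  // even when the batch of records is empty
  def recordHealthyPoll(): Unit = lastHealthyPoll.set(Instant.now())

  // Wired into the existing health probe (assumption: the probe can accept a
  // boolean-style check like this)
  def isHealthy: Boolean =
    Instant.now().toEpochMilli - lastHealthyPoll.get().toEpochMilli < staleAfter.toMillis
}
```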
@spenes merged commit fecae1b into main May 29, 2024
7 checks passed
@istreeter deleted the release/0.7.0 branch June 8, 2024 20:10