Kinesis source must report its latency when KCL is not healthy #77

Merged
@spenes merged 1 commit into develop from crash-on-unhealthy-kcl
May 29, 2024

@istreeter (Contributor) commented May 16, 2024

Part of PDP-1196

The KCL is sadly not very good at crashing and exiting. If the underlying Kinesis client hits errors (e.g. permissions errors), KCL tends to stay alive and does not propagate the exceptions to our application code. We want the app to crash under these circumstances, because a crash triggers an alert.

common-streams already has a health check feature, in which a health probe becomes unhealthy if a single event gets stuck without making progress.
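
For readers unfamiliar with that feature, here is a minimal sketch of the idea in Java. It is not the common-streams implementation: the class name `StuckEventProbe`, its methods, and the millisecond threshold are all hypothetical, chosen only to show how a probe can flip to unhealthy when an in-flight event stops making progress.

```java
// Hypothetical sketch only: none of these names come from common-streams.
final class StuckEventProbe {
    // Longest time an event may stay in flight before the probe turns unhealthy.
    private final long maxLatencyMillis;
    // Time the oldest in-flight event was received, or -1 when nothing is in flight.
    private long oldestInFlightMillis = -1L;

    StuckEventProbe(long maxLatencyMillis) {
        this.maxLatencyMillis = maxLatencyMillis;
    }

    // Called when the source hands an event to the application.
    synchronized void eventReceived(long nowMillis) {
        if (oldestInFlightMillis < 0) {
            oldestInFlightMillis = nowMillis;
        }
    }

    // Called once every in-flight event has been fully processed.
    synchronized void allEventsProcessed() {
        oldestInFlightMillis = -1L;
    }

    // How long the oldest in-flight event has been waiting; 0 when idle.
    synchronized long latencyMillis(long nowMillis) {
        return oldestInFlightMillis < 0 ? 0L : nowMillis - oldestInFlightMillis;
    }

    // Healthy as long as no event has been stuck longer than the threshold.
    synchronized boolean isHealthy(long nowMillis) {
        return latencyMillis(nowMillis) <= maxLatencyMillis;
    }
}
```

Note that a probe like this stays healthy forever when no events arrive at all, which is why an unhealthy Kinesis client would otherwise go unnoticed.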

This PR leans on the existing health check feature, so the health probe also becomes unhealthy if the Kinesis client is not regularly receiving healthy responses.

I configured KCL to invoke our record processor every time it polls for records, even if the batch is empty. This means the health check still works even if there are no events in the stream.
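
A rough sketch of why the empty-batch invocations matter, again in Java and with hypothetical names (`PollLatencyTracker`, `onBatch`): every call to the record processor, empty or not, acts as a heartbeat proving that a poll succeeded. If the client stops getting healthy responses, the heartbeats stop and the reported latency keeps growing until it crosses the threshold. In KCL 2.x the relevant setting is, to my knowledge, `callProcessRecordsEvenForEmptyRecordList` on `ProcessorConfig`; the description above does not name the setting, so treat that as an assumption.

```java
// Hypothetical sketch only: not the code from this PR.
final class PollLatencyTracker {
    // Longest gap between record-processor invocations before we report unhealthy.
    private final long maxGapMillis;
    // Time of the most recent invocation, which may have carried an empty batch.
    private long lastInvocationMillis;

    PollLatencyTracker(long startMillis, long maxGapMillis) {
        this.lastInvocationMillis = startMillis;
        this.maxGapMillis = maxGapMillis;
    }

    // Called on every successful poll. recordCount may be 0: an empty batch
    // still proves the client received a healthy response.
    synchronized void onBatch(int recordCount, long nowMillis) {
        lastInvocationMillis = nowMillis;
    }

    // Time since the last successful poll; grows without bound if polls stop.
    synchronized long latencyMillis(long nowMillis) {
        return Math.max(0L, nowMillis - lastInvocationMillis);
    }

    synchronized boolean isHealthy(long nowMillis) {
        return latencyMillis(nowMillis) <= maxGapMillis;
    }
}
```

Without the empty-batch invocations, a quiet stream and a broken client would look identical from the application's side.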

@istreeter istreeter force-pushed the crash-on-unhealthy-kcl branch from 2220fd8 to dd179d6 Compare May 16, 2024 23:20
@istreeter istreeter changed the title Kinesis source must exit when KCL is not healthy Kinesis source must report its latency when KCL is not healthy May 16, 2024
@istreeter istreeter force-pushed the crash-on-unhealthy-kcl branch 2 times, most recently from d5fdd15 to e4b0198 Compare May 16, 2024 23:40
@spenes spenes force-pushed the crash-on-unhealthy-kcl branch from e4b0198 to 53eb2b5 Compare May 27, 2024 11:20
@spenes spenes force-pushed the crash-on-unhealthy-kcl branch from 470c1cd to cd34931 Compare May 29, 2024 07:24
@spenes spenes force-pushed the crash-on-unhealthy-kcl branch from cd34931 to be1fdb8 Compare May 29, 2024 07:44
@spenes spenes merged commit be1fdb8 into develop May 29, 2024
7 checks passed
@istreeter istreeter deleted the crash-on-unhealthy-kcl branch June 8, 2024 20:10
4 participants