Kinesis source must report its latency when KCL is not healthy #77
Part of PDP-1196
KCL is sadly not very good at crashing and exiting. If the underlying Kinesis client hits errors (e.g. permissions errors), KCL tends to stay alive and does not propagate the exceptions to our application code. We want the app to crash under these circumstances because a crash triggers an alert.
common-streams already has a health check feature, in which a health probe becomes unhealthy if a single event gets stuck without making progress.
This PR leans on the existing health check feature, so the health probe now also becomes unhealthy if the Kinesis client is not regularly receiving healthy responses.
I configured KCL to invoke our record processor every time it polls for records, even if the batch is empty. This means the health check still works even if there are no events in the stream.
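
The idea can be sketched roughly as follows. This is not the code in this PR: `KinesisLiveness`, `onPoll`, `latency`, `isHealthy` and `maxSilence` are hypothetical names chosen for illustration, and the sketch uses only the Java standard library rather than the real KCL or common-streams APIs. It assumes the record processor calls `onPoll` on every poll, including empty batches, so that a stale timestamp can only mean the Kinesis client has stopped getting healthy responses.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical liveness tracker; all names here are illustrative assumptions. */
final class KinesisLiveness {
    // Timestamp of the most recent poll that reached the record processor.
    private final AtomicReference<Instant> lastHealthyPoll;
    // Longest gap between polls that is still considered healthy (assumed config value).
    private final Duration maxSilence;

    KinesisLiveness(Instant startedAt, Duration maxSilence) {
        this.lastHealthyPoll = new AtomicReference<>(startedAt);
        this.maxSilence = maxSilence;
    }

    /** Called by the record processor on every poll, even when the batch is empty. */
    void onPoll(Instant now) {
        lastHealthyPoll.set(now);
    }

    /** Time elapsed since the last poll reached the record processor. */
    Duration latency(Instant now) {
        return Duration.between(lastHealthyPoll.get(), now);
    }

    /** Healthy while the elapsed time does not exceed the allowed silence. */
    boolean isHealthy(Instant now) {
        return latency(now).compareTo(maxSilence) <= 0;
    }
}
```

Passing the clock value in as a parameter keeps the sketch easy to test. A health probe built this way turns unhealthy on a quiet stream only if polling itself stops, which is why invoking the processor for empty batches matters.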