Add option to exit on missing Iglu schemas #79
Merged
Before this PR, the loader would generate a failed event if it failed to fetch a required schema from Iglu. However, all events have already passed validation in Enrich, so it is completely unexpected to have an Iglu failure. An Iglu error _probably_ means some type of configuration error or service outage.

After this PR, the loader will crash and exit on an Iglu error, instead of creating a failed event. This is probably the preferred behaviour, while the pipeline operator addresses the underlying infrastructure problem.

If an Iglu schema is genuinely now unavailable, then the pipeline operator can override the default behaviour by setting `exitOnMissingIgluSchema: false` in the configuration file or by listing the missing schema in `skipschemas`.
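The decision described above can be sketched roughly as follows. This is an illustrative Python model only, not the loader's actual code: the names `LoaderConfig`, `Outcome` and `on_missing_schema` are hypothetical, and the sketch assumes that a schema listed in the skip list is matched by exact key and never triggers an exit.

```python
from dataclasses import dataclass
from enum import Enum


class Outcome(Enum):
    EXIT = "exit"                  # crash the loader until the operator fixes Iglu
    FAILED_EVENT = "failed_event"  # pre-PR behaviour: emit a failed event, keep running
    SKIPPED = "skipped"            # schema was explicitly listed as one to skip


@dataclass(frozen=True)
class LoaderConfig:
    # Hypothetical mirror of the two settings named in the PR description.
    exit_on_missing_iglu_schema: bool = True
    skip_schemas: frozenset = frozenset()


def on_missing_schema(schema_key: str, config: LoaderConfig) -> Outcome:
    """Decide what to do when a required schema cannot be fetched from Iglu."""
    # Assumption: an explicitly skipped schema is ignored rather than treated as an error.
    if schema_key in config.skip_schemas:
        return Outcome.SKIPPED
    # Default after this PR: treat the failure as an infrastructure problem and exit.
    if config.exit_on_missing_iglu_schema:
        return Outcome.EXIT
    # Operator opted out: fall back to producing a failed event.
    return Outcome.FAILED_EVENT
```

With the default configuration, any unresolved schema leads to `Outcome.EXIT`; the two overrides mentioned in the description correspond to the other two branches.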
pondzix approved these changes Sep 4, 2024
# -- We recommend `true` because Snowplow enriched events have already passed validation, so a missing schema normally
# -- indicates an error that needs addressing.
# -- Change to `false` so events go to the failed events stream instead of crashing the loader.
"exitOnMissingIgluSchema": true
nit: this doesn't match what is in commit message?
I'll fix the commit message as I merge it. Thanks.
zhaow-de added a commit to alloy-ch/rcplus-alloy-snowplow-lake-loader that referenced this pull request Oct 4, 2024
…patch-for-alloy
* commit '7ab2edc3fd4d81ffb4d5f3285d02330def7672b1':
  Upgrade common-streams to 0.8.0-M5
  Delete files asynchronously (snowplow-incubator#82)
  Upgrade common-streams 0.8.0-M4 (snowplow-incubator#81)
  Avoid error on duplicate view name (snowplow-incubator#80)
  Add option to exit on missing Iglu schemas (snowplow-incubator#79)
  common-streams 0.8.x with refactored health monitoring (snowplow-incubator#78)
  Create table concurrently with subscribing to stream of events (snowplow-incubator#77)
  Iceberg fail fast if missing permissions on the catalog (snowplow-incubator#76)
  Make alert messages more human-readable (snowplow-incubator#75)
  Hudi loader should fail early if missing permissions on Glue catalog (snowplow-incubator#72)
  Add alert & retry for delta/s3 initialization (snowplow-incubator#74)
  Implement alerting and retrying mechanisms
  Bump aws-hudi to 1.0.0-beta2 (snowplow-incubator#71)
  Bump hudi to 0.15.0 (snowplow-incubator#70)
  Allow disregarding Iglu field's nullability when creating output columns (snowplow-incubator#66)
  Extend health probe to report unhealthy on more error scenarios (snowplow-incubator#69)
  Fix bad rows resizing (snowplow-incubator#68)
oguzhanunlu pushed a commit that referenced this pull request Nov 1, 2024
Before this PR, the loader would generate a failed event if it failed to fetch a required schema from Iglu. However, all events have already passed validation in Enrich, so it is completely unexpected to have an Iglu failure. An Iglu error _probably_ means some type of configuration error or service outage. After this PR, the loader will crash and exit on an Iglu error, instead of creating a failed event. This is probably the preferred behaviour, while the pipeline operator addresses the underlying infrastructure problem. If an Iglu schema is genuinely now unavailable, then the pipeline operator can override the default behaviour by setting `exitOnMissingIgluSchema: false` in the configuration file or by listing the missing schema in `skipschemas`.