Make alert messages more human-readable #75

Merged: istreeter merged 1 commit into develop from cleaner-alert-message on Aug 5, 2024

@istreeter istreeter commented Aug 5, 2024

The webhook alert should contain a short, helpful message explaining why an error is caused by the destination setup. In other Snowplow loaders we get that message simply by serializing the Exception. But in Lake Loader I found the serialized exception messages to be very messy.

In a related problem, for Hudi setup errors I needed to traverse the Exception's getCause in order to check if it was a setup error.
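The cause-chain check described above can be sketched roughly as follows. This is an illustration only, not code from this PR: the class `CauseChain` and its methods `of` and `contains` are hypothetical names, and the exception types in `main` are arbitrary JDK stand-ins rather than the Hudi or AWS exceptions the loader actually inspects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; not taken from the Lake Loader code base.
final class CauseChain {
    private CauseChain() {}

    /** Returns the throwable followed by each of its causes, outermost first. */
    static List<Throwable> of(Throwable t) {
        List<Throwable> chain = new ArrayList<>();
        // Identity-based set stops the walk if the cause chain ever loops back on itself.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<Throwable, Boolean>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            chain.add(cur);
        }
        return chain;
    }

    /** True if the throwable or any of its causes is an instance of the given type. */
    static boolean contains(Throwable t, Class<? extends Throwable> type) {
        for (Throwable cur : of(t)) {
            if (type.isInstance(cur)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Stand-in exceptions: a permission failure wrapped twice by outer layers.
        Throwable root = new SecurityException("access denied");
        Throwable wrapped =
            new RuntimeException("table creation failed",
                new IllegalStateException("init failed", root));

        System.out.println(of(wrapped).size());                          // 3
        System.out.println(contains(wrapped, SecurityException.class)); // true
    }
}
```

Checking only the outermost exception would miss the `SecurityException` in this example, which is why the whole chain is walked.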

This PR takes more explicit control of setting short, friendly error messages, and traverses the `getCause` chain to collect all relevant messages.

E.g. an alert message before this change:

Failed to create events table: s3a://<REDACTED/events/_delta_log: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by V1ToV2AwsCredentialProviderAdapter : software.amazon.awssdk.services.sts.model.StsException: User: arn:aws:iam::<REDACTED>:user/<REDACTED> is not authorized to perform: sts:AssumeRole on resource: arn:aws:iam::<REDACTED>:role/<REDACTED> (Service: Sts, Status Code: 403, Request ID: 00000000-0000-0000-0000-000000000000)

The corresponding alert after this change:

Failed to create events table: s3a://<REDACTED/events/_delta_log: Failed to initialize AWS access credentials: Missing permissions to assume the AWS IAM role
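One way to arrive at a message of this shape is to map each recognised exception in the cause chain to a short fixed phrase and join the phrases with `: `. The sketch below assumes exactly that and nothing more. The class `FriendlyAlert`, its two exception types, and the fallback behaviour are hypothetical stand-ins, not the real Hadoop or AWS SDK classes and not the implementation in this PR; only the two phrases are copied from the alert above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; the exception types below are stand-ins, not real AWS/Hadoop classes.
final class FriendlyAlert {
    private FriendlyAlert() {}

    /** Stand-in for a failure to obtain storage credentials. */
    static final class CredentialsException extends RuntimeException {
        CredentialsException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Stand-in for a denied role assumption. */
    static final class AssumeRoleDeniedException extends RuntimeException {
        AssumeRoleDeniedException(String message) {
            super(message);
        }
    }

    /** Returns a short phrase for a recognised exception, or null if unrecognised. */
    static String friendly(Throwable t) {
        if (t instanceof CredentialsException) {
            return "Failed to initialize AWS access credentials";
        }
        if (t instanceof AssumeRoleDeniedException) {
            return "Missing permissions to assume the AWS IAM role";
        }
        return null;
    }

    /** Joins the context with the friendly phrase of every recognised cause. */
    static String render(String context, Throwable error) {
        List<String> parts = new ArrayList<>();
        parts.add(context);
        int depth = 0;
        // The depth bound is an assumed safeguard against pathological cause chains.
        for (Throwable cur = error; cur != null && depth < 20; cur = cur.getCause(), depth++) {
            String phrase = friendly(cur);
            if (phrase != null) {
                parts.add(phrase);
            }
        }
        // Assumed fallback: if nothing was recognised, use the raw serialized exception.
        if (parts.size() == 1) {
            parts.add(String.valueOf(error));
        }
        return String.join(": ", parts);
    }

    public static void main(String[] args) {
        Throwable error = new CredentialsException(
            "No AWS Credentials provided",
            new AssumeRoleDeniedException("not authorized to perform: sts:AssumeRole"));
        System.out.println(render("Failed to create events table", error));
    }
}
```

Unrecognised exceptions in the middle of the chain are skipped here, which is what keeps class names and request IDs out of the alert; whether to skip or include them is a design choice.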

Other small changes I snuck into this commit:

  • Added specific webhook alerts for Hudi.
  • Removed the AssumedRoleCredentialsProvider for AWS SDK v1. It is no longer needed now that Hadoop fully uses AWS SDK v2.
  • Fixed a minor bug in retrying database creation in the Hudi writer.

@istreeter istreeter force-pushed the cleaner-alert-message branch from b7f9565 to 9b57410 Compare August 5, 2024 09:45
@oguzhanunlu oguzhanunlu left a comment

👍 thanks @istreeter !

@istreeter istreeter merged commit 8feb154 into develop Aug 5, 2024
2 checks passed
@istreeter istreeter deleted the cleaner-alert-message branch August 5, 2024 10:59
zhaow-de added a commit to alloy-ch/rcplus-alloy-snowplow-lake-loader that referenced this pull request Oct 4, 2024
…patch-for-alloy

* commit '7ab2edc3fd4d81ffb4d5f3285d02330def7672b1':
  Upgrade common-streams to 0.8.0-M5
  Delete files asynchronously (snowplow-incubator#82)
  Upgrade common-streams 0.8.0-M4 (snowplow-incubator#81)
  Avoid error on duplicate view name (snowplow-incubator#80)
  Add option to exit on missing Iglu schemas (snowplow-incubator#79)
  common-streams 0.8.x with refactored health monitoring (snowplow-incubator#78)
  Create table concurrently with subscribing to stream of events (snowplow-incubator#77)
  Iceberg fail fast if missing permissions on the catalog (snowplow-incubator#76)
  Make alert messages more human-readable (snowplow-incubator#75)
  Hudi loader should fail early if missing permissions on Glue catalog (snowplow-incubator#72)
  Add alert & retry for delta/s3 initialization (snowplow-incubator#74)
  Implement alerting and retrying mechanisms
  Bump aws-hudi to 1.0.0-beta2 (snowplow-incubator#71)
  Bump hudi to 0.15.0 (snowplow-incubator#70)
  Allow disregarding Iglu field's nullability when creating output columns (snowplow-incubator#66)
  Extend health probe to report unhealthy on more error scenarios (snowplow-incubator#69)
  Fix bad rows resizing (snowplow-incubator#68)
oguzhanunlu pushed a commit that referenced this pull request Nov 1, 2024