Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug: Use Custom River Error Handler #6466

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

grahamplata
Copy link
Contributor

@grahamplata grahamplata commented Jan 21, 2025

Allow the custom handler to manage attempts instead of worker job.

Resolves: https://github.com/rilldata/rill-private-issues/issues/1101

@grahamplata grahamplata self-assigned this Jan 21, 2025
@grahamplata grahamplata marked this pull request as ready for review January 21, 2025 20:59
@begelundmuller
Copy link
Contributor

@pjain1 Deferring to you to review this (for context, error logs lead to alerts, warning logs are usually overlooked unless we are specifically searching for them).

@begelundmuller begelundmuller requested review from pjain1 and removed request for pjain1 and begelundmuller January 27, 2025 14:24
Copy link
Contributor

@begelundmuller begelundmuller left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just an idea – have you considered configuring the log level to error only if job.Attempt == job.MaxAttempts? (You can use w.logger.Log(level, ...) to pass a level dynamically.)

Alternatively I wonder if we should just remove these logs and format the returned errors nicely instead (since River will log the error from the last job attempt at error level, right?)

@grahamplata
Copy link
Contributor Author

Just an idea – have you considered configuring the log level to error only if job.Attempt == job.MaxAttempts? (You can use w.logger.Log(level, ...) to pass a level dynamically.)

Alternatively I wonder if we should just remove these logs and format the returned errors nicely instead (since River will log the error from the last job attempt at error level, right?)

I like the idea of dynamically changing the logs but I think it would be most useful just allowing river to manage it

return err
}
return nil
// rare case but if its retried, we should repair the billing as it might be in some inconsistent state
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would still keep job.Attempt > 1 condition otherwise it will do a repair for the first run.

@@ -125,7 +115,7 @@ func (w *StartTrialWorker) Work(ctx context.Context, job *river.Job[StartTrialAr
TrialEndDate: sub.TrialEndDate,
})
if err != nil {
w.logger.Error("failed to send trial started email", zap.String("org_name", trialOrg.Name), zap.String("org_id", trialOrg.ID), zap.String("billing_email", trialOrg.BillingEmail), zap.Error(err))
return fmt.Errorf("failed to send trial started email for organization %s: %w", trialOrg.Name, err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't return an error here as job will try to start the trial again, although currently w.admin.StartTrial is idempotent but I would not bet on it on future. I think we should just get an alert when this happens.

Copy link
Member

@pjain1 pjain1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we wanted to prevent error logging in case of retires so this PR does removes explicit logging from our end however river inherently logs an error in case of retires like

{"level":"error","ts":1738056846.8063529,"caller":"river@v0.11.4/job_executor.go:305","msg":"jobExecutor: Job errored","error":"organization already has billing initialized","job_id":5457,"job_kind":"init_org_billing"}

see the caller above, I have no idea how we can do that, probably we just use some logic on datadog to avoid that.

@grahamplata grahamplata requested a review from pjain1 January 28, 2025 19:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants