-
Notifications
You must be signed in to change notification settings - Fork 125
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug: Use Custom River Error Handler #6466
base: main
Are you sure you want to change the base?
Conversation
@pjain1 Deferring to you to review this (for context, error logs lead to alerts, warning logs are usually overlooked unless we are specifically searching for them). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just an idea – have you considered configuring the log level to error only if job.Attempt == job.MaxAttempts
? (You can use w.logger.Log(level, ...)
to pass a level dynamically.)
Alternatively I wonder if we should just remove these logs and format the returned errors nicely instead (since River will log the error from the last job attempt at error level, right?)
I like the idea of dynamically changing the logs but I think it would be most useful just allowing river to manage it |
admin/jobs/river/org_jobs.go
Outdated
return err | ||
} | ||
return nil | ||
// rare case but if its retried, we should repair the billing as it might be in some inconsistent state |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would still keep job.Attempt > 1
condition otherwise it will do a repair for the first run.
admin/jobs/river/org_jobs.go
Outdated
@@ -125,7 +115,7 @@ func (w *StartTrialWorker) Work(ctx context.Context, job *river.Job[StartTrialAr | |||
TrialEndDate: sub.TrialEndDate, | |||
}) | |||
if err != nil { | |||
w.logger.Error("failed to send trial started email", zap.String("org_name", trialOrg.Name), zap.String("org_id", trialOrg.ID), zap.String("billing_email", trialOrg.BillingEmail), zap.Error(err)) | |||
return fmt.Errorf("failed to send trial started email for organization %s: %w", trialOrg.Name, err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't return an error here as job will try to start the trial again, although currently w.admin.StartTrial
is idempotent but I would not bet on it on future. I think we should just get an alert when this happens.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we wanted to prevent error logging in case of retires so this PR does removes explicit logging from our end however river inherently logs an error in case of retires like
{"level":"error","ts":1738056846.8063529,"caller":"river@v0.11.4/job_executor.go:305","msg":"jobExecutor: Job errored","error":"organization already has billing initialized","job_id":5457,"job_kind":"init_org_billing"}
see the caller above, I have no idea how we can do that, probably we just use some logic on datadog to avoid that.
Allow the custom handler to manage attempts instead of worker job.
Resolves: https://github.com/rilldata/rill-private-issues/issues/1101