Skip to content
This repository has been archived by the owner on Jun 26, 2023. It is now read-only.

Fix sidekiq_threshold ignored on first failure #1

Closed
wants to merge 1 commit into from

Conversation

ejoubaud
Copy link

@ejoubaud ejoubaud commented Jul 3, 2018

DO NOT MERGE: This is just a placeholder PR to explain what's happening here.

We're using this branch in our Gemfile while rollbar#761 makes its way to master. I originally created the PR with a fork on my own Github account (ejoubaud/), so I'm pushing it here too (envato/ fork) so we can have a dependency on an envato repo.

As soon as rollbar#761 gets merged (or rejected), we can just delete this fork and unpin the gem.

Fixes a bug where the first failure of a Sidekiq would always get reported regardless of the `sidekiq_threshold` setting, because then `job_hash['retry_count']` is not set.

Looking at rollbar#319, the current behaviour was definitely not the expected one and I believe it wasn't the original one, but the bug was introduced by a breaking change in Sidekiq that wasn't accounted for in this gem.

Back when this feature was implemented, in rollbar#319, Sidekiq retry mechanism was implemented as a Sidekiq middleware, so it would run downstream of the Rollbar middleware. When an error would bubble up, it would first encounter the Sidekiq `retry_job` middleware, which would set/increment the job's `retry_count` ([code](https://github.com/mperham/sidekiq/blob/d7d000465cd086160843fe95b8836b22d67578b6/lib/sidekiq/middleware/server/retry_jobs.rb#L107-L113) triggered [here](https://github.com/mperham/sidekiq/blob/d7d000465cd086160843fe95b8836b22d67578b6/lib/sidekiq/middleware/server/retry_jobs.rb#L83)), reraise, and only then would the exception meet the Rollbar middleware, with the current job's `retry_count` well set.

Then 2 years ago, sidekiq/sidekiq#3235 introduced Sidekiq 5.0.0, and with it a breaking change: the `retry_job` would not be a middleware anymore but part of the Sidekiq processor. So it would run upstream of the Sidekiq middleware chain, including the Rollbar middleware, meaning that when an error would bubble up, it would meet the Rollbar middleware before Sidekiq's `retry_job` had set the correct `retry_count`.

You'll note that I'm changing a spec about how the middleware should behave when `retry_count` is not set. That spec was introdued in rollbar#346 as a way to fix a `nil` error. It doesn't look like the intent was to actually improve the behaviour of the `sidekiq_threshold` and it seems the contributor may not have been using the feature, just working around a bug that it was causing. Anyway the behaviour specified doesn't make sense to me, and contradicts the expected behaviour described in the original implementation's PR (rollbar#346). If `retry` is truthy and `retry_count` is falsy, the behaviour should not be to ignore the threshold, but to enforce by considering that `retry_count` is 0. Otherwise we get that bug where the first failure gets reported to Rollbar regardless of the configured threshold, before then the next n-1 next failures get rightfully ignored.
@ejoubaud ejoubaud closed this Feb 1, 2019
@ejoubaud ejoubaud deleted the fix-rollbar-sidekiq-threshold branch February 1, 2019 14:17
@ejoubaud
Copy link
Author

ejoubaud commented Feb 1, 2019

Upstream PR has been merged and the gem unpinned a while ago, closing.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant