The problem is that FreshPager only keeps track of at most one deduplication key per PagerDuty service, but each PagerDuty service can map to more than one Freshping check, so it can't resolve more than one incident for a PagerDuty service.
Linearized execution
1. Freshping check C1 goes down.
2. FreshPager triggers an incident on PagerDuty service S1 and stores the deduplication key D1 under the integration key for S1.
3. Freshping check C2 goes down.
4. FreshPager triggers an incident on PagerDuty service S1 and stores the deduplication key D2 under the integration key for S1, overwriting D1 (the D1 incident can now never be resolved automatically).
5. Freshping check C1 comes up.
6. FreshPager looks up the deduplication key for the S1 integration key and gets D2, because D1 was overwritten in step 4. It resolves the D2 incident in PagerDuty and removes the stored key, so it looks like C2 is up.
7. Freshping check C2 comes up.
8. FreshPager fails to look up any deduplication key for the S1 integration key, because the stored key (D2) was removed in step 6 while resolving the D2 incident.
9. FreshPager never resolves the D1 incident, so it stays open until a human resolves it manually.
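The sequence above can be reproduced with a small model. This is only a sketch of the behavior described in this issue, not FreshPager's actual code: the class `SingleKeyStore` and its methods `check_down` and `check_up` are hypothetical names, and the PagerDuty calls are replaced by an in-memory set of open incidents.

```python
# Hypothetical model of the behavior described in this issue; not FreshPager's code.
class SingleKeyStore:
    """Keeps at most one deduplication key per PagerDuty integration key."""

    def __init__(self):
        self.dedup_by_service = {}   # integration key -> deduplication key
        self.open_incidents = set()  # deduplication keys of unresolved incidents

    def check_down(self, integration_key, dedup_key):
        # Trigger an incident and store its key, replacing any earlier key.
        self.open_incidents.add(dedup_key)
        self.dedup_by_service[integration_key] = dedup_key

    def check_up(self, integration_key):
        # Resolve whichever key is currently stored for the service, if any.
        dedup_key = self.dedup_by_service.pop(integration_key, None)
        if dedup_key is None:
            return None  # nothing stored, so nothing gets resolved
        self.open_incidents.discard(dedup_key)
        return dedup_key


store = SingleKeyStore()
store.check_down("S1", "D1")            # C1 goes down
store.check_down("S1", "D2")            # C2 goes down, D1 is overwritten
resolved_first = store.check_up("S1")   # C1 comes up, but D2 is resolved
resolved_second = store.check_up("S1")  # C2 comes up, no key left to look up
print(resolved_first, resolved_second, store.open_incidents)
# prints: D2 None {'D1'}
```

The D1 incident is left in `open_incidents` with no stored key pointing at it, which matches the stuck incident in the last step.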
It's sort of a race that happens when both checks go down at about the same time; otherwise they would have shared one deduplication key to start with. To avoid it, we could process requests serially, but each one takes about 3 seconds. Maybe the deduplication keys should instead be stored under different keys, rather than under the PagerDuty integration key alone.
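One possible shape for storing the deduplication keys under different keys is to index them by the pair of integration key and Freshping check, so that two checks on the same PagerDuty service no longer overwrite each other. The sketch below is an assumption, not a description of the eventual fix: `PerCheckStore`, its methods, and the `check_id` parameter are hypothetical, and it presumes each Freshping webhook request identifies the check it came from.

```python
# Hypothetical sketch of a per-check mapping; names and parameters are assumptions.
class PerCheckStore:
    """Keeps one deduplication key per (integration key, check ID) pair."""

    def __init__(self):
        self.dedup_by_check = {}     # (integration key, check ID) -> dedup key
        self.open_incidents = set()  # deduplication keys of unresolved incidents

    def check_down(self, integration_key, check_id, dedup_key):
        # Store the key under the specific check, leaving other checks untouched.
        self.open_incidents.add(dedup_key)
        self.dedup_by_check[(integration_key, check_id)] = dedup_key

    def check_up(self, integration_key, check_id):
        # Resolve only the incident that belongs to the check that recovered.
        dedup_key = self.dedup_by_check.pop((integration_key, check_id), None)
        if dedup_key is not None:
            self.open_incidents.discard(dedup_key)
        return dedup_key


fixed = PerCheckStore()
fixed.check_down("S1", "C1", "D1")      # C1 goes down
fixed.check_down("S1", "C2", "D2")      # C2 goes down, D1 is kept
first_up = fixed.check_up("S1", "C1")   # C1 comes up, D1 is resolved
second_up = fixed.check_up("S1", "C2")  # C2 comes up, D2 is resolved
print(first_up, second_up, fixed.open_incidents)
# prints: D1 D2 set()
```

With this layout the same sequence of events resolves both incidents, and requests would not need to be processed serially for correctness, provided the underlying map is safe for concurrent access.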
Logs