fix(stdlib): sort statuses also by _time #3996

alespour · 2021-08-25T16:54:45Z

This PR fixes an issue in alerting where, depending on check and rule scheduling and timing configuration, state changes are sometimes not detected and therefore notifications are not sent.

The problem is that the status of a data point may be evaluated multiple times under certain scheduling. For example, as reported in #3807, a deadman check evaluates status based on the data point time and not a metric value. This results in multiple statuses with the same source timestamp but different time (and level) in the monitoring bucket. Then, when an alert rule attempts to detect a state change, it gets an incorrect sequence of statuses, because sorting only by the (identical) source timestamp is insufficient.

The affected functions are stateChangesOnly and _stateChanges in the monitor package, where, before the sort, _level is dropped from the group key, which results in a new series with possibly unpredictable order (by _time).
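
The ordering problem can be sketched outside Flux. The following Python snippet is only an illustration under stated assumptions, not the monitor package code: the rows, their values, and the state_changes helper are hypothetical. It models two statuses written for the same data point (same source timestamp) by two check runs, arriving in the "wrong" order. Python's sort is stable, so ties keep arrival order; that stands in for the unpredictable order of the regrouped series.

```python
# Hypothetical status rows (values invented for illustration). Both rows share
# a source timestamp because two check runs evaluated the same data point.
rows = [
    {"_source_timestamp": 100, "_time": 30, "_level": "crit"},  # later evaluation
    {"_source_timestamp": 100, "_time": 20, "_level": "ok"},    # earlier evaluation
]


def state_changes(statuses):
    """Return (from_level, to_level) pairs for consecutive rows whose level differs.

    Hypothetical helper; it only mimics the idea of comparing neighbouring rows.
    """
    levels = [s["_level"] for s in statuses]
    return [(a, b) for a, b in zip(levels, levels[1:]) if a != b]


# Sorting by source timestamp alone leaves the tied rows in arrival order.
by_source_only = sorted(rows, key=lambda s: s["_source_timestamp"])

# Adding _time as a second sort key restores the chronological sequence.
by_source_and_time = sorted(rows, key=lambda s: (s["_source_timestamp"], s["_time"]))

changes_source_only = state_changes(by_source_only)
changes_source_and_time = state_changes(by_source_and_time)

print(changes_source_only)      # [('crit', 'ok')]
print(changes_source_and_time)  # [('ok', 'crit')]
```

With the single sort key, the ok to crit transition is never seen, so a rule waiting for it would send no notification; with both keys the transition is detected.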

The issue is a side effect of the changes in #2725. Mea culpa.

Done checklist

Test cases written

wolffcm

The fix looks good and the changes make sense. Thanks for fixing this!

Can I trouble you to adapt your test to our newer format, which uses the Flux keyword testcase? We want to use it moving forward. Here is an example:

flux/stdlib/universe/window_test.flux

Line 7 in 955bc03

testcase window_period_gaps {

alespour · 2021-08-31T15:05:38Z

@wolffcm Done. Nice feature!

wolffcm

Looks great. Thanks again!

wolffcm · 2021-09-01T19:33:56Z

@alespour Unfortunately we had to revert this commit because the new test doesn't pass in the OSS version of influxdb. These tests (like the older ones) use testing.load() to insert data into InfluxDB and then select it back.

The culprit seems to be that the data being passed to testing.load() is pivoted. There are a couple of ways to solve this:

1. Remove testing.load() and just run the test against in-memory data. This is fine because you're not testing push-down logic or planner rules or anything like that.
2. Stream unpivoted data into testing.load().

The first option is the easiest and that's what I recommend. It should be as simple as just deleting the |> testing.load() line from your test.

Unfortunately, I'm not sure of the right steps to check if a new Flux test like this will pass in influxdb before actually merging to master. We need to look into that.

jsternberg · 2021-09-01T20:27:05Z

Fixed using @wolffcm's first suggestion in #4012.

alespour marked this pull request as ready for review August 25, 2021 17:20

nathanielc requested a review from wolffcm August 30, 2021 18:21

wolffcm suggested changes Aug 30, 2021

alespour added 3 commits August 31, 2021 13:53

fix(stdlib): sort statuses also by _time

ecdde46

test: multiple statuses with the same source timestamp

85afd43

test: use new test format

462093c

alespour force-pushed the fix/monitor-state-change branch from 104b1af to 462093c August 31, 2021 14:37

alespour added 2 commits August 31, 2021 16:38

test: moved comment to proper place

6b9ed9b

chore: update generated file

3b859c1

wolffcm approved these changes Aug 31, 2021

wolffcm merged commit 1fbf56e into influxdata:master Aug 31, 2021

scbrickley mentioned this pull request Sep 1, 2021

chore: Revert commit with failing test #4011

Merged

jsternberg added a commit that referenced this pull request Sep 1, 2021

chore: reapply PR #3996

b4a6c31

jsternberg added a commit that referenced this pull request Sep 1, 2021

chore: reapply PR #3996 (#4012)

02f7de7
