Less scary UI and email rules for gating status changes #4219
Is it another variant of #2124?
Well, it's definitely in the same area. Maybe the result of that issue is what's causing the state pingpong mcatanzaro noted?
My complaint boils down to (a) #2124, and (b) state pingpong.
Can I add another vote for "state pingpong"? Is there some chance bodhi could go from "waiting for test results" to either "all tests have passed" or "at least one test has failed" without doing "failed" -> "waiting" -> "failed" -> "passed"? https://bodhi.fedoraproject.org/updates/FEDORA-2021-c06b64b5ee is an example of a recent update that went through these state transitions.
So, I had a look at how Bodhi handles gating status changes...

First thing I've noticed: the greenwave consumer only processes those messages with a …

Second thought: do we want to still run the cron task? Maybe as a fallback, in case the consumer misses some message for whatever reason.

Third: for multi-build updates (and when editing the update), the gating status change is handled by …
thanks for looking! I'll dig into those things too.
Hmm. We also have an issue that greenwave publishes seven messages per state update, because of how we had to set up the greenwave policies, thanks to limitations in how you can express things there. Ugh. Might be time to write that 'let us use REs in policies' PR for greenwave. This doesn't exactly cause any insuperable problems, but it does mean bodhi is doing 7x more work than it should really need to when consuming these messages.
So I'm looking at various things to do here, but man, every time I look at this I'm reminded how awkward it is. The design here is just...not great.

Greenwave is sort of designed to have policies like "for product versions X, Y and Z, in decision contexts A, B and C, tests F, G and H must pass". And mostly the idea is that things asking Greenwave for decisions pass in the product version and decision context; Greenwave doesn't "know" them. So Bodhi does have code for knowing what product version and decision context to pass in when explicitly asking Greenwave for a decision about a particular update.

But then we had the genius idea of having Greenwave publish messages for decision changes (it was probably my idea, it's that bad), and that really kinda messes with the design. Because then you have to sort of teach Greenwave about when a decision might change, and what product version and decision context people might want to know about. So if you look at the Greenwave decision message publishing code, it has all this kind of icky magic knowledge that, oh, it should publish decision updates when it sees messages from resultsdb, and it should look for a Koji or Brew task ID or a compose ID, and it even has this whole magic ability to figure out what product version it should restrict the decision to for Koji and Brew builds (by parsing known attributes of those messages to guess at a distro version) - that whole file isn't used by anything but this resultsdb consumer.

That's all kinda clever and it kinda works, but...it really doesn't seem to fit the design, does it? Greenwave is supposed to be a 'decision engine' which is told what it's being asked to make a decision about and just spits out the decision. It feels very awkward for it to have this bit bolted on which is really a greenwave client with all this magic knowledge about some very specific decision scenarios of interest to Fedora and/or RHEL. Especially when probably all of that knowledge is duplicated with one or more actual external Greenwave users, like Bodhi.

One problem here is that the magic knowledge in Greenwave doesn't really extend to openQA tests and messages; in theory what we "should" do to match what's done for package build-level tests is teach the Greenwave decision publishing stuff how to know what Fedora version an update is for, and make it only publish messages for decision contexts relevant to that version. But I don't really want to do that at all, because the whole design feels kinda like a hole we should stop digging any deeper...

I sort of feel like instead of this resultsdb message consumer in Greenwave that publishes greenwave "decision change" messages which Bodhi then listens out for, we should maybe have a resultsdb consumer in Bodhi which says "oh hey, this looks like the sort of result which might change the decision for an update we care about, let's ask Greenwave for a new decision". That way the "what contexts we care about" code can all be shared in Bodhi between the consumer and the codepaths which want to explicitly go and ping Greenwave for a decision in response to something other than a 'new result' message.

What does anyone else think about this? @hluk
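To make that concrete, a minimal sketch of what such a Bodhi-side consumer could look like, assuming fedora-messaging; `update_for_result()` and the resultsdb body fields are hypothetical placeholders, not Bodhi or resultsdb API:

```python
from fedora_messaging import api


def update_for_result(item):
    """Hypothetical helper: map a result's subject (an update ID, NVR, ...)
    back to a Bodhi update we care about. Stubbed out here."""
    return None


def on_resultsdb_message(message):
    """Callback run for each resultsdb message the consumer is bound to.

    The body fields below are assumptions about the resultsdb message
    schema, not a documented contract.
    """
    data = message.body.get("data", {})
    item = (data.get("item") or [None])[0]
    update = update_for_result(item)
    if update is None:
        return
    # Reuse the codepath Bodhi already has for explicitly querying
    # Greenwave, so the "what contexts do we care about" knowledge
    # lives in exactly one place.
    update.update_test_gating_status()


# fedora-messaging calls the callback for every message matching the
# consumer's configured bindings (e.g. resultsdb topics).
api.consume(on_resultsdb_message)
```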
OK, so I think I see the cause of the status pingpong here, in …

Now, note that …

So when creating a new update, we call … This is exactly what we do see happening on critpath updates: right at the time the update is created, the status goes to 'failed', then immediately to 'waiting', and then a few minutes later it goes back to 'failed'. Then it goes to 'passed' the next time the cron job runs after the tests all pass.

Right now the fedmsg consumer is not doing anything because, as @mattiaverga noted, it does not know how to handle messages for tests of updates (as opposed to builds). So only the cron job, and the things that happen when the update is created or edited or goes through some other sort of state change, are in play.
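To make the pingpong concrete, here is a toy model that just replays the sequence described above (not Bodhi's actual code; `set_request` and `update_test_gating_status()` are real names per the commit messages below, everything else is illustrative):

```python
from enum import Enum


class Status(Enum):
    waiting = "waiting"
    failed = "failed"
    passed = "passed"


def greenwave_status(results):
    """Toy stand-in for a Greenwave decision: pass only if every required
    result exists and passed (False here means missing or failed)."""
    return Status.passed if all(results.values()) else Status.failed

transitions = []

# 1. new() calls set_request(), which calls update_test_gating_status();
#    no results exist yet, so the decision comes back unsatisfied:
transitions.append(greenwave_status({"update.install": False}))  # failed
# 2. new() then hard-sets the status anyway:
transitions.append(Status.waiting)                               # waiting
# 3. the cron job re-queries before results have arrived:
transitions.append(greenwave_status({"update.install": False}))  # failed
# 4. ...and again once all the tests pass:
transitions.append(greenwave_status({"update.install": True}))   # passed

print([t.value for t in transitions])
# ['failed', 'waiting', 'failed', 'passed'] - the pingpong seen above
```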
On the normal path of creating an update - where it is created and immediately submitted to updates-testing - we're setting the gating status twice. We call `set_request`, which calls `update_test_gating_status()` if the request is updates-testing and gating is required, and then almost immediately afterwards, we straight up set the status to 'waiting'. That doesn't make much sense.

This changes things so that the `new()` method doesn't hard set the status to 'waiting' if it's calling `set_request` and the request is to updates-testing, as is usually the case.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
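A self-contained sketch of the guard that commit message describes; the classes here are stubs, and only the condition in `new()` mirrors the described change:

```python
from enum import Enum


class UpdateRequest(Enum):
    testing = "testing"
    stable = "stable"


class TestGatingStatus(Enum):
    waiting = "waiting"
    failed = "failed"


class Update:
    """Stub model; the real Bodhi classes are much larger."""
    test_gating_required = True
    test_gating_status = None

    def set_request(self, request):
        if request is UpdateRequest.testing and self.test_gating_required:
            self.update_test_gating_status()

    def update_test_gating_status(self):
        # Stand-in for the real Greenwave query.
        self.test_gating_status = TestGatingStatus.failed


def new(request=UpdateRequest.testing):
    update = Update()
    update.set_request(request)
    # The fix: don't clobber a status that set_request() just computed.
    if not (request is UpdateRequest.testing and update.test_gating_required):
        update.test_gating_status = TestGatingStatus.waiting
    return update


print(new().test_gating_status)  # the computed status, no longer overwritten
```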
#4221 is my first cut at some fixes here. I didn't write any tests yet; I'll see if the PR fails any existing ones first, then get to work on that.
This is part of addressing #4219.

Currently we always count the gating status as failed if requirements are unsatisfied. However, it's usual for test results to not be present yet for a little while after the update is created or edited. Ideally we would have a mechanism for knowing for sure whether the tests in question are pending, but we don't, so let's use a dumb rule instead: if there are no failures (only missing results), and it's less than two hours since the update was last modified, we'll return 'waiting' as the status. After two hours, maybe the tests actually didn't get scheduled or the policy is wrong, so we'll return 'failed'.

This also simplifies the logic used to decide the status, and reduces code duplication. We can determine the status using only the number of satisfied and unsatisfied requirements: if there are none of either it is ignored, if there are satisfied requirements and no unsatisfied requirements it is passed, otherwise it is failed or waiting.

As part of this we change the mock greenwave responses used in a lot of other tests to use test-result-failed not -missing, because they currently use -missing but expect the gating status to be 'failed', which it no longer always will be. We also have to make the mock greenwave responses used in tests that hit this code a bit more realistic - they have to always include the satisfied and unsatisfied requirement lists. Note I've checked greenwave code and it does always include these in a successful query response.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
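Condensed into code, the rule described above might look like this; a sketch only, with the requirement 'type' values taken from the commit text and the rest of the dict shape assumed:

```python
from datetime import datetime, timedelta, timezone


def gating_status(satisfied, unsatisfied, last_modified):
    """Sketch of the decision rule in the commit message above."""
    if not satisfied and not unsatisfied:
        return "ignored"
    if not unsatisfied:
        return "passed"
    real_failures = [
        req for req in unsatisfied if req.get("type") != "test-result-missing"
    ]
    age = datetime.now(timezone.utc) - last_modified
    if not real_failures and age < timedelta(hours=2):
        # Only missing results, and the update is fresh: assume the tests
        # just have not run yet.
        return "waiting"
    return "failed"


# A fresh update with one missing result is 'waiting', not 'failed':
recent = datetime.now(timezone.utc) - timedelta(minutes=10)
print(gating_status([], [{"type": "test-result-missing"}], recent))
```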
So, critical path update gating on openQA test results recently went into production. @mcatanzaro reported several communication/UX issues around this:
"So this seems like a good idea, but I notice that all tests are marked as failed until the results arrive. This leads to incorrect failure emails and incorrect UI indicating lots of test failures where none exist."
...
"See e.g.:
https://bodhi.fedoraproject.org/updates/FEDORA-2021-72b0305521
The gating status changed to failed, then to waiting, then back to failed, then to passed. And yes, each status change triggers an email."
It would be nice if we had something like a test execution database, so Bodhi could know that a test which will produce a 'missing' result is actually queued or running - but we don't. So, can we do something a bit less elegant?
It does seem like there are some odd details here. Why did that update change to 'failed' (from what?) then 'waiting' then 'failed' again? How does Bodhi determine "waiting"?
Once that's figured out, maybe we can do something to avoid useless emails for very early gating status changes involving 'missing' results - especially in, say, the first couple of hours after an event that should trigger tests has happened. Perhaps we could also make the difference between "waiting for test results" and "at least one gating test failed" more obvious in the web UI.
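For illustration, the email side of that could be a rule as small as this (a sketch of the idea only; none of these names are Bodhi's actual notification code):

```python
# Transitions that are usually just "results not here yet" noise.
QUIET_TRANSITIONS = {
    (None, "failed"),       # first status right after creation: results missing
    ("failed", "waiting"),
    ("waiting", "failed"),  # early flapping while results are still missing
}


def should_email(old, new, update_age_hours):
    """Suppress emails for flapping transitions in an update's first hours."""
    if update_age_hours < 2 and (old, new) in QUIET_TRANSITIONS:
        return False
    return old != new


print(should_email("failed", "waiting", 0.1))  # False: early noise
print(should_email("waiting", "passed", 3.0))  # True: a real result
```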