Less scary UI and email rules for gating status changes #4219
Is it another variant of #2124?
Well, it's definitely in the same area. Maybe the result of that issue is what's causing the state pingpong mcatanzaro noted?
My complaint boils down to (a) #2124, and (b) state pingpong.
Can I add another vote for "state pingpong"? Is there some chance bodhi could go from "waiting for test results" to either "all tests have passed" or "at least one test has failed" without doing "failed" -> "waiting" -> "failed" -> "passed"? https://bodhi.fedoraproject.org/updates/FEDORA-2021-c06b64b5ee is an example of a recent update that went through these state transitions.
So, I had a look at how Bodhi handles gating status changes...

First thing I've noticed: the greenwave consumer only processes those messages with a …

Second thought: do we want to still run the cron task? Maybe as a fallback, in case the consumer misses some message for whatever reason.

Third: for multi-build updates (and when editing the update), the gating status change is handled by …
thanks for looking! I'll dig into those things too.
Hmm. We also have an issue that greenwave publishes seven messages per state update, because of how we had to set up the greenwave policies, thanks to limitations in how you can express things there. Ugh. Might be time to write that 'let us use REs in policies' PR for greenwave. This doesn't exactly cause any insuperable problems, but it does mean bodhi is doing 7x more work than it should really need to when consuming these messages.
So I'm looking at various things to do here, but man, every time I look at this I'm reminded how awkward it is. The design here is just...not great.

Greenwave is sort of designed to have policies like "for product versions X, Y and Z, in decision contexts A, B and C, tests F, G and H must pass". And mostly the idea is that things asking Greenwave for decisions pass in the product version and decision context; Greenwave doesn't "know" them. So Bodhi does have code for knowing what product version and decision context to pass in when explicitly asking Greenwave for a decision about a particular update.

But then we had the genius idea of having Greenwave publish messages for decision changes (it was probably my idea, it's that bad), and that really kinda messes with the design. Because then you have to sort of teach Greenwave about when a decision might change, and what product version and decision context people might want to know about. So if you look at the Greenwave decision message publishing code, it has all this kind of icky magic knowledge that, oh, it should publish decision updates when it sees messages from resultsdb, and it should look for a Koji or Brew task ID or a compose ID, and it even has this whole magic ability to figure out what product version it should restrict the decision to for Koji and Brew builds (by parsing known attributes of those messages to guess at a distro version) - that whole file isn't used by anything but this resultsdb consumer.

That's all kinda clever and it kinda works, but...it really doesn't seem to fit the design, does it? Greenwave is supposed to be a 'decision engine' which is told what it's being asked to make a decision about and just spits out the decision. It feels very awkward for it to have this bit bolted on which is really a greenwave client with all this magic knowledge about some very specific decision scenarios of interest to Fedora and/or RHEL. Especially when probably all of that knowledge is duplicated with one or more actual external Greenwave users, like Bodhi.

One problem here is that the magic knowledge in Greenwave doesn't really extend to openQA tests and messages; in theory what we "should" do to match what's done for package build-level tests is teach the Greenwave decision publishing stuff how to know what Fedora version an update is for, and make it only publish messages for decision contexts relevant to that version. But I don't really want to do that at all, because the whole design feels kinda like a hole we should stop digging any deeper...

I sort of feel like instead of this resultsdb message consumer in Greenwave that publishes greenwave "decision change" messages which Bodhi then listens out for, we should maybe have a resultsdb consumer in Bodhi which says "oh hey, this looks like the sort of result which might change the decision for an update we care about, let's ask Greenwave for a new decision". That way the "what contexts we care about" code can all be shared in Bodhi between the consumer and the codepaths which want to explicitly go and ping Greenwave for a decision in response to something other than a 'new result' message.

What does anyone else think about this? @hluk
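To make that concrete, a minimal sketch of what such a Bodhi-side consumer could look like, assuming fedora-messaging; `update_for_result()` and the resultsdb body fields are hypothetical placeholders, not Bodhi or resultsdb API:

```python
from fedora_messaging import api


def update_for_result(item):
    """Hypothetical helper: map a result's subject (an update ID, NVR, ...)
    back to a Bodhi update we care about. Stubbed out here."""
    return None


def on_resultsdb_message(message):
    """Callback run for each resultsdb message the consumer is bound to.

    The body fields below are assumptions about the resultsdb message
    schema, not a documented contract.
    """
    data = message.body.get("data", {})
    item = (data.get("item") or [None])[0]
    update = update_for_result(item)
    if update is None:
        return
    # Reuse the codepath Bodhi already has for explicitly querying
    # Greenwave, so the "what contexts do we care about" knowledge
    # lives in exactly one place.
    update.update_test_gating_status()


# fedora-messaging calls the callback for every message matching the
# consumer's configured bindings (e.g. resultsdb topics).
api.consume(on_resultsdb_message)
```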
OK, so I think I see the cause of the status pingpong here, in …

Now, note that …

So when creating a new update, we call … This is exactly what we do see happening on critpath updates: right at the time the update is created, the status goes to 'failed', then immediately to 'waiting', and then a few minutes later it goes back to 'failed'. Then it goes to 'passed' the next time the cron job runs after the tests all pass.

Right now the fedmsg consumer is not doing anything because, as @mattiaverga noted, it does not know how to handle messages for tests of updates (as opposed to builds). So only the cron job, and the things that happen when the update is created or edited or goes through some other sort of state change, are in play.
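To make the pingpong concrete, here is a toy model that just replays the sequence described above (not Bodhi's actual code; `set_request` and `update_test_gating_status()` are real names per the commit messages below, everything else is illustrative):

```python
from enum import Enum


class Status(Enum):
    waiting = "waiting"
    failed = "failed"
    passed = "passed"


def greenwave_status(results):
    """Toy stand-in for a Greenwave decision: pass only if every required
    result exists and passed (False here means missing or failed)."""
    return Status.passed if all(results.values()) else Status.failed

transitions = []

# 1. new() calls set_request(), which calls update_test_gating_status();
#    no results exist yet, so the decision comes back unsatisfied:
transitions.append(greenwave_status({"update.install": False}))  # failed
# 2. new() then hard-sets the status anyway:
transitions.append(Status.waiting)                               # waiting
# 3. the cron job re-queries before results have arrived:
transitions.append(greenwave_status({"update.install": False}))  # failed
# 4. ...and again once all the tests pass:
transitions.append(greenwave_status({"update.install": True}))   # passed

print([t.value for t in transitions])
# ['failed', 'waiting', 'failed', 'passed'] - the pingpong seen above
```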
On the normal path of creating an update - where it is created and immediately submitted to updates-testing - we're setting the gating status twice. We call `set_request`, which calls `update_test_gating_status()` if the request is updates-testing and gating is required, and then almost immediately afterwards, we straight up set the status to 'waiting'. That doesn't make much sense.

This changes things so that the `new()` method doesn't hard set the status to 'waiting' if it's calling `set_request` and the request is to updates-testing, as is usually the case.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
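A self-contained sketch of the guard that commit message describes; the classes here are stubs, and only the condition in `new()` mirrors the described change:

```python
from enum import Enum


class UpdateRequest(Enum):
    testing = "testing"
    stable = "stable"


class TestGatingStatus(Enum):
    waiting = "waiting"
    failed = "failed"


class Update:
    """Stub model; the real Bodhi classes are much larger."""
    test_gating_required = True
    test_gating_status = None

    def set_request(self, request):
        if request is UpdateRequest.testing and self.test_gating_required:
            self.update_test_gating_status()

    def update_test_gating_status(self):
        # Stand-in for the real Greenwave query.
        self.test_gating_status = TestGatingStatus.failed


def new(request=UpdateRequest.testing):
    update = Update()
    update.set_request(request)
    # The fix: don't clobber a status that set_request() just computed.
    if not (request is UpdateRequest.testing and update.test_gating_required):
        update.test_gating_status = TestGatingStatus.waiting
    return update


print(new().test_gating_status)  # the computed status, no longer overwritten
```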
#4221 is my first cut at some fixes here. I didn't write any tests yet; I'll see if the PR fails any existing ones first, then get to work on that.
This is part of addressing #4219.

Currently we always count the gating status as failed if requirements are unsatisfied. However, it's usual for test results to not be present yet for a little while after the update is created or edited. Ideally we would have a mechanism for knowing for sure whether the tests in question are pending, but we don't, so let's use a dumb rule instead: if there are no failures (only missing results), and it's less than two hours since the update was last modified, we'll return 'waiting' as the status. After two hours, maybe the tests actually didn't get scheduled or the policy is wrong, so we'll return 'failed'.

This also simplifies the logic used to decide the status, and reduces code duplication. We can determine the status using only the number of satisfied and unsatisfied requirements: if there are none of either it is ignored, if there are satisfied requirements and no unsatisfied requirements it is passed, otherwise it is failed or waiting.

As part of this we change the mock greenwave responses used in a lot of other tests to use test-result-failed not -missing, because they currently use -missing but expect the gating status to be 'failed', which it no longer always will be. We also have to make the mock greenwave responses used in tests that hit this code a bit more realistic - they have to always include the satisfied and unsatisfied requirement lists. Note I've checked greenwave code and it does always include these in a successful query response.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
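Condensed into code, the rule described above might look like this; a sketch only, with the requirement 'type' values taken from the commit text and the rest of the dict shape assumed:

```python
from datetime import datetime, timedelta, timezone


def gating_status(satisfied, unsatisfied, last_modified):
    """Sketch of the decision rule in the commit message above."""
    if not satisfied and not unsatisfied:
        return "ignored"
    if not unsatisfied:
        return "passed"
    real_failures = [
        req for req in unsatisfied if req.get("type") != "test-result-missing"
    ]
    age = datetime.now(timezone.utc) - last_modified
    if not real_failures and age < timedelta(hours=2):
        # Only missing results, and the update is fresh: assume the tests
        # just have not run yet.
        return "waiting"
    return "failed"


# A fresh update with one missing result is 'waiting', not 'failed':
recent = datetime.now(timezone.utc) - timedelta(minutes=10)
print(gating_status([], [{"type": "test-result-missing"}], recent))
```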
So, critical path update gating on openQA test results recently went into production. @mcatanzaro reported several communication/UX issues around this:
"So this seems like a good idea, but I notice that all tests are marked as failed until the results arrive. This leads to incorrect failure emails and incorrect UI indicating lots of test failures where none exist."
...
"See e.g.:
https://bodhi.fedoraproject.org/updates/FEDORA-2021-72b0305521
The gating status changed to failed, then to waiting, then back to failed, then to passed. And yes, each status change triggers an email."
It would be nice if we had something like a test execution database, so Bodhi could know that a test which will produce a 'missing' result is actually queued or running - but we don't. So, can we do something a bit less elegant?
It does seem like there are some odd details here. Why did that update change to 'failed' (from what?) then 'waiting' then 'failed' again? How does Bodhi determine "waiting"?
Once that's figured out, maybe we can do something to avoid useless emails for very early gating status changes involving 'missing' results - especially in, say, the first couple of hours after an event that should trigger tests has happened. Perhaps we could also make the difference between "waiting for test results" and "at least one gating test failed" more obvious in the web UI.
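For illustration, the email side of that could be a rule as small as this (a sketch of the idea only; none of these names are Bodhi's actual notification code):

```python
# Transitions that are usually just "results not here yet" noise.
QUIET_TRANSITIONS = {
    (None, "failed"),       # first status right after creation: results missing
    ("failed", "waiting"),
    ("waiting", "failed"),  # early flapping while results are still missing
}


def should_email(old, new, update_age_hours):
    """Suppress emails for flapping transitions in an update's first hours."""
    if update_age_hours < 2 and (old, new) in QUIET_TRANSITIONS:
        return False
    return old != new


print(should_email("failed", "waiting", 0.1))  # False: early noise
print(should_email("waiting", "passed", 3.0))  # True: a real result
```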