Overhaul various 'update status' checks - meets_testing_requirements, check_requirements, critpath_approved, check_karma_thresholds - and approve_testing script #5630
Conversation
Force-pushed from e26f475 to ba9a7fa
Oh, one of the commit messages promises my evaluation of the test_approve_testing tests as a comment on the PR, so:

assessment of approve_testing code

There are only four main paths.

assessment of test_approve_testing tests

- test_autokarma_update_meeting_time_requirements_gets_one_comment - ensures we comment exactly once across multiple runs when the update meets the manual push time threshold but not the autopush threshold
- test_autokarma_update_without_mandatory_days_in_testing - tests that we bail out early if there is neither mandatory_days_in_testing nor autotime
- test_autokarma_update_not_meeting_testing_requirements - ensures we don't comment on or approve an update that doesn't meet requirements. Leaves karma at the default (I think +1), sets days in testing to 6
- test_exception_handler - tests the exception handler
- test_subsequent_comments_after_initial_push_comment - tests the case where the update is approved, edited, then approved again, and expects to see two approvals. Is effectively redundant with test_autotime_update_meeting_test_requirements_gets_pushed
- test_autotime_update_meeting_test_requirements_gets_pushed - tests that a non-autokarma, non-critpath, autotime update with stable_days set to 7 and seven days in testing is autopushed (and marked approved). Does not check whether an approval comment is posted
- test_no_autotime_update_meeting_stable_days_and_test_requirement - tests that an update without autotime set does not get autopushed, even with a stable_days value set and reached (uses 10 days)
- test_autotime_update_does_not_meet_test_requirements - tests that an autotime update with stable_days and mandatory_days_in_testing set to 2 is not pushed if days in testing is 1. Mostly functionally equivalent to test_autotime_update_does_not_meet_stable_days_doesnt_get_pushed, except with a non-default mandatory days in testing value and an update that does not meet it. Probably useless
- test_autotime_update_does_no_mandatory_days_in_testing - tests that an autotime update is pushed stable with zero days in testing if stable_days is 0. Relies on the default value of stable_days being 0, but doesn't say so. Sets mandatory_days_in_testing to 0 to ensure meets_testing_requirements passes
- test_update_conflicting_build_not_pushed - tests that we don't push updates with conflicting builds. Also tests that we don't post a comment saying they're approved, or set date_approved
- test_autotime_update_gets_pushed_dont_send_duplicated_notification - tests that we don't publish UpdateRequirementsMetStable again when auto-pushing, if the update previously met the manual push threshold and had it published then. Doesn't test whether we repeat the comment

ACTUALLY USEFUL TESTS:
STUFF TO ENSURE IS TESTED ELSEWHERE:
Here's some more random assessment notes lying around - the note-to-self that I kept while working on this:

assessment of test_models tests

assessment of test_updates tests

- test_update_reaching_time_in_testing_negative_karma - dupe of test_update_reaching_stable_karma_not_critpath_min_karma
still needs...
Seems like the CI is timing out at 30 minutes. I don't know why it's taking so long. Possibly I've got something in there which isn't mocking out a message publish, or something...will try and look at it tomorrow...
Force-pushed from a84909b to e9ab29c
Hallelujah, praise the lord! That took a while. So, yeah, there were several unmocked message publish events happening, but in bcd (where I tested this), this doesn't break things the way it does in CI. Filed #5633 about that. I'll see if I can think of any way to tweak it tomorrow. The integration tests are not my fault, I don't think. They also failed on #5632, which changed nothing but the README (I filed that PR just as a dumb way to check if this PR was really causing the problems). I don't know what's changed to make them suddenly fail.
In working on fedora-infra#5630 I found that what happens when you don't properly mock out message publish attempts in tests isn't great. In CI, the test will hang for several minutes, then eventually proceed. Obviously we don't want to slow the tests down, and there is a 30 minute time limit on the test, so if you do this more than twice or so, the tests will time out. It's not very clear why the tests are timing out and failing - you have to guess that it's message publishing causing the problem. In bcd, the test doesn't hang, it just silently proceeds (I guess the message gets published to the rabbitmq instance in the bcd environment, I haven't checked). So you will not notice that you got it wrong until you submit a PR.

Neither of those behaviors is great, so instead, let's mock out the 'real' message publishing function in fedora_messaging in a fixture that's automatically used by every test. We replace it with a mock that raises an exception with a hopefully-useful message. We have to mock _twisted_publish, not publish, because mock_sends also mocks _twisted_publish; if we mocked publish, we would break mock_sends. This way mock_sends gets to override us and work OK.

Unfortunately this usually causes a database rollback error so you have to scroll back through some confusing errors to find the 'real' problem, but it's still an improvement, I think.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
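For illustration, here is a minimal sketch of the kind of autouse fixture this commit describes, e.g. in a conftest.py. The fixture name and error text are invented for the example; only the choice to patch fedora_messaging.api._twisted_publish (so that mock_sends can still override it) comes from the commit message, and the PR's actual fixture may differ.

```python
# conftest.py (sketch) -- make any unmocked message publish fail loudly.
from unittest import mock

import pytest


@pytest.fixture(autouse=True)
def no_unmocked_publish():
    """Replace the real publish path with a mock that raises immediately."""
    def _fail(*args, **kwargs):
        raise AssertionError(
            "A test tried to publish a real fedora_messaging message. "
            "Wrap the code under test in mock_sends() or mock the publish call."
        )

    # Patch _twisted_publish rather than publish: mock_sends also patches
    # _twisted_publish, so it can still override this fixture and work normally.
    with mock.patch("fedora_messaging.api._twisted_publish", side_effect=_fail):
        yield
```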
Force-pushed from e9ab29c to 36203b5
Also todo, obviously: fix test coverage. But any comments while I work on the little todo list are still welcome :D
Force-pushed from 36203b5 to 0b53d1d
Integration test failures don't look to be my fault; GitHub is having trouble reaching pagure.io for some reason.
In working on #5630 I found that what happens when you don't properly mock out message publish attempts in tests isn't great. In CI, the test will hang for several minutes, then eventually proceed. Obviously we don't want to slow the tests down, and there is a 30 minute time limit on the test, so if you do this more than twice or so, the tests will time out. It's not very clear why the tests are timing out and failing - you have to guess that it's message publishing causing the problem. In bcd, the test doesn't hang, it just silently proceeds (I guess the message gets published to the rabbitmq instance in the bcd environment, I haven't checked). So you will not notice that you got it wrong until you submit a PR.

Neither of those behaviors is great, so instead, let's mock out the 'real' message publishing function in fedora_messaging in a fixture that's automatically used by every test. We replace it with a mock that raises an exception with a hopefully-useful message. We have to mock _twisted_publish, not publish, because mock_sends also mocks _twisted_publish; if we mocked publish, we would break mock_sends. This way mock_sends gets to override us and work OK.

Unfortunately this usually causes a database rollback error so you have to scroll back through some confusing errors to find the 'real' problem, but it's still an improvement, I think.

Fixes: #5633

Signed-off-by: Adam Williamson <awilliam@redhat.com>
Noooo merge commits! Evil! I can just rebase it. Can I rebase it?
On close inspection, this test suite has become a little weird over the years. When it was introduced and most of it was written, approve_testing did somewhat more than it does now. It handled autokarma push (which is now handled by Update.check_karma_thresholds), and it carried much more of the logic for deciding if updates met the relevant requirements or not (most of which is now handled by Update.meets_testing_requirements; all that's left in approve_testing is the check of whether the update meets the autopush-for-time-in-testing threshold).

A lot of the tests are really checking those things - all the minute variations of "is the update critical path or not?", "does the update have autokarma enabled?", various time spent in testing and karma levels are really about testing that stuff. It doesn't make sense to test that here any more. Those things should be tested elsewhere, mainly in the tests of meets_testing_requirements. At least one of these tests - test_autotime_update_with_autokarma_met_karma_and_time_requirements_get_pushed - actually still passes if you completely remove the call to approve_testing_main()!

This completely rewrites the test suite to focus on testing what approve_testing actually does now. To do this I evaluated every existing test, and also evaluated the current code of approve_testing to try and make sure I was testing every path through it. I'll include those notes in the pull request as they're long. Some of the existing tests that are still useful are kept, but tweaked slightly to take advantage of the new setup_method and assertion helpers, to reduce duplication.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
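To illustrate the setup_method / assertion-helper pattern the commit refers to, here is a hypothetical, self-contained sketch. The class, helper names, and the plain-dict stand-in for an update are invented for this example and are not the PR's actual code, which operates on Bodhi's real Update model.

```python
# Hypothetical sketch of the setup_method / assertion-helper pattern.
class TestApproveTestingSketch:

    def setup_method(self, method):
        # Shared starting state every test begins from; individual tests only
        # tweak what differs for their scenario.
        self.update = {"status": "testing", "date_approved": None, "comments": []}

    def assert_approved(self, update):
        # "Approved" here means the approval timestamp was set and a comment posted.
        assert update["date_approved"] is not None
        assert any("approved" in text for text in update["comments"])

    def assert_untouched(self, update):
        assert update["date_approved"] is None
        assert update["comments"] == []

    def test_untouched_by_default(self):
        # With no thresholds met, the script should not touch the update.
        self.assert_untouched(self.update)
```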
This makes these tests simpler, clearer and more comprehensive by using parametrize(), adding assertions and comments about significant assumptions made by the tests, and reorganizing and renaming the tests into sensible groups. Signed-off-by: Adam Williamson <awilliam@redhat.com>
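As a rough illustration of the parametrize() style this refers to, here is a small, runnable example; the test name, parameter table, and the toy helper function are invented for the example and are not taken from the PR.

```python
import pytest


def eligible_for_autopush(days_in_testing, stable_days, autotime):
    """Toy stand-in for the autopush-by-time check, just for the example."""
    return autotime and stable_days > 0 and days_in_testing >= stable_days


@pytest.mark.parametrize(
    "days_in_testing, stable_days, autotime, expected",
    [
        (7, 7, True, True),     # meets the threshold with autotime on -> pushed
        (6, 7, True, False),    # one day short -> not pushed
        (10, 7, False, False),  # autotime off -> never autopushed
    ],
)
def test_autopush_by_time(days_in_testing, stable_days, autotime, expected):
    assert eligible_for_autopush(days_in_testing, stable_days, autotime) is expected
```

Each parameter row documents one scenario in a single line, which is the "simpler, clearer and more comprehensive" effect the commit describes.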
The existing tests that ultimately test the stuff handled by check_karma_thresholds are spread around, both within individual test files and across them. There's kind of a grab bag of tests of different paths that have been added a bit haphazardly over the years. This tries to reconcile them all into a set of comprehensive and consistent tests that all live in one place. I did quite a lot of checking to try and make sure everything that was currently tested still is, and that some things which weren't currently tested are now. This is a precursor to changing some of the behaviour of check_karma_thresholds in later commits. Signed-off-by: Adam Williamson <awilliam@redhat.com>
`obsolete_if_unstable` was introduced by fedora-infra#901 to cover one path: obsoleting a 'pending' update if it reached its negative karma threshold. At the time, `check_karma_thresholds` only reached its negative karma check for updates in 'testing' state, it did not run on updates in 'pending' state. However, fedora-infra#1556 extended `check_karma_thresholds` to also run on updates in 'pending' state, which actually rendered `obsolete_if_unstable` entirely redundant, except in one case - calling `comment()` with `check_karma` set to `False` - which was introduced solely to make it possible to test `obsolete_if_unstable`. Signed-off-by: Adam Williamson <awilliam@redhat.com>
Before fedora-infra#901, we obsoleted updates that reached their unstable_karma threshold regardless of autokarma setting, but only for updates in 'testing' status, not 'pending' status. fedora-infra#901 intended to fix that. In the review, @bowlofeggs said "we should only do this [obsolete updates in pending state that reach their unstable_karma threshold] if autokarma is turned on", but he did not apparently realize that was inconsistent with the existing behavior in the testing state. So after fedora-infra#901, we would obsolete updates from pending only if autokarma was enabled, but we would always obsolete updates from testing regardless of the autokarma setting. This behaviour remained the same after fedora-infra#1556.

I don't think it makes any sense to preserve this inconsistent behavior. This PR comes down on the side of always obsoleting regardless of the setting of autokarma, as I think in practice this happens most often to updates in 'testing' state not updates in 'pending' state, so it's the most similar behaviour. It's also the older behaviour. It would be easy to flip it to only ever unpush if autokarma is enabled, if we decide that's better.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
Only disabling autopush for updates in 'testing' status makes no sense. @trishnaguha explained in fedora-infra#1556 (comment) that their rationale was that updates in 'pending' status should get obsoleted (instead?), but this...just isn't right. We only obsolete updates when they reach the negative karma threshold, which is usually much stricter than just a single -1 (by default it's a karma *total* of -3). We want to disable autopush for any update as soon as it gets a single -1, regardless of whether it's in pending or testing status, and regardless of whether we might later obsolete it. Signed-off-by: Adam Williamson <awilliam@redhat.com>
This has been hanging around for a while, but it's fairly useless. At some point this is where we were planning to post the testing_approval_msg message if the update reached the critpath.min_karma threshold, but that is actually done by the approve_testing.py script nowadays. All we do here is log a message, and it's actually a bogus message anyway, because stable_karma is *not* the manual push threshold. stable_karma is the autopush threshold; it's really meaningless if autokarma is not set. So it doesn't make any sense to log this message when we hit stable_karma but autopush is not set. We should just do nothing on that path. Note this slightly changes when we set date_approved, but the current behaviour around date_approved makes no sense anyway, so the next commit revamps it entirely to make much more sense. Signed-off-by: Adam Williamson <awilliam@redhat.com>
This was a logical consequence of the previous change to drop a branch in check_karma_thresholds. That change affected when date_approved got set, and investigating that, I realized it is not set at all consistently. With this, we consistently set date_approved to be the first time - now always as a `datetime.datetime` instance - an update becomes eligible for manual push. Previously it was sometimes set then, sometimes set when an update was autopushed, and sometimes never set at all.

We also drop an early bailout in approve_testing.py that is no longer ever hit in the real world. It initially bailed any time the release had no "mandatory days in testing", which would have been true for Rawhide (and early Branched). But when autotime was added, it was changed to bail only if the release had no mandatory days in testing *and* the update did not have autotime set. Updates for Rawhide and other releases like it (ones not "composed by Bodhi") are always forced to have autotime enabled, and all releases that *are* "composed by Bodhi" have mandatory days in testing, so we just never really satisfy the conditions any more. I verified this by asking nirik to check Bodhi's logs for the message that should be logged when this bailout is hit; there were no occurrences of it. Anyway, I don't think an early bailout is ever really justifiable any more, because we have gating checks for all branches of Fedora these days.

This refactors approve_testing a bit just to make the different paths through it clearer, and separate different work into different functions. We also add comments to make it clearer how the different autopush paths actually *work*, which was not at all clear to me before researching this commit.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
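For illustration only, here is a simplified stand-in for the date_approved rule described above. It is not Bodhi's actual code: the helper name is invented, and the real check goes through the update's full testing-requirements logic rather than a single property.

```python
from datetime import datetime


def maybe_set_date_approved(update):
    """Record the first moment an update becomes eligible for manual push.

    Simplified stand-in: set date_approved exactly once, as a datetime, and
    never change it afterwards (e.g. when the update is later autopushed).
    """
    if update.date_approved is None and update.meets_testing_requirements:
        update.date_approved = datetime.utcnow()
```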
There are several problems in the general area of "checking if an update 'meets requirements'". Most obviously, as discussed in the comments on fedora-infra#3938, there are two similar-but-different functions for this, `Update.meets_testing_requirements()` and `Update.check_requirements()`. `Update.set_request()` also sort of re-implements the same thing in-line, and there is also `Update.critpath_approved()` which does something similar. These paths all implement somewhat different requirements, which has various weird and confusing results (like the way you can sometimes push an update stable from the CLI but not from the web UI, or the way sometimes updates can be submitted stable successfully but then get ejected from the stable push for not meeting requirements).

This commit introduces `meets_requirements_why`, which returns a boolean and a string explanation, and turns the existing methods into thin wrappers around it. The block in the API `set_request()` function which checked requirements is dropped because it's now redundant; the DB model `set_request()` which it calls does the same requirement check. There's no need to do it twice.

The code block commented "Disable pushing critical path updates for pending releases directly to stable" does not really do that. It just applied the requirements for critical path updates on push to stable (the block immediately under it did the same for non-critpath updates). The boutique requirements to do with a "critpath.num_admin_approvals" setting were for a very old policy which gave special status to feedback from "proven testers"; that setting has been 0 in both Bodhi deployments for years and is not likely to come back, so we can just get rid of the setting and simplify the code path.

This commit clarifies and simplifies the meanings of the two types of stable push threshold settings, and renames some of them. The `stable_karma` and `stable_days` properties of each update are now *only* relevant to autopush. They are never used in "can this update be pushed manually?" calculations. The `mandatory_days_in_testing` and `min_karma` settings act *only* as policy minimums for manual push eligibility. They can still be set with prefix modifiers for the release and the 'status', as before (to allow Fedora to apply different policies to different releases at different points in the cycle). If an update reaches either currently-applicable policy minimum, it is eligible for stable push. If it does not, it is not. This results in clearer logic and more consistent requirements that are more in line with the distribution policies.

critpath.stable_after_days_without_negative_karma is renamed to critpath.mandatory_days_in_testing and now simply sets a days-in-testing threshold that applies to critical path updates only. The "without_negative_karma" logic is removed, because it has no basis in the Fedora or EPEL update policies; they never specified such a rule. critpath.min_karma is renamed to min_karma and applied to all updates, because this is what the current Fedora and EPEL update policies say. They both apply the same minimum karma thresholds to *all* updates. We *could* instead add min_karma and keep critpath.min_karma, but this is somewhat more work and I don't want to do more work than is actually necessary here. If the distros ever do decide to split the thresholds again, we can re-introduce critpath.min_karma. The functions and properties in the models are renamed in line with the new names of the settings, again for sanity's sake.

This is of course an API change and a backwards-incompatible configuration change. As there are only two real deployments of Bodhi (both run out of the same ansible repo) and I don't think anything uses the code interfaces, this seems like it shouldn't be too much of a problem.

Signed-off-by: Adam Williamson <awilliam@redhat.com>
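To make the shape of the change concrete, here is a schematic, self-contained sketch of the "one source of truth plus thin wrappers" pattern the commit describes. The real meets_requirements_why() also consults gating status, critpath settings and so on; the class, fields, and numeric defaults below are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class UpdateSketch:
    """Toy stand-in for the Update model, only to show the wrapper structure."""
    karma: int
    days_in_testing: int
    min_karma: int = 2                  # policy minimum karma (illustrative value)
    mandatory_days_in_testing: int = 7  # policy minimum days (illustrative value)

    def meets_requirements_why(self) -> tuple[bool, str]:
        """Single source of truth: a boolean verdict plus a human-readable reason."""
        if self.karma >= self.min_karma:
            return True, f"Update has reached the minimum karma threshold ({self.min_karma})"
        if self.days_in_testing >= self.mandatory_days_in_testing:
            return True, f"Update has spent {self.mandatory_days_in_testing} days in testing"
        return False, "Update has neither enough karma nor enough time in testing yet"

    @property
    def meets_testing_requirements(self) -> bool:
        # The old boolean-only entry point becomes a thin wrapper.
        return self.meets_requirements_why()[0]

    def check_requirements(self) -> tuple[bool, str]:
        # The other old entry point is also just a wrapper now.
        return self.meets_requirements_why()
```

The point of the refactor is that the old entry points can then never disagree with each other, because they all defer to the same check.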
Force-pushed from d27ce5f to 2b8d29f
I rebased it. :P
Sure, it's just that the "update branch" button in the web UI is too comfy to ignore 😉
Looks fine to me. I'd just like a note to be dropped in the news directory as a backward-incompatible change, to remind people that some settings in the config file need to be renamed upon update.
Thanks. Yeah, I made a little todo list up there somewhere which includes adding news fragments; I will try to get to that ASAP.
FWIW, I just took a look at adding a mechanism to abort if any of the removed config settings are set, to make sure we don't deploy any new release with these changes without updating the configs. But unfortunately, our real use of these values uses the prefixing mechanism - we set e.g.
The "obsolete on unstable_karma" branch being `elif`ed with the "disable autokarma on any negative karma" branch means that, if we reach the unstable_karma threshold when autokarma is enabled, we don't obsolete the update. That's clearly wrong. The most likely way to hit this is for unstable_karma to be -1, so the first negative karma causes both conditions to be met at once. But it's also possible if e.g. unstable_karma is -2, the update receives one negative karma, the maintainer re-enables autokarma, then the update receives another negative karma. We should *always* obsolete an update when it reaches unstable_karma, regardless of anything else we do. If this means we both obsolete it and disable autokarma at the same time, I don't see any problem with that, it seems correct. Signed-off-by: Adam Williamson <awilliam@redhat.com>
Pushed one more change to always apply the obsolete-on-unstable_karma check in ... When the request is ...

But this is awkward for the test code, because it means the UpdateKarmaThresholdV1 message is generated using a state of the update it simply doesn't have access to. Before it sends the negative karma comment, the update state doesn't have the negative karma comment text in it (obviously), it isn't obsoleted, and its autokarma status is the original one. After it sends the negative karma comment, the negative karma comment text is present and the update has been obsoleted (which is the state in which the UKT message is generated), but also, an additional comment has been posted that wasn't present when the UKT message was generated, and the autokarma status may have changed.

The test code never has access to the update state after the comment is added and the obsoletion changes occur but before the autokarma disablement changes happen, which is the state the update is in when the UKT message is generated. So it's just not cleanly possible for the test code to predict the UKT message content. So on this path we just have to assert that some kind of UKT message is published.
This covers all the changes from the commits together. It's rather long, but as short as I could make it... Signed-off-by: Adam Williamson <awilliam@redhat.com>
OK, added a news fragment, so I think this is ready to go. Sorry, I really can't boil these changes down to one line! I think towncrier should do an OK job of parsing a long fragment, from the test run I did. I fiddled around with tweaking the 'new update' HTML template, but it seems like making a meaningful change there would be rather harder than I thought. We don't actually know the
I'll soon deploy everything in staging, so we can do some testing in a "real" environment...
Sounds good. Let's hope I didn't blow anything up too badly. :D
A new build from the master branch has been made in koji staging. I've submitted https://pagure.io/fedora-infra/ansible/pull-request/1974 with the necessary config changes. When that PR is merged I will redeploy staging bodhi.
So, I've pushed the latest code to staging. I see that pretty much all updates in "testing" status have a comment "This update's test gating status has been changed to 'failed'." even if "no tests are required".
If you look in the right column, they're all showing "1 errors handling remote rules", which indicates a problematic response from greenwave. If you use browser tools to examine the exact response from greenwave, you'll see something like ...

Do you remember what happened before the new release, by any chance?
This covers all the changes from the commits together. It's rather long, but as short as I could make it... Signed-off-by: Adam Williamson <awilliam@redhat.com> (cherry picked from commit 7fd02ab)
This is a huge patch series that more or less entirely overhauls Bodhi's approach to checking if updates "meet requirements" in various senses. See each individual commit's extensive message for details, but broadly the goal is to make the code (and tests) more consistent, correct, in line with the actual Fedora update policies, and simpler.