-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bug fix, start Services in runOrQueueEventSetupForInstanceAsync #39644
Conversation
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-39644/32445
|
please test |
A new Pull Request was created by @wddgit (W. David Dagenhart) for master. It involves the following packages:
@makortel, @smuzaffar, @Dr15Jones can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
type bug-fix |
urgent |
+1 (assuming tests succeed) |
This pull request is fully signed and it will be integrated in one of the next master IBs after it passes the integration tests. This pull request will now be reviewed by the release team before it's merged. @perrotta, @dpiparo, @rappoccio (and backports should be raised in the release meeting by the corresponding L2) |
ping bot |
@smuzaffar @aandvalenzuela I see that tests are stuck in quite several PRs |
Hi @perrotta, We realized this morning about the PR test delays. @iarspider is on shift and he is trying to solve this issue with IT. PRs are stuck because of a failure in cvmfs. Sorry for the inconvenience, |
please test |
Somehow this is triggering one Framework unit test to fail that was not failing before. I do not think it has anything direct to do with this bug fix, which I still think is correct. The unit test is TestFWCoreFrameworkTransitions At first I thought it was just the other seg fault we already saw, but possibly it is not. I am looking at that now also. I suspect this might just be a problem with the unit test. |
@wddgit @makortel we have to close 12_6_0_pre3. |
@perrotta I'd merge the revert PR #39642 for 12_6_0_pre3, and we continue to investigate the problem(s) with calm. We still don't have a clue on the Would it be feasible to revert the #39642 (i.e. add the concurrent run PR again) after the pre3 has been built? The problems seem rare-enough to me to not cause problems for PR testing. Apologies for the inconvenience. |
Reverting is probably a good idea if the deadline is tomorrow morning. The odds this will be resolved by then are small. One possibility is putting this in the DEVEL release first. Are we testing anything in DEVEL now? Does it run full IBs with all the same statistics and tests? |
Thank you Matti! |
+1 Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-b4faef/28060/summary.html Comparison Summary@slava77 comparisons for the following workflows were not done due to missing matrix map:
Summary:
|
Tests look good |
Curious that they all passed. In my local working area, I get a consistent seg fault with the following test (and only that test):
It passes if I add the Tracer. Probably I should not discuss this on this PR anymore. I think it is unrelated to this PR and bugfix. I'm not sure if it is related to the other unrelated segfault. Nothing looks obviously wrong with the unit test itself. |
@rappoccio This fix is needed anyway (#38801 just made the problem apparent). But it is not sufficient fix all the problems #38801 showed. If you indeed revert #38801 (i.e. merge #39642) for pre3, this PR is not strictly necessary (seems we either don't this code path without #38801, or it is extremely unlikely), but it shouldn't hurt either (famous last words, in a sense it could be safer to build pre3 without this PR if you proceed with #39642). (if you would consider keeping #38801 for pre3, then this PR should be merged as well) |
This bug fix PR will not work without #38801. If you want me to fix the bug without #38801 then I would need to revise it. I agree the bug existed previously, but the list of circumstances which have to exist to hit the bug is long and hard to create. I think it literally has never happened before concurrent runs. I think you would need IOVs that change between beginRun and the following beginLumi in an ESSource that does not support concurrent IOVs. That never happens. I would not bother to merge this fix without #38801. |
OK let's just merge the other then, and not do this one, see comment here: |
@perrotta I will do as you suggested and also combine in the second bug fix as soon as it is ready. I'll submit a new PR containing all that. |
PR description:
Bug fix related to #38801. The problem was noticed in the IBs after that pull request was merged. It didn't show up in earlier tests. This should fix the problem with the message "no ServiceRegistry has been set for this thread".
Note that a second problem with the symptom of a seg fault also showed up rarely after that PR was merged. This probably will not fix that. We are still working on a fix for that one.
PR validation:
This is a non-reproducible problem that does not show up every time so I am not 100% sure this will fix it. We should merge it and see what happens in the IBs.