-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unexpected behavior when exception thrown at begin Run #42501
Comments
assign core |
New categories assigned: core @Dr15Jones,@smuzaffar,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks |
A new Issue was created by @Dr15Jones Chris Jones. @Dr15Jones, @perrotta, @dpiparo, @rappoccio, @makortel, @smuzaffar can you please review it and eventually sign/assign? Thanks. cms-bot commands are listed here |
On a cursory thought the option 3 would feel most natural to me. I would not do option 2, as then the Run/Lumi behavior for dependent (module B in the description) modules would be different from Event behavior. |
@wddgit Any thoughts? |
Chris's explanation sounds correct. It is consistent with the current Framework code. I rewrote this part of the code when I implemented concurrent runs, but I think in this case I simply copied the behavior from what existed before. Here is what I have in my notes: "If globalBeginRun fails, then writeRunAsync should not run, streamEndRun should not run, but globalEndRun is attempted!" And I think this referred to the entire transition for all modules, not separately for each module. There is not and never was any mechanism to track whether each particular module successfully completed globalBeginRun. If any module throws an exception in global begin run, there is a single bool for all modules that records that an exception occurred in that transition, not one per module. If there was there was any intent to handle this differently, I don't think it was ever implemented. I don't think this is a situation where there is a bug and the code is not behaving as intended. That said, we could change the behavior. I don't have a strong opinion one way or the other. We could handle this situation in the module or the Framework. If it happens often, it would be easier to handle in the Framework instead of reimplementing in multiple modules. On the other hand, this is the first time I remembering seeing this problem... |
Summarizing a discussion between myself, @Dr15Jones, @wddgit, and @dan131riley
|
A crash in the IB was caused because a module A threw an exception in begin Run and module B depended on the data product from A. Because of the exception, the framework did not run module A for the begin Run. As part of the shutdown from the exception, the framework ran the end Run transition. As part of the transition, it ran the end Run for module B, even though begin Run was never called for that module. The module expected end Run to only be called if begin Run had been called for the module and because that didn't happen, the module had a segmentation fault.
The intent of the framework was to not call an 'end' transition of a module never saw the associated 'begin' transition. This is clearly not happening.
There are three possible behaviors
The text was updated successfully, but these errors were encountered: