Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore if anything else than DQM could break because of not all streams seeing all lumis #703

Closed
Tracked by #687
makortel opened this issue Oct 24, 2023 · 16 comments
Assignees
Labels

Comments

@makortel
Copy link
Collaborator

makortel commented Oct 24, 2023

Find e.g. through git grep all modules that call stream begin/endLumi and see what exactly they do

@makortel
Copy link
Collaborator Author

makortel commented Oct 25, 2023

List of files for edm::stream to be investigated more deeply

  • SimGeneral/MixingModule/plugins/MixingModule.cc (beginLumi, endLumi)
    • Mixing/Base/src/BMixingModule.cc (beginLumi, endLumi)
      • Mixing/Base/src/PileUp.cc (beginLumi, endLumi)
      • Mixing/Base/src/SecondaryEventProvider.cc (beginLumi, endLumi)
    • SimCalorimetry/EcalSimProducers/src/EcalDigiProducer_Ph2.cc (beginLumi)
    • SimCalorimetry/EcalSimProducers/src/EcalDigiProducer.cc (beginLumi)

  • SimGeneral/PreMixingModule/plugins/PreMixingModule.cc (beginLumi, endLumi)
    • SimTracker/SiPhase2Digitizer/plugins/PreMixingPhase2TrackerWorker.cc (beginLumi)
    • SimCalorimetry/EcalSimProducers/plugins/PreMixingEcalWorker.cc (beginLumi)

  • SimGeneral/DataMixingModule/plugins/DataMixingModule.cc (beginLumi, endLumi)

  • RecoVertex/BeamSpotProducer/plugins/BeamSpotAnalyzer.cc (beginLumi, endLumi)
  • L1Trigger/RegionalCaloTrigger/plugins/L1RCTProducer.cc (beginLumi, on a cursory look assumes to see every lumi on every stream)
  • CalibTracker/SiPixelQuality/plugins/SiPixelStatusProducer.cc (beginLumi, endLumi)

@makortel
Copy link
Collaborator Author

makortel commented Nov 17, 2023

List of files for edm::global to investigate more deeply

  • PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc
  • GeneratorInterface/Core/plugins/ExternalGeneratorFilter.cc
  • GeneratorInterface/Core/interface/ConcurrentHadronizerFilter.h
  • GeneratorInterface/Core/interface/ConcurrentGeneratorFilter.h

@wddgit
Copy link
Collaborator

wddgit commented Jan 3, 2024

I'm starting down this list from the top, MixingModule first. At first glance it looks OK, but there is a lot of code to go through...

@wddgit
Copy link
Collaborator

wddgit commented Jan 4, 2024

MixingModule seems to be OK related to streams skipping lumis. It's a lot of code, but I read through it all and couldn't find anything that would be adversely affected.

@wddgit
Copy link
Collaborator

wddgit commented Jan 10, 2024

L1RCTProducer.cc looks like an actual problem. Skipping lumis could change behavior. For example, there is an option to update some things every n lumis (100th in the cfi). So if some streams skipped lumi 400, then the updated quantities could be different on different streams for lumis 400-499 and that could vary from process to process. Based on comments in the code, that part is used only for online DQM. How far should I dig into this? I could try to understand this and redesign. Or we could push this off to the experts. One option might be to make that part of the code an ESProducer itself and let the EventSetup handle when the updating should occur. It's also not clear to me if this code is currently in use or not. It's been several years since there was development other than things like converting the module to a stream type and things like that.

The things above that on the list looked OK. I guess for the moment I will skip that one and continue down the list.

@makortel
Copy link
Collaborator Author

Thanks for the second look. Poking around it seems to me there is a high chance the module is still being used. Looking at the L1RCTProducer.cc again, the purpose of the queryDelayInLS looks confusing enough that I think we should in any case understand the intention of that code. All the input data for those calculations seem to originate from EventSetup, so I have hard time to understand the point (I'm guessing misunderstanding on how EventSetup works). How about opening an issue to ask these questions? (and if they care of changing the code, even better)

@wddgit
Copy link
Collaborator

wddgit commented Jan 10, 2024

OK. I will open an issue on this one.

@wddgit
Copy link
Collaborator

wddgit commented Jan 11, 2024

PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc

I checked this one off as not affected by a stream skipping a lumi, but I'll throw in some things I noticed as I reviewed. I was tempted to modify multiple things. I'd be happy to work on this and clean it up if you think it is worth my time. Maybe it is all insignificant and some of the problems might never occur if the input is as probably intended...

  1. If the GenLumiInfoHeader product is not found in the lumi, the weightChoice part of stream cache does not get updated and seems to use whatever is left from the previous lumi. Only a logWarning is emitted. I'd probably throw an exception.
  2. The weightChoice part of the stream cache and also part of the CounterMap would really be more appropriately defined as LuminosityBlockCache not StreamCache. It is a bit of a waste to duplicate the memory for every stream and duplicate the effort to fill it. The actual counters belong in the StreamCache and that is OK, but the rest doesn't.
  3. Some of the fields only get initialized under certain circumstances. For example, https://cmssdt.cern.ch/dxr/CMSSW/source/PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc#1039. This is dependent on the input GenLumiInfoHeader product. As long as the input product meets that condition, there is no problem. Otherwise, the values depend on the initialization from the previous lumi.
  4. Note that when it falls back to the previous lumi that seems wrong (with or without lumi skipping). It could also give inconsistent results based on whether the previous lumi was skipped by a certain stream... So maybe this is a bit related to streams skipping lumis.
  5. defPSWeightIDs and defPSWeightIDs_alt are just class constants. They don't need to be in any cache. https://cmssdt.cern.ch/dxr/CMSSW/source/PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc#185.
  6. matchPS_alt could probably just be a stack variable. It does not really need to be in a cache either.

@makortel
Copy link
Collaborator Author

Hi @wddgit, I think these issues would be worth of fixing (especially the "silently" using the values of previous lumi sounds something that should be checked if it is intentional). An alternative for throwing exception would be to fill the values with some "defaults", but before doing that, I'd like to have a confirmation something like that would be the desired behavior. So starting with changing the warning to an exception sounds a reasonable starting point to ask the question.

@wddgit
Copy link
Collaborator

wddgit commented Jan 12, 2024

OK. I will start on that now while it is fresh in my mind.

@wddgit
Copy link
Collaborator

wddgit commented Jan 17, 2024

Submitted PR cms-sw/cmssw#43739 for GenWeightTableProducer.

@wddgit
Copy link
Collaborator

wddgit commented Jan 22, 2024

For the record, the L1RCTProducer Issue is cms-sw/cmssw#43697

@wddgit
Copy link
Collaborator

wddgit commented Jan 23, 2024

ConcurrentHadronizerFilter.h and ConcurrentGeneratorFilter.h will both need to be fixed. On the positive side, the issue is almost identical in both files. I implemented this part of the code when I was working on concurrent runs back in 2022. Hopefully, I'll remember enough about what I did to help me fix this...

@wddgit
Copy link
Collaborator

wddgit commented Jan 30, 2024

The fix for L1RCTProducer was merged 1/30/2024, PR cms-sw/cmssw#43779

@wddgit
Copy link
Collaborator

wddgit commented Jan 30, 2024

Submitted PR for ConcurrentHadronizerFilter and ConcurrentGeneratorFilter. See cms-sw/cmssw#43816.

Including this new one, there are 3 PR's waiting to be merged and that should take care of everything on the list.

@makortel
Copy link
Collaborator Author

Thanks David!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants