Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactoring gen weight storage in EDM + Nano integration #32167

Closed
wants to merge 25 commits into from

Conversation

kdlong
Copy link
Contributor

@kdlong kdlong commented Nov 17, 2020

PR description:

This PR introduces new weight products to hold the Meta Info of the weights, and the weights themselves, organized into groups of a given type (Scale, Pdf, MEParam, etc.)

This has been presented several times in GEN and X-POG, such as https://indico.cern.ch/event/944653/#5-status-of-new-data-structure

The structure of the products is in place. We are still finalizing the NanoAOD integration and testing for parsing corner cases. We expect these tests to be wrapped up in a couple of weeks, but because of the scale of the changes, it would be useful to have feedback already (I'm sure there will be lost of coding suggestions).

This builds on the work of several people, including @dteague @sroychow @guitargeek @cericeci

@agrohsje and @kpedro88 have contributed to the design and review.

PR validation:

We have validated MiniAOD --> NanoAOD, adding products to the event, and GEN-->NanoAOD instantiating products to files. A large testing campaign will take place before NanoAOD v9 and we will report the status here.

NanoAOD and MiniAOD file type size changes

The size of the Nano will in general not be changed, by default the same number of PDF and scale weights are stored. The main change is to make the selection of weights to be stored much more convenient, significantly reduce the number of samples where the weights are not found and not stored (makes those individual samples larger but not a big impact on the average), and provides features for keeping more weights in private production.

The change in size of the MiniAOD will depend on the strategy we adopt, which I'd like a bit more feedback on. If we duplicate the event weight information, it is a 10% increase. I assume this is not under consideration. If we zero the weights in the existing product (produce a new LHEEventProduct with only the central weight for example), the change will be negligible. If we only ever produce the product on the fly the size change will be 0.

Perhaps a good option would be to keep the duplicate product at the gen data tier, and zero it out at the MiniAOD step. This requires fixing code that depends on the weights in the LHEEventProduct (Rivet, for example, https://github.com/cms-sw/cmssw/blob/master/GeneratorInterface/RivetInterface/plugins/RivetAnalyzer.cc, but the change is straightforward) and in updating the name of the LHEEventProduct anywhere it is read if it is produced by a new producer (copy previous with zeroed weights). I guess this could be tedious.

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32167/19876

ERROR: Build errors found during clang-tidy run.

  ^~~~~~~
  kCounter
--
PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc:1234:10: error: member reference base type 'EDataType' is not a structure or union [clang-diagnostic-error]
  counter.incGenOnly(genWeight);
         ^
PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc:1236:30: error: use of undeclared identifier 'streamCache' [clang-diagnostic-error]
  std::string& model_label = streamCache(id)->getLabel();
                             ^
PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc:1247:21: error: use of undeclared identifier 'genWeightToken_' [clang-diagnostic-error]
  iEvent.getByToken(genWeightToken_, genWeightHandle);
                    ^
PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc:1249:20: warning: the variable 'genWeights' is copy-constructed from a const reference but is only used as const reference; consider making it a const reference [performance-unnecessary-copy-initialization]
--
PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc:1251:30: error: use of undeclared identifier 'luminosityBlockCache' [clang-diagnostic-error]
  auto const& weightInfos = *luminosityBlockCache(iEvent.getLuminosityBlock().index());
                             ^
PhysicsTools/NanoAOD/plugins/GenWeightsTableProducer.cc:1257:19: error: use of undeclared identifier 'weightTypeNames_' [clang-diagnostic-error]
  for (auto& wg : weightTypeNames_) {
                  ^
SimDataFormats/GeneratorProducts/interface/LHEEventProduct.h:30:38: warning: use std::make_unique instead [modernize-make-unique]
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:128: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@cmsbuild
Copy link
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32167/19879

ERROR: Build errors found during clang-tidy run.

Suppressed 1252 warnings (1251 in non-user code, 1 with check filters).
--
SimDataFormats/GeneratorProducts/interface/LHEEventProduct.h:30:45: error: no member named 'makemake_unique' in namespace 'std' [clang-diagnostic-error]
  void setPDF(const PDF &pdf) { pdf_ = std::makemake_unique > (pdf); }
                                            ^
Suppressed 1439 warnings (1438 in non-user code, 1 with check filters).
--
SimDataFormats/GeneratorProducts/interface/LHEEventProduct.h:30:45: error: no member named 'makemake_unique' in namespace 'std' [clang-diagnostic-error]
  void setPDF(const PDF &pdf) { pdf_ = std::makemake_unique > (pdf); }
                                            ^
Suppressed 1601 warnings (1600 in non-user code, 1 with check filters).
--
SimDataFormats/GeneratorProducts/interface/LHEEventProduct.h:30:45: error: no member named 'makemake_unique' in namespace 'std' [clang-diagnostic-error]
  void setPDF(const PDF &pdf) { pdf_ = std::makemake_unique > (pdf); }
                                            ^
Suppressed 1744 warnings (1742 in non-user code, 2 with check filters).
--
SimDataFormats/GeneratorProducts/interface/LHEEventProduct.h:30:45: error: no member named 'makemake_unique' in namespace 'std' [clang-diagnostic-error]
  void setPDF(const PDF &pdf) { pdf_ = std::makemake_unique > (pdf); }
                                            ^
Suppressed 1849 warnings (1847 in non-user code, 2 with check filters).
--
gmake: *** [config/SCRAM/GMake/Makefile.coderules:128: code-checks] Error 2
gmake: *** [There are compilation/build errors. Please see the detail log above.] Error 2

@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-32167/19884

@cmsbuild
Copy link
Contributor

A new Pull Request was created by @kdlong (Kenneth Long) for master.

It involves the following packages:

DataFormats/NanoAOD
GeneratorInterface/Core
GeneratorInterface/LHEInterface
PhysicsTools/NanoAOD
SimDataFormats/GeneratorProducts

@SiewYan, @perrotta, @gouskos, @mkirsano, @cmsbuild, @jpata, @fgolf, @slava77, @alberto-sanchez, @agrohsje, @mariadalfonso, @santocch, @GurpreetSinghChahal can you please review it and eventually sign? Thanks.
@mkirsano, @swertz, @rovere, @peruzzim, @apsallid, @lecriste, @alberto-sanchez, @gpetruc, @fabiocos this is something you requested to watch as well.
@silviodonato, @dpiparo, @qliphy you are the release manager for this.

cms-bot commands are listed here

@kpedro88
Copy link
Contributor

please test

@cmsbuild
Copy link
Contributor

cmsbuild commented Nov 18, 2020

The tests are being triggered in jenkins.

@cmsbuild
Copy link
Contributor

-1

Tested at: 45323bb

CMSSW: CMSSW_11_2_X_2020-11-17-2300
SCRAM_ARCH: slc7_amd64_gcc820
You can see the results of the tests here:
https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-3df97d/10838/summary.html

I found follow errors while testing this PR

Failed tests: HeaderConsistency ClangBuild

  • Clang:

I found compilation warning while trying to compile with clang. Command used:

USER_CUDA_FLAGS='--expt-relaxed-constexpr' USER_CXXFLAGS='-Wno-register -fsyntax-only' scram build -k -j 32 COMPILER='llvm compile'

See details on the summary page.

@cmsbuild
Copy link
Contributor

Comparison job queued.

@smuzaffar smuzaffar added this to the CMSSW_13_2_X milestone May 4, 2023
Copy link
Contributor

@kpedro88 kpedro88 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kdlong, I was looking at this again for private use (since the gen weight situation in UL is even messier than pre-L...) and noticed a few bugs and typos. Just leaving these notes here for whenever you have time to get back to it.

{ErrorType::SwapHeader, "Header info out of order"},
{ErrorType::HTMLStyle, "Header is invalid HTML"},
{ErrorType::TrailingStr, "Header has extraneous info"},
{ErrorType::Unknown, "Unregonized error"},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: Unregonized -> Unrecognized

if (debug_)
std::cout << "Weight type is unknown\n";

std::cout << "Group name is " << weight.groupname << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should this be behind the debug_ flag?

std::string labels = "PS weights (w_var / w_nominal); ";
if (pswV.isWellFormed()) {
double baseline = psWeights.at(pswV.weightIndexFromLabel("Baseline"));
psTosave.emplace_back(psWeights.at(pswV.variationIndex(true, false, gen::PSVarType::def)) / baseline);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this line and the next variationIndex() call are incorrect according to

int isrCombinedDownIndex(PSVarType variationType) const { return variationIndex(true, false, variationType); }
(true, false is isrCombinedDownIndex() and false, false is fsrCombinedDownIndex()). instead, isr/fsrCombinedUpIndex() should be used on these lines.

desc.addUntracked<bool>("saveProvenance", true)
->setComment("Save process provenance information, e.g. for edmProvDump");
desc.addUntracked<bool>("fakeNameForCrab", false)
->setComment(
"Change the OutputModule name in the fwk job report to fake PoolOutputModule. This is needed to run on cran "
"Change the OutputModule name in the fwk job report to fake "
"PoolOutputModule. This is needed to run on cran "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: cran -> crab

void WeightHelper::cleanupOrphanCentralWeight(WeightGroupInfoContainer& weightGroups) const {
auto centralIt = std::find_if(std::begin(weightGroups), std::end(weightGroups), [](auto& entry) {
return entry->weightType() == gen::WeightType::kScaleWeights &&
static_cast<ScaleWeightGroupInfo*>(entry.get())->containsCentralWeight();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the logic here is incorrect: this should select for !containsCentralWeight(), in order to find the scale weight group that is missing the central weight (because it was orphaned). (I had to change this when testing privately, or the resulting scale weight group did not qualify as well formed.)

int close = -1;
for (size_t idx = 0; idx < headerLines.size(); idx++) {
std::string& line = headerLines[idx];
std::cout << "Line is " << line << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be behind debug_ flag

@clacaputo
Copy link
Contributor

-reconstruction

cleaning reco queue

@smuzaffar
Copy link
Contributor

Milestone for this pull request has been moved to CMSSW_14_0_X.Please open a backport if it should also go in to CMSSW_13_3_X.

@vlimant
Copy link
Contributor

vlimant commented Dec 1, 2023

please close
this is definitely stalled. please reopen when necessary

@cmsbuild cmsbuild closed this Dec 1, 2023
@@ -299,41 +327,42 @@ void NanoAODOutputModule::openFile(edm::FileBlock const&) {
m_file->SetCompressionAlgorithm(ROOT::kZLIB);
} else if (m_compressionAlgorithm == std::string("LZMA")) {
m_file->SetCompressionAlgorithm(ROOT::kLZMA);
} else if (m_compressionAlgorithm == std::string("ZSTD")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this an artifact of some merge history issue or do we want to remove these compression algos?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That must be a merge mistake, thanks for spotting!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.