Factories based on python decorations of the trees #151

Open
wants to merge 14 commits into
base: master


pieterdavid
Member

I am sorry, this one became quite big... it is what I ended up doing to generate plotters etc, replacing the string-formatting / template engine in the current version by python objects and modules.
I am pushing this mostly because @clacaputo asked about it, and because other people may also be interested in using this at some point - so there is no need to review and merge this quickly, we can first discuss a bit.

There is a fair amount of boilerplate code (some of it could be simplified, and some partially overlaps with existing modules), but it does bring some advantages:

  • the meaning of references to other objects (the many branches containing indices) only needs to be encoded correctly in the tree description, and does not need to be worried about afterwards (this was the original motivation to start this)
  • the resulting plotter is smaller (C++ code and binary), in principle more CPU efficient, and it compiles faster (because there are nested "if" blocks and some expressions are calculated once and stored, so the optimizing algorithms don't get lost in a lot of similar-looking things)
  • it is quite straightforward to add a plot, change selections etc. (admittedly, this is subjective, but I think it is more flexible once you get used to the way of doing things)
  • cut and variable expressions consist of python objects, so histograms for data-driven backgrounds, systematic variations etc. can usually be generated from those for data automatically
  • scale factors can be added at the plotter stage (with surprisingly little code) - this may not need all of these changes, but it is an important advantage
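To illustrate the point about cut and variable expressions being python objects: the sketch below is not the code from this PR, only a minimal stand-in assuming a tiny expression tree with two hypothetical node types (`Branch` and `Op`) and made-up branch names. It shows why a systematic variation can be derived automatically: the varied expression is the same tree with some branches swapped.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Branch:
    """A leaf of the expression tree: one branch of the input tree."""
    name: str


@dataclass(frozen=True)
class Op:
    """A binary operation on two sub-expressions (or plain numbers)."""
    symbol: str
    left: object
    right: object


def render(node):
    """Turn an expression tree into a C++-like string."""
    if isinstance(node, Branch):
        return node.name
    if isinstance(node, Op):
        return f"({render(node.left)} {node.symbol} {render(node.right)})"
    return repr(node)  # plain Python number


def vary(node, replacements):
    """Return a copy of the tree with some branches swapped for varied ones."""
    if isinstance(node, Branch):
        return Branch(replacements.get(node.name, node.name))
    if isinstance(node, Op):
        return Op(node.symbol,
                  vary(node.left, replacements),
                  vary(node.right, replacements))
    return node


# Hypothetical branch names, for illustration only.
nominal_cut = Op("&&", Op(">", Branch("jet_pt"), 30), Op("<", Branch("met"), 80))
jec_up_cut = vary(nominal_cut, {"jet_pt": "jet_pt_jecup"})

print(render(nominal_cut))
print(render(jec_up_cut))
```

With string templates the same variation would need a text search-and-replace, which cannot tell a branch name apart from a substring of another name.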

If you want to have a look (or start reviewing), I would suggest starting with an example for H->ZA.
In practice one would split this over a few files, but I made this one self-contained to show what is needed to describe the tree structure and generate a plotter for a set of plots. Running it with only the -i option will give an IPython shell to play around with a 'decorated' tree: it is in the tup variable, and you can get the C++ code for an expression with toDraw, e.g. toDraw(tup.hZA.lljj_deepCSV[0].J1.p4.Pt()) gives 'jet_p4[hZA_jets[hZA_lljj_deepCSV[0].ijet1].idx].Pt()'. I also added a first version of such a script for nanoAOD.
The decorations are meant to be straightforward to use: they consist of placeholder or "stub" objects representing (groups of) branches, which support a number of operations (getting one branch, resolving a reference, arithmetic) such that expression trees can be constructed from them (a 1D histogram needs three expressions: a selection, a weight, and an x-axis variable, as before).
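As a rough illustration of the stub idea (this is not the implementation in treedecorators.py, just a minimal sketch under my own assumptions), a placeholder object can record the C++ text of whatever is done to it. The `Stub` class and the `to_cpp` helper below are hypothetical stand-ins; `to_cpp` plays the role that toDraw plays in the interactive shell. The reference is resolved by hand here, whereas the real decorations hide that step behind attributes like J1.

```python
class Stub:
    """Hypothetical placeholder for a branch or an expression built from branches."""

    def __init__(self, expr):
        self._expr = expr  # the C++ text this stub stands for

    def __getitem__(self, index):
        # Indexing with a number or with another stub (an index branch).
        return Stub(f"{self._expr}[{to_cpp(index)}]")

    def __getattr__(self, name):
        # Only reached for attributes that are not found normally.
        if name.startswith("_"):
            raise AttributeError(name)
        return Stub(f"{self._expr}.{name}")

    def __call__(self, *args):
        return Stub(f"{self._expr}({', '.join(to_cpp(a) for a in args)})")

    def __add__(self, other):
        return Stub(f"({self._expr} + {to_cpp(other)})")

    def __gt__(self, other):
        return Stub(f"({self._expr} > {to_cpp(other)})")


def to_cpp(obj):
    """Return the C++ text for a stub, or the literal for a plain Python value."""
    return obj._expr if isinstance(obj, Stub) else repr(obj)


# Branch names taken from the example string quoted above.
jet_p4 = Stub("jet_p4")
hZA_jets = Stub("hZA_jets")
lljj = Stub("hZA_lljj_deepCSV")

# Resolve the reference by hand: candidate -> jet index -> jet momentum.
j1_p4 = jet_p4[hZA_jets[lljj[0].ijet1].idx]

print(to_cpp(j1_p4.Pt()))
print(to_cpp(j1_p4.Pt() > 30))
```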

Any thoughts/feedback/criticism/suggestions (also on what to document better) welcome :-)

@swertz
Contributor

swertz commented Mar 6, 2018

This is huge, I think I'll need more than an example to work through it :D

@pieterdavid
Member Author

I understand... don't feel too obliged to have a look soon (and feel free to make suggestions for more examples or documentation that would help).

The code is (meant to be) reasonably modular, so I can try to suggest an order to go through (or at least structure a bit).

  • treedecorators.py is the general code for decorating the trees. I forgot to mention last time: my trees are "flatter" (I only have basic types, STL containers of them, a Lorentz vector type and a GenParticle type in my trees), so the decorators are also meant to make that transparent.
  • lldeco.py contains some useful helpers for decorating typical ll(bb) trees (mostly for dealing with the fact that a "lepton" can refer to an electron or a muon and you need to get e.g. the momentum from different branches based on that) - in this case it is only needed for the H->ZA example, but I do the same for ttW
  • nanoaddeco.py and scripts/nanoAODInteractive.py are another example of the decorations
  • plots.py defines a Plot class (as before), and a Selection class, which contains cuts, weights, a "candidate" reference and a list of systematic variations (with a helper method to add cuts and weights etc.).
    This is the main API the user needs for doing an analysis, i.e. writing selections and making plots (describing the tree structure needs to be done only once per analysis, roughly - and I hope to provide examples of all common cases for that).
  • histfactory.py (with its skimmer equivalent treefactory.py, and the code they share in factoryhelpers.py) provides everything to go from a list of Plot objects to histograms: code to generate, compile and run plotters and skimmers. The code that orders the cuts / evaluations / fill-statements is in hist_opt_2v2.py (there is room for more optimization here)
  • for running the plotters/skimmers on slurm or condor, the code in slurmhelpers.py, condorhelpers.py and batchhelpers.py is used (similar to the ones that existed before, but I tried to make them more general, and use the CP3SlurmUtils)
  • plotithelpers writes a YAML file with the list of plots, for plotIt
  • scalefactors.py is basically a dictionary of scale factors for 2016 data. On the C++ side, ScaleFactors.h is used to interface with the BinnedValuesJSONParser etc. classes in the framework (and to weight/sample as needed if they are different for different run periods)
  • IndexRangeIterator.h is a (trivial) iterator class that dereferences to the index of the element (since often the index is what is needed to retrieve something from other branches, but STL algorithms are an elegant way to decide which one, and they need iterators, this is quite helpful)
  • kinematics.h: deltaPhi, deltaEta, deltaR, and signed versions of the former two
  • everything in templates: more or less as before (with some differences in the CMakeLists etc., and without JSON parsing in the plotter/skimmer: here they just take the list of input files, and they do not change the normalisation)
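To give an idea of how the Selection and Plot classes described for plots.py might be used, here is a minimal sketch. It is not the actual API: the `refine` method, the field names and the branch names are assumptions made for illustration, cuts and weights are plain strings rather than expression objects, and the "candidate" reference and systematic variations are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Selection:
    """Hypothetical selection: a named set of cuts and weights."""
    name: str
    cuts: tuple = ()
    weights: tuple = ()

    def refine(self, name, cut=None, weight=None):
        """Return a new, tighter selection; the parent is left untouched."""
        return Selection(
            name,
            self.cuts + ((cut,) if cut else ()),
            self.weights + ((weight,) if weight else ()),
        )

    @property
    def cut_expr(self):
        return " && ".join(f"({c})" for c in self.cuts) or "true"

    @property
    def weight_expr(self):
        return "*".join(f"({w})" for w in self.weights) or "1."


@dataclass(frozen=True)
class Plot:
    """Hypothetical 1D plot: a selection, a weight (via the selection), an x-axis variable."""
    name: str
    variable: str
    selection: Selection
    binning: tuple  # (number of bins, x minimum, x maximum)


# Made-up branch names, for illustration only.
no_sel = Selection("noSel", weights=("event_weight",))
two_lep = no_sel.refine("twoLep", cut="n_leptons >= 2", weight="lepton_sf")
two_lep_two_jets = two_lep.refine("twoLepTwoJets", cut="n_jets >= 2")

plot = Plot("mjj", "dijet_mass", two_lep_two_jets, (50, 0.0, 500.0))
print(plot.selection.cut_expr)
print(plot.selection.weight_expr)
```

Because each refined selection extends its parent, a code generator can see which cuts are shared between plots, which is what makes nested "if" blocks possible.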

pieterdavid and others added 2 commits March 9, 2018 17:24
it was in .gitignore so far... also modified to default to using the
files, so this should work with only PyROOT (with one warning on the
first import)