Factories based on python decorations of the trees #151

pieterdavid · 2018-03-06T08:25:16Z

Sorry, this one became quite big... It is what I ended up doing to generate plotters etc., replacing the string-formatting / template engine in the current version with Python objects and modules.
I am pushing this mostly because @clacaputo asked about it, and because other people may also be interested in using this at some point, so there is no need to review and merge this quickly; we can discuss a bit first.

There is a fair amount of boilerplate code (some of it could be simplified, and some partially overlaps with existing modules), but it does bring some advantages:

- the meaning of references to other objects (the many branches containing indices) only needs to be encoded correctly once, in the tree description, and does not need to be worried about afterwards (this was the original motivation to start this)
- the resulting plotter is smaller (C++ code and binary), in principle more CPU efficient, and it compiles faster (because there are nested "if" blocks and some expressions are calculated once and stored, so the optimizing algorithms don't get lost in a lot of similar-looking things)
- it is quite straightforward to add a plot, change selections etc. (admittedly, this is subjective, but I think it is more flexible once you get used to the way of doing things)
- cut and variable expressions consist of Python objects, so histograms for data-driven backgrounds, systematic variations etc. can usually be generated automatically from those for data
- scale factors can be added at the plotter stage (with surprisingly little code); this may not need all of these changes, but it is an important advantage
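
To illustrate the nested "if" point, here is a minimal sketch of how shared cuts could be factored out when generating the fill code. It is not the ordering code in this PR: the function names, the plot tuples and the `weight` variable are all made up for the example, and cuts are plain strings rather than expression objects.

```python
# Hypothetical sketch: group plots by their common leading cuts, so that each
# cut is emitted (and evaluated) only once in the generated C++.
# Relies on dicts preserving insertion order (Python 3.7+).

def build_tree(plots):
    """Build a trie keyed by successive cuts.

    `plots` is a list of (histogram name, ordered list of cuts, variable).
    """
    root = {"fills": [], "children": {}}
    for name, cuts, variable in plots:
        node = root
        for cut in cuts:
            node = node["children"].setdefault(cut, {"fills": [], "children": {}})
        node["fills"].append((name, variable))
    return root


def emit(node, indent=0):
    """Return the C++ lines for one trie node: fills first, then nested ifs."""
    pad = "  " * indent
    lines = []
    for name, variable in node["fills"]:
        lines.append("{0}{1}->Fill({2}, weight);".format(pad, name, variable))
    for cut, child in node["children"].items():
        lines.append("{0}if ({1}) {{".format(pad, cut))
        lines.extend(emit(child, indent + 1))
        lines.append("{0}}}".format(pad))
    return lines


# Made-up plots: two of them share both cuts, one only the first.
plots = [
    ("h_ll_mass", ["nLeptons >= 2"], "ll_mass"),
    ("h_jj_mass", ["nLeptons >= 2", "nJets >= 2"], "jj_mass"),
    ("h_jet1_pt", ["nLeptons >= 2", "nJets >= 2"], "jet1_pt"),
]
code = "\n".join(emit(build_tree(plots)))
print(code)
```

Each cut appears once in the output, with the fills for tighter selections nested inside the looser ones. This only shares cuts that form a common prefix, so the order in which cuts are listed matters.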

If you want to have a look (or start reviewing), I would suggest starting with an example for H->ZA.
In practice one would split this over a few files, but I made this one self-contained to show what is needed to describe the tree structure and generate a plotter for a set of plots. Running it with only the -i option gives an IPython shell to play around with a 'decorated' tree: it is in the tup variable, and you can get the C++ code for an expression with toDraw, e.g. toDraw(tup.hZA.lljj_deepCSV[0].J1.p4.Pt()) gives 'jet_p4[hZA_jets[hZA_lljj_deepCSV[0].ijet1].idx].Pt()'. I also added a first version of such a script for nanoAOD.
The decorations are meant to be straightforward to use: they consist of placeholder or "stub" objects representing (groups of) branches, which support a number of operations (getting one branch, resolving a reference, arithmetic) such that expression trees can be constructed from them (a 1D histogram needs three expressions: a selection, a weight, and an x-axis variable, as before).
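
For readers who want the idea in a few lines, here is a toy sketch of such stub objects. It is not the code in treedecorators.py: the class names (`Expr`, `Branch`, `Item`, `MethodCall`, `BinOp`), the `to_cpp` method and the branch names are all invented for illustration. Operators build new nodes instead of computing values, and each node knows how to print itself as C++.

```python
# Toy expression-tree stubs (hypothetical names, not the PR's classes).

def wrap(value):
    """Turn plain Python numbers into constant nodes."""
    return value if isinstance(value, Expr) else Const(value)


class Expr(object):
    def to_cpp(self):
        raise NotImplementedError

    def __getitem__(self, index):      # getting one element / resolving an index
        return Item(self, wrap(index))

    def __add__(self, other):          # arithmetic builds a new node
        return BinOp("+", self, wrap(other))

    def __gt__(self, other):           # so does a comparison
        return BinOp(">", self, wrap(other))

    def call(self, method):            # argument-less method call, e.g. Pt()
        return MethodCall(self, method)


class Const(Expr):
    def __init__(self, value):
        self.value = value

    def to_cpp(self):
        return repr(self.value)


class Branch(Expr):
    def __init__(self, name):
        self.name = name

    def to_cpp(self):
        return self.name


class Item(Expr):
    def __init__(self, container, index):
        self.container, self.index = container, index

    def to_cpp(self):
        return "{0}[{1}]".format(self.container.to_cpp(), self.index.to_cpp())


class MethodCall(Expr):
    def __init__(self, obj, method):
        self.obj, self.method = obj, method

    def to_cpp(self):
        return "{0}.{1}()".format(self.obj.to_cpp(), self.method)


class BinOp(Expr):
    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def to_cpp(self):
        return "({0} {1} {2})".format(self.left.to_cpp(), self.op, self.right.to_cpp())


# Made-up branches: dijet_ijet1 holds indices into the jet_p4 collection.
jet_p4 = Branch("jet_p4")
dijet_ijet1 = Branch("dijet_ijet1")
leading_pt = jet_p4[dijet_ijet1[0]].call("Pt")
cut = leading_pt > 30
print(cut.to_cpp())
```

The real decorations add a layer on top of this (attribute access for groups of branches, references that resolve themselves), but the principle of building a tree and only turning it into a string at the end is the same.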

Any thoughts/feedback/criticism/suggestions (also on what to document better) welcome :-)

swertz · 2018-03-06T16:38:52Z

This is huge, I think I'll need more than an example to work through it :D

pieterdavid · 2018-03-07T09:06:24Z

I understand... don't feel too obliged to have a look soon (and feel free to make suggestions for more examples or documentation that would help).

The code is (meant to be) reasonably modular, so I can try to suggest an order to go through (or at least structure a bit).

- treedecorators.py is the general code for decorating the trees. I forgot to mention last time: my trees are "flatter" (they only contain basic types, STL containers of them, a Lorentz vector type and a GenParticle type), so the decorators are also meant to make that transparent.
- lldeco.py contains some useful helpers for decorating typical ll(bb) trees (mostly for dealing with the fact that a "lepton" can refer to an electron or a muon, so e.g. the momentum has to be read from different branches depending on which it is). Here it is only needed for the H->ZA example, but I do the same for ttW.
- nanoaddeco.py and scripts/nanoAODInteractive.py are another example of the decorations.
- plots.py defines a Plot class (as before), and a Selection class, which contains cuts, weights, a "candidate" reference and a list of systematic variations (with a helper method to add cuts and weights etc.). This is the main API the user needs for doing an analysis, i.e. writing selections and making plots (describing the tree structure needs to be done only once per analysis, roughly - and I hope to provide examples of all common cases for that).
- histfactory.py (with its skimmer equivalent treefactory.py, and the code they share in factoryhelpers.py) provides everything to go from a list of Plot objects to histograms: code to generate, compile and run plotters and skimmers. The code that orders the cuts / evaluations / fill-statements is in hist_opt_2v2.py (there is room for more optimization here).
- For running the plotters/skimmers on slurm or condor, the code in slurmhelpers.py, condorhelpers.py and batchhelpers.py is used (similar to the ones that existed before, but I tried to make them more general, and use the CP3SlurmUtils).
- plotithelpers writes a YAML file with the list of plots, for plotIt.
- scalefactors.py is basically a dictionary of scale factors for 2016 data. On the C++ side, ScaleFactors.h is used to interface with the BinnedValuesJSONParser etc. classes in the framework (and to weight/sample as needed if they are different for different run periods).
- IndexRangeIterator.h is a (trivial) iterator class that dereferences to the index of the element. This is quite helpful because the index is often what is needed to retrieve something from other branches, while STL algorithms, which are an elegant way to decide which element to pick, need iterators.
- kinematics.h: deltaPhi, deltaEta, deltaR, and signed versions of the former two.
- everything in templates: more or less as before (with some differences in the CMakeLists etc., and without JSON parsing in the plotter/skimmer: here they just take the list of input files, and they do not change the normalisation).
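
To give an idea of how the Plot / Selection part is meant to be used, here is a stripped-down sketch. It is not the code in plots.py: the method name `refine`, the constructor arguments and the branch names are assumptions made for this example, cuts and weights are plain strings instead of expression objects, and the "candidate" reference is left out.

```python
# Hypothetical, simplified stand-ins for the Selection and Plot classes.

def as_list(value):
    """Accept None, a single item, or a list/tuple of items."""
    if value is None:
        return []
    return list(value) if isinstance(value, (list, tuple)) else [value]


class Selection(object):
    def __init__(self, name, cuts=(), weights=(), systematics=()):
        self.name = name
        self.cuts = list(cuts)
        self.weights = list(weights)
        self.systematics = list(systematics)

    def refine(self, name, cut=None, weight=None):
        """Return a tighter Selection; the parent is left untouched."""
        return Selection(name,
                         self.cuts + as_list(cut),
                         self.weights + as_list(weight),
                         self.systematics)

    def cut_string(self):
        return " && ".join("({0})".format(c) for c in self.cuts) or "true"

    def weight_string(self):
        return "*".join(self.weights) or "1."


class Plot(object):
    """A 1D histogram: a selection (cut and weight) plus an x-axis variable."""
    def __init__(self, name, variable, selection, binning):
        self.name = name
        self.variable = variable
        self.selection = selection
        self.binning = binning  # (number of bins, x min, x max)


# Made-up branch names, for illustration only.
two_lep = Selection("twoLep", cuts=["nLeptons >= 2"], weights=["event_weight"])
two_lep_two_jet = two_lep.refine("twoLepTwoJet", cut="nJets >= 2", weight="btag_sf")
plot = Plot("jj_mass", "jj_mass", two_lep_two_jet, (50, 0., 500.))
print(plot.selection.cut_string())
print(plot.selection.weight_string())
```

Since refining returns a new object, several selections can branch off the same parent, and the shared cuts stay defined in one place.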

pieterdavid added 11 commits January 26, 2018 14:44

Copy plots (basic classes) and treedecorators, as a first step

e927a88

a few fixes for vector<map<string,bool>> (electron IDs in ZA)

93ee6c7

Add helper decorators for leptons in Framework-style trees, and nanoA…

9152535

…OD demo

Import templates and python modules from ttWTools (and change path re…

f7f06b6

…solutions)

Example H->ZA script, part 1 (interactive)

3842852

small fixes after moving code around

1527801

add IndexRangeIterator (C++ helper)

5d2dd6e

Include scalefactors and plotit helpers

4a7976f

and complete the example using those

Whitespace cleanup

5ea9010

Add lepton and jet plots as well

f42e432

Add JEC and JER systematic variations

b150fc1

pieterdavid added the "bug" and "Do NOT merge" labels and removed the "bug" label, Mar 6, 2018

ZA types: compile the library on demand (by including the header)

c07e277

pieterdavid and others added 2 commits March 9, 2018 17:24

Add python/__init__.py with some paths

bf5ccdd

it was in .gitignore so far... also modified to default to using the files, so this should work with only PyROOT (with one warning on the first import)

Add script for standalone (only pyroot) install

bf90505
