Skip to content

justinrpilot/BESTAnalysis

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

64 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Justin Pilot
17 Nov 2017 (D Marley)

BESTAnalysis

To check out:

cmsrel CMSSW_8_0_12
cd CMSSW_8_0_12/src
cmsenv
git clone https://github.com/justinrpilot/BESTAnalysis BESTAnalysis
scram b 

To get started:

cd BESTAnalysis/BESTProducer/test
cmsRun run_HH.py

Training

BESTProducer.cc -- computes the event shape quantities in the boosted frame, adds them as EDM products to the event, and produces a flat TTree with the quantities for NN training.
To run:

[BESTProducer/test> cmsRun run_TT.py

Several files are available (run_TT.py, run_WW.py, etc.) and can be configured according to the sample of interest with the following options:

process.run = cms.EDProducer('BESTProducer',
	pdgIDforMatch = cms.int32(6),  → the particle used for jet matching
	NNtargetX = cms.int32(1),  → not used
	NNtargetY = cms.int32(1),  → not used anymore
	isMC = cms.int32(1),   → if the input is MC (to use data eventually)
	doMatch = cms.int32(0) → if the matching requirement with pdgID is actually applied
)

To run on crab, use crab*.py, where there are different decay modes and resonance masses.

Once you have the training files, you can train the NN. You can find some of my most recentfiles in /store/user/pilot/BESTFiles/Aug8/

Before training, some processing needs to be done, to compute additional variables and add the targets.
The script is runTree.C, which can also be used to evaluate new NNs on existing samples on-the-fly:

[BESTProducer/test> root -l
[0] .L runTree.C
[1] runTree(string inFile, string outFile, string histName, float targX, float targY, float targ3, float targ4, float targ5, bool reduce = 1)

The options are as follows

File Description
inFile the inputFile to run over
outFile the outputFile name
histName the weight file used to flatten the pT distribution (see below)
targets the values of the desired outputs for each node
reduce reduce the total number of events to a desired number for training/testing

I usually use run_all.C to run these commands sequentially for all samples.
Flattening the et: Use the script flatten.C

root -l
[0] .L flatten.C
[1] flatten(TString infile, TString outWeightFile, float etMin = 500, float etMax = 4000)

It will spit out a weight file that you use with runTree.C as shown above.
You can now run the training:

[BESTProducer/TMVATraining > root -l TMVARegression.C  

Currently, this takes a long time! Some important pieces of this file:

Piece Description
factory->AddVariable( "h1_top", "h1_top", "", 'F' ); Adding the variables for training -- they must match the branch names in the trees
factory->AddTarget( "targetX" ); The regression "targets" (output nodes) and desired outputs -- must have all 5 and the names must match the branch names again
TChain *input = new TChain("jetTree"); The input tree and file names (everything is added to one chain)
factory->SetWeightExpression( "flatten", "Regression" ); Apply the weights to flatten et
TCut mycut = "et > 500 && et < 3000" A cut to apply in creating testing/training samples

Additionally, define the training and testing sizes with factory->PrepareTrainingAndTestTree( mycut, "nTrain_Regression=100000:nTest_Regression=100000:SplitMode=Random:NormMode=NumEvents:!V" );.

You can reduce the number of training events and input variables to train a small NN quickly to see if things are working.
The network is defined by an xml file in TMVATraining/weights/
This xml file needs to be used with runTree.C to evaluate the NN performance!

Once you have a NN xml file, you can re-run runTree.C to look at the output distributions in a nice way, make sure you have the correct xml file in there: see here.

You must have the variables defined in the same order as you did in TMVARegression.C for the TMVA::Reader class. And make sure BookMVA() has the new file name of the xml file from training.

Plotting

Some plotting scripts that are useful:

Script Description
draw.C compares W/Z/t/H/j samples for individual variables (DoDraw.C for batch plot creation)
draw2D.C makes the cool 2D plot and lego plots for the individual samples

Standalone

To use BEST inside of another framework, access the class in the BoostedEventShapeTagger directory.
This requires setting up the lwtnn framework, which provides an interface for machine learning frameworks, developed using python libraries, in a c++ environment (initialize the lwtnn object, generate inputs to the network, and obtain the output).

The BEST model was developed using scikit-learn.
The script BoostedEventShapeTagger/python/sklearn2json.py converts the scikit-learn model into a format appropriate for lwtnn.

The class returns a std::map<std::string,double> containing the NN values for the different predictions:

  • "dnn_qcd"
  • "dnn_top"
  • "dnn_higgs"
  • "dnn_z"
  • "dnn_w"

To implement, checkout & build the framework(after setting up your CMSSW environment!).
First, checkout lwtnn following the instructions here.
Then:

git clone https://github.com/cms-ttbarAC/BESTAnalysis.git 
cd BESTAnalysis
git checkout sklearn-lwtnn-interface
cd BoostedEventShapeTagger
scram b -j8

In your class, simply provide the following changes:

  • Header file
#include "BESTAnalysis/BoostedEventShapeTagger/interface/BoostedEventShapeTagger.h"
// private:
BoostedEventShapeTagger *m_BEST;
  • Source file
// constructor (pass the name of the configuration file)
m_BEST = new BoostedEventShapeTagger( "BESTAnalysis/BoostedEventShapeTagger/data/config.txt" );

// destructor
delete m_BEST;

// event loop
std::map<std::string,double> NNresults = m_BEST->execute(ijet);  // ijet is a pat::Jet
int particleType = m_BEST->getParticleID();                      // automatically calculate the particle classification

The file config.txt is a configuration file that sets parameters for BEST.
An example configuration file is provided in BESTAnalysis/BoostedEventShapeTagger/data/config.txt.

  • BuildFile
<use   name="BESTAnalysis/BoostedEventShapeTagger"/>

And re-compile your code using scram.
The outputs, NNresults and particleType can then be saved to your output file, or used in some other way.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published