NWUHEP/ntupleProducer

The Ntuple Producer code

This is CMSSW code for creating small ROOT ntuples from CMS data/MC samples. Check out this twiki for more details: CMS/UserCodeNWUntupleProducer

Instructions for Users

  • Set up the environment
  setenv SCRAM_ARCH slc5_amd64_gcc462
  setenv CVSROOT :ext:<cern-user-account>@lxplus.cern.ch:/afs/cern.ch/user/c/cvscmssw/public/CMSSW
  cmsrel CMSSW_5_3_13_patch3
  cd CMSSW_5_3_13_patch3/src
  cmsenv

Replace <cern-user-account> with your CERN account name. Because the new recipe connects to CVS through ssh to lxplus, you will have to type your CERN password every time you check out from CVS. It is inconvenient, but we have to live with it for now.

  git cms-addpkg PhysicsTools/PatAlgos
  git cms-merge-topic 1472
  git cms-merge-topic -u TaiSakuma:53X-met-131120-01
  scram b -j 9
  cvs co -r V00-00-13-01 RecoMET/METFilters
  cvs co -r V00-03-23 CommonTools/RecoAlgos
  cvs co -r V01-00-11-01 DPGAnalysis/Skims
  cvs co -r V00-11-17 DPGAnalysis/SiStripTools
  cvs co -r V00-00-08 DataFormats/TrackerCommon
  cvs co -r V01-09-05 RecoLocalTracker/SubCollectionProducers
  scram b -j 9
  • Ecal tools so we can get more photon variables for photon MVA:
  git cms-addpkg RecoEcal/EgammaCoreTools
  git cms-addpkg CommonTools/ParticleFlow
  cvs co -r V00-00-09 EgammaAnalysis/ElectronTools
  cvs co -r V09-00-01 RecoEgamma/EgammaTools
  cd EgammaAnalysis/ElectronTools/data
  cat download.url | xargs wget
  cd ../../../
  scram b -j 9
  • MVA MET code (used here only for the jet pileup ID):
  cvs co -r METPU_5_3_X_v4 RecoJets/JetProducers
  cvs up -r HEAD RecoJets/JetProducers/data/
  cvs up -r HEAD RecoJets/JetProducers/python/PileupJetIDCutParams_cfi.py
  cvs up -r HEAD RecoJets/JetProducers/python/PileupJetIDParams_cfi.py
  cvs up -r HEAD RecoJets/JetProducers/python/PileupJetID_cfi.py
  cvs co -r V05-00-16 DataFormats/JetReco
  scram b -j 9
  • Extra code for boosted Z->ee isolation, following the boostedZ and heep twikis (the cvs -d option does not work with the new CVS directory, so the packages are moved into place by hand):
  mkdir TSWilliams
  mkdir SHarper
  cvs co -r V00-02-03 UserCode/TSWilliams/BstdZee/BstdZeeTools
  cvs co -r V00-09-03 UserCode/SHarper/HEEPAnalyzer
  mv UserCode/TSWilliams/BstdZee/BstdZeeTools TSWilliams/.
  mv UserCode/SHarper/HEEPAnalyzer SHarper/.
  rm -r UserCode
  git clone https://github.com/peruzzim/SCFootprintRemoval.git PFIsolation/SuperClusterFootprintRemoval
  cd  PFIsolation/SuperClusterFootprintRemoval
  git checkout V01-06
  cd ../..
  scram b -j 9
  • Now check out the ntuple producer code and then the specific tag/branch of the code that is known to work:
  git clone https://github.com/NWUHEP/ntupleProducer NWU/ntupleProducer
  cd NWU/ntupleProducer
  git checkout v9.9
  cd ../..
  scram b -j 9

Once the code is compiled, we are ready to run it.

Running the code

  cd NWU/ntupleProducer/test
  cmsRun ntupleProducer_cfg.py

By default, this assumes you are running over an MC sample. If you want to run on data, do:

  cmsRun ntupleProducer_cfg.py isRealData=1

This sets up the appropriate global tag, etc.

NB: By default, the ntuples require at least one muon (electron) with pT > 3 (5) GeV for an event to be saved. If this is not desired (for instance, in jet- or photon-based studies), switch off the skimLeptons option in ntupleProducer_cfg.py.

In addition, the configuration file, ntupleProducer_cfg.py, has various flags that control whether certain objects (muons, jets, etc.) are saved. All are saved by default.
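The lepton requirement described above can be pictured with a minimal sketch. This is not the producer's actual code: the function and argument names are hypothetical, and it assumes the requirement is "a muon above threshold OR an electron above threshold", with the thresholds taken from the note above.

```python
# Illustrative sketch only; names are hypothetical, thresholds come from the
# note above (muon pT > 3 GeV, electron pT > 5 GeV).
MUON_PT_MIN = 3.0      # GeV
ELECTRON_PT_MIN = 5.0  # GeV


def passes_lepton_skim(muon_pts, electron_pts, skim_leptons=True):
    """Return True if an event would be kept under the assumed skim logic."""
    if not skim_leptons:
        # Skim switched off: every event is kept.
        return True
    has_muon = any(pt > MUON_PT_MIN for pt in muon_pts)
    has_electron = any(pt > ELECTRON_PT_MIN for pt in electron_pts)
    return has_muon or has_electron
```

With the skim on, an event containing only a 4 GeV electron would be dropped under these assumptions, which is why jet- or photon-based studies would want skimLeptons switched off.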

Running with CRAB

For running over individual datasets, it's best to use standard crab. The configuration files for MC and data are crabNtuples_MC.cfg and crabNtuples_Data.cfg. Submission goes as follows,

crab -create -cfg crabNtuples_<type>.cfg
crab -submit -c <ui_working_dir>

To check the status of your jobs,

crab -status -c <ui_working_dir>

and to get the log files,

crab -get -c <ui_working_dir>

More information can be found in the CMS SW guide chapter on CRAB.

To submit ntuple production for multiple datasets at once, the multicrab framework can be used. It is described very briefly here. The important feature is that most of the standard crab commands for submitting jobs and checking their status should also work when the crab command is replaced with multicrab. For instance, to create and submit jobs with multicrab, do the following,

multicrab -create -cfg <cfg_file> -submit

and to check their status

multicrab -status -c <ui_working_dir>

just as with crab. You can also check the status of individual datasets using standard crab commands. The main difference is in the format of the configuration files. For multicrab, there is a crab.cfg file with a set of global configuration parameters and a multicrab.cfg file where each dataset is given its own specific configuration. As with normal crab, two configuration files have been prepared for data and MC: multicrab_data.cfg and multicrab_mc.cfg.
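To illustrate the split between global and per-dataset settings, here is a small sketch that writes a multicrab-style configuration with Python's configparser. It assumes the usual layout of a [MULTICRAB] section pointing at crab.cfg, a [COMMON] section, and one section per dataset. The dataset name and path in the usage note are placeholders, and the contents of the prepared multicrab_data.cfg and multicrab_mc.cfg files may differ.

```python
import configparser
import io


def build_multicrab_cfg(common, datasets, crab_cfg="crab.cfg"):
    """Return the text of a multicrab-style config (assumed layout)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep keys such as CMSSW.datasetpath case-sensitive
    parser["MULTICRAB"] = {"cfg": crab_cfg}   # file with the global parameters
    parser["COMMON"] = dict(common)           # overrides shared by all datasets
    for name, options in datasets.items():
        parser[name] = dict(options)          # one section per dataset
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()
```

For example, build_multicrab_cfg({"CMSSW.pset": "ntupleProducer_cfg.py"}, {"ExampleDataset": {"CMSSW.datasetpath": "/Example/Dataset/AODSIM"}}) produces one [ExampleDataset] section next to the shared ones.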

Checking Output

After CRAB claims that your jobs have finished with exit codes 0 0, you will want to double-check, because its report is not always reliable and large jobs tend to end up with a few extra or missing files.

Run the following command:

  ./find_goodfiles.py -c Path/To/CrabDir -q

This checks that every job listed in the crab xml files actually has a file in your output area, and that the output area contains no extra or duplicate files. If anything is missing or extra, the script tells you what needs to be rerun or deleted.
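The kind of bookkeeping involved can be sketched as follows. This is not the logic of find_goodfiles.py itself (which reads the crab xml files). It is a hypothetical stand-in that takes a list of expected job numbers and assumes output files are named <prefix>_<job>_<retry>_<random>.root.

```python
import re
from collections import defaultdict

# Assumed file-name pattern: <prefix>_<job>_<retry>_<random>.root
FILE_RE = re.compile(r"^.+_(?P<job>\d+)_(?P<retry>\d+)_[A-Za-z0-9]+\.root$")


def check_outputs(expected_jobs, filenames):
    """Compare expected job numbers with the files found in an output area.

    Returns (missing, duplicates, unexpected):
      missing    - sorted job numbers that have no output file
      duplicates - {job: [files]} for jobs with more than one file
      unexpected - files that match no expected job or do not parse
    """
    expected = set(expected_jobs)
    files_by_job = defaultdict(list)
    unexpected = []
    for name in filenames:
        match = FILE_RE.match(name)
        if match and int(match.group("job")) in expected:
            files_by_job[int(match.group("job"))].append(name)
        else:
            unexpected.append(name)
    missing = sorted(expected - set(files_by_job))
    duplicates = {job: sorted(names)
                  for job, names in files_by_job.items() if len(names) > 1}
    return missing, duplicates, sorted(unexpected)
```

Missing jobs are the ones to rerun; duplicate and unexpected files are candidates for deletion.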

Instructions for Developers

  • First, make sure you are on master branch and have the latest code:
  git checkout master
  git pull
  • Then create a new branch and switch to it:
  git branch dev-username
  git checkout dev-username
  • Now you can make any changes you want. Once you are done, commit them and push your branch:
  git commit -a
  git push origin dev-username
  • When you are satisfied with your new code, merge it into the master branch:
  git checkout master
  git merge dev-username
  git push

If the changes do not conflict, you are done. If there are conflicts, markers will be left in the problematic files showing the conflict; git diff will show this. Once you have edited the files to resolve the conflicts, run git commit -a.

Tagging policy

At any time you can tag your code, and push your tags to remote:

  git tag -a test1 -m "my tag"
  git push origin --tags

You can use any tags you want; they can be deleted later.

For the global production, though, we should stick to a tagging convention. Tags should have the form vX.Y, starting with v6.1, so that the tag corresponds to the nutuple_v6 name of the ntuple production. If new code significantly changes the format of the ntuples (substantial changes to class definitions, etc.), the first number of the tag should be incremented (to v7.1, etc.) and the ntuple production path name should be changed correspondingly. Otherwise, incremental changes should be reflected in the second number.
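As a small illustration of this convention, the sketch below computes the next tag and the matching production name. The helper names are hypothetical. It assumes that the second number restarts at 1 when the first is incremented (as in the v7.1 example above) and that the production name follows the nutuple_v6 pattern quoted in the text.

```python
import re

TAG_RE = re.compile(r"^v(\d+)\.(\d+)$")


def _parse(tag):
    """Split a vX.Y tag into its two integer parts."""
    match = TAG_RE.match(tag)
    if match is None:
        raise ValueError("tag must look like vX.Y, got %r" % tag)
    return int(match.group(1)), int(match.group(2))


def next_tag(current, format_changed):
    """Return the tag that follows `current` under the vX.Y convention."""
    major, minor = _parse(current)
    if format_changed:
        # Substantial ntuple format change: bump X, restart Y at 1.
        return "v%d.1" % (major + 1)
    # Incremental change: bump Y only.
    return "v%d.%d" % (major, minor + 1)


def production_name(tag):
    """Production name matching a tag (assumed pattern, taken from the text)."""
    major, _ = _parse(tag)
    return "nutuple_v%d" % major
```

Under these assumptions, an incremental change after v6.1 gives v6.2, while a format change gives v7.1 and a production name ending in v7.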