Ntuples_version.txt

Ntuples:

2022_04_02_NtuplesV0:
First version, no cuts at all

2022_04_21_NtuplesV0_etacut:
Cut jets with chunky donut partially or completely outside barrel
'python batchSubmitOnTier3.py --v gamma1 --etacut 24'

2022_04_21_NtuplesV0_jetcut:
Cut jets with pt > 60 GeV
'python batchSubmitOnTier3.py --v gamma1 --jetcut 60'

2022_04_21_NtuplesV0_noise:
Apply noise cut for TT with eta=26,27,28
L111-113 of batchReader.py

2022_04_21_NtuplesV0_saturation:
Apply saturation cut for energy deposit bigger than 255 ieta
L107-113 of batchReader.py

2022_04_21_NtuplesV1:
Apply all cuts: etacut, jetcut, noise, saturation

2022_04_21_NtuplesV2:
Apply all cuts on training sample: etacut, jetcut, noise, saturation
Do not apply jetcut on the testing sample

2022_04_25_NtuplesV3:
Apply all cuts: etacut, jetcut, noise, saturation
Keep information for the plots

2022_04_26_NtuplesV4:
Apply all cuts: etacut, jetcut, noise, saturation
Add a new cut on the ECAL energy fraction: E/(E+H) > 0.8
'python batchSubmitOnTier3.py --v gamma1 --jetcut 60 --etacut 24 --ecalcut True'
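
A minimal sketch of the ECAL energy fraction cut described in this entry, assuming each jet carries a summed ECAL deposit (E) and a summed HCAL deposit (H). The function name and the tuple layout are illustrative and not taken from batchReader.py.

```python
def passes_ecal_fraction(e_ecal, e_hcal, threshold=0.8):
    """Return True if E/(E+H) is strictly above the threshold."""
    total = e_ecal + e_hcal
    if total <= 0:
        # No deposit at all: the fraction is undefined, so reject the jet.
        return False
    return e_ecal / total > threshold


# Hypothetical jets as (ECAL sum, HCAL sum), in consistent units.
jets = [(90.0, 10.0), (50.0, 50.0), (0.0, 0.0), (80.0, 20.0)]
selected = [jet for jet in jets if passes_ecal_fraction(*jet)]
```

With a strict '>' a jet sitting exactly at 0.8 is rejected; whether the real cut is strict or not is an assumption here.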

2022_04_26_NtuplesV5:
In the batchReader.py I changed the place where I apply the cut on the ECAL energy fraction.
We need to understand what happens between L176-198.
I think that the problem is that I was using a groupby on 'event', while I should have done it on 'uniqueIdx'
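
To illustrate the suspected problem: if several jets share the same 'event' number, grouping on 'event' merges their towers into one group, while grouping on 'uniqueIdx' keeps one group per jet. The sketch below uses plain Python and made-up rows as a stand-in for the dataframe groupby. Only the column names 'event' and 'uniqueIdx' come from the note above; the rest is an assumption.

```python
from collections import defaultdict

# Hypothetical tower rows: two jets share event number 7 but have
# distinct uniqueIdx values.
rows = [
    {"event": 7, "uniqueIdx": 0, "iem": 10, "ihad": 2},
    {"event": 7, "uniqueIdx": 0, "iem": 6, "ihad": 2},
    {"event": 7, "uniqueIdx": 1, "iem": 1, "ihad": 20},
    {"event": 7, "uniqueIdx": 1, "iem": 1, "ihad": 18},
]


def ecal_fraction_by(rows, key):
    """Sum iem and ihad per group and return E/(E+H) for each group."""
    sums = defaultdict(lambda: [0, 0])
    for row in rows:
        sums[row[key]][0] += row["iem"]
        sums[row[key]][1] += row["ihad"]
    return {k: e / (e + h) for k, (e, h) in sums.items()}


per_event = ecal_fraction_by(rows, "event")      # one mixed group
per_jet = ecal_fraction_by(rows, "uniqueIdx")    # one group per jet
```

Here the mixed group has a fraction of 0.3, so the ECAL-like jet (0.8 on its own) is judged together with the HCAL-like one.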

2022_04_28_NtuplesV6:
First version of HCAL ntuples.
'python batchSubmitOnTier3.py --v qcd --jetcut 60 --etacut 24'

2022_04_30_NtuplesV7:
Fixed errors in batchReader.py
Update everything to have the new backup QCD datasets (50To80, 80To120, 120To170)

2022_04_28_NtuplesV8:
Introduce lower cut on jet pt for QCD:
    - 30GeV for  QCD_Pt-50To80 
    - 50GeV for  QCD_Pt-80To120 
    - 60GeV for  QCD_Pt-120To170 
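
The per-sample thresholds above can be kept in a single lookup table. A minimal sketch, where the dictionary and function names are hypothetical and only the sample names and thresholds come from this entry:

```python
# Lower jet pt cut in GeV for each QCD sample (values from the V8 entry).
LOWER_JET_PT_CUT = {
    "QCD_Pt-50To80": 30.0,
    "QCD_Pt-80To120": 50.0,
    "QCD_Pt-120To170": 60.0,
}


def passes_lower_pt_cut(sample, jet_pt):
    """Keep the jet only if its pt is above the threshold for its sample.

    The strict '>' is an assumption; the entry does not say whether the
    boundary value is kept.
    """
    return jet_pt > LOWER_JET_PT_CUT[sample]
```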

2022_05_02_NtuplesV9:
Introduce lower cut on jet pt for QCD:
    - 30GeV for  QCD_Pt-50To80 
    - 50GeV for  QCD_Pt-80To120 
    - 60GeV for  QCD_Pt-120To170

Introduce HCAL eta cut <= 24


2022_05_03_NtuplesV11:
Introduce lower cut on jet pt for QCD:
    - 50GeV for  QCD_Pt-50To80
    - 50GeV for  QCD_Pt-80To120
    - 60GeV for  QCD_Pt-120To170
Introduce upper pt cut on jet pt:
    - 150GeV
HCAL eta cut <= 24
Introduce first subtraction of iem deposit
    --> this subtraction is bugged due to missing factor 2 in iem sum (0.5GeV units)
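
A sketch of the unit issue, assuming jet pt is in GeV and iem is stored in hardware units of 0.5 GeV as stated above. The function name and the numbers are illustrative only.

```python
HW_TO_GEV = 0.5  # one hardware unit of iem corresponds to 0.5 GeV


def subtract_iem(jet_pt_gev, iem_towers):
    """Subtract the ECAL deposit, converted to GeV, from the jet pt."""
    iem_sum_gev = sum(iem_towers) * HW_TO_GEV
    return jet_pt_gev - iem_sum_gev


iem_towers = [10, 4, 6]                      # hypothetical deposits, sum = 20
corrected = subtract_iem(60.0, iem_towers)   # 60 - 20 * 0.5
bugged = 60.0 - sum(iem_towers)              # hardware units mixed with GeV
```

Without the conversion the subtraction removes twice the intended ECAL energy.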

2022_05_09_NtuplesV12:
ECAL: 
- ietacut = 28
- uJetPtcut = 100
- lJetPtcut = None
- Ecalcut = 0.8
Training: python3 NNModelTraining.py --indir 2022_05_09_NtuplesV12 --v ECAL --etrain iem
Targeting: jetPt - ihad

HCAL: 
- ietacut = 28
- lJetPtcut 50GeV for  QCD_Pt-50To80 
- lJetPtcut 50GeV for  QCD_Pt-80To120 
- lJetPtcut 60GeV for  QCD_Pt-120To170
- uJetPtcut = 150 ?
- Ecalcut = None
Training: python3 NNModelTraining.py --indir 2022_05_09_NtuplesV12 --v HCAL --etrain ihad 
--ECALModel /data_CMS/cms/motta/CaloL1calibraton/2022_05_09_NtuplesV12/ECALtraining

2022_05_10_NtuplesV13
HCAL:
- New samples with no PU (/data_CMS/cms/motta/CaloL1calibraton/L1NTuples/QCD_Pt15to7000_TuneCP5_14TeV-pythia8__Run3Summer21DR-NoPUFEVT_castor_120X_mcRun3_2021_realistic_v6-v1__GEN-SIM-DIGI-RAW_uncalib_applyHCALpfa1p)
- Add saturation cut on iet
- HCAL information is ihad for ieta < 29 and iet for ieta > 29
- readme is saving every information
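
A sketch of the tower energy selection described above. Applying the rule to |ieta| is an assumption, and the note does not say what happens at ieta = 29, so that case is left undefined here. The function name is hypothetical.

```python
def hcal_energy(ieta, ihad, iet):
    """Pick the tower energy used as HCAL input, depending on |ieta|."""
    abs_ieta = abs(ieta)
    if abs_ieta < 29:
        return ihad
    if abs_ieta > 29:
        return iet
    # ieta == 29 is not covered by the note: no value is assigned.
    return None
```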

python3 batchMaker.py --v HCAL --applyHCALpfa1p --chunk_size 2500 --doNewQCD --outdir 2022_05_10_NtuplesV13 --applyNoCalib
module use /opt/exp_soft/vo.llr.in2p3.fr/modulefiles_el7
module load python/3.7.0
python batchSubmitOnTier3.py --doNewQCD --etacut 37 --indir 2022_05_10_NtuplesV13 --applyHCALpfa1p --lJetPtCut 25 --uJetPtCut 150 --applyNoCalib

2022_05_11_NtuplesV14
ECAL:
python3 batchMaker.py --v ECAL --applyHCALpfa1p --chunk_size 5000 --doEG0_200 --applyNoCalib --outdir 2022_05_11_NtuplesV14
module use /opt/exp_soft/vo.llr.in2p3.fr/modulefiles_el7
module load python/3.7.0
python batchSubmitOnTier3.py --doEG0_200 --etacut 28 --indir 2022_05_11_NtuplesV14 --applyHCALpfa1p --uJetPtCut 100 --applyNoCalib --trainPtVers ECAL
python3 batchMerger.py --applyHCALpfa1 --indir 2022_05_11_NtuplesV14 --v ECAL --applyNoCalib --doEG --sample {train or test}
python3 NNModelTraining.py --indir 2022_05_11_NtuplesV14 --v ECAL
python3 CalibrationFactor.py --indir 2022_05_11_NtuplesV14 --v ECAL --start 1 --stop 200 --maxeta 28
python3 ModelPlots.py --indir 2022_05_11_NtuplesV14 --v ECAL --out data_ECAL_V14

2022_05_12_NtuplesV15
First HCAL training --> superseded right after by V16

2022_05_*_NtuplesVtest*
Random tests that lead to the need for V16 and the extended studies in there

2022_05_18_NtuplesV16
Several HCAL tests to understand how to make things work:
    - tried flattening pT distribution (no good results)
    - tried different pT cuts on input (converged on having pT>30 for now)
    - introduced HOE cuts (converged on applying HOE>95)
    - tried to subtract iem to jetPt (no good results as it introduces very bad L1 resolution into gen resolution)
    - removed 3-6-9 requirement for HCAL training (good results)
    - introduced loss and root_mean_square_error evaluation and analysis
    - worked on solving NN issues:
        - introduced flooring in training
        - introduced 'ignoring' of input if E=0

2022_05_23_NtuplesV17
SinglePion production for supposed HCAL training
After all the improvements introduced in V16 a training with this has never happened

2022_05_27_NtuplesV18
Rerun the full machinery to produce a final training and a final set of SFs both for ECAL and for HCAL
These SFs will then be used for the re-emulation of the L1 NTuples

    --> after doing everything noticed bug in cfg for L1NTuples --> fixed in next iteration

2022_06_01_NtuplesV19
Rerun the full machinery to produce a final training and a final set of SFs both for ECAL and for HCAL
These SFs will then be used for the re-emulation of the L1 NTuples

2022_06_10_NtuplesV20
Run the full machinery on the NTuples with PU
    --> this should have lowered the SFs for low energy TT
    --> did not really work

2022_06_14_NtuplesV21
Run the machinery on the same NTuples used for V19 but with saturation of the SFs inside the NN (but outside of TTP)

2022_06_14_NtuplesV22
Inclusion of nFT in the tensors and test of nFT weighting in the loss
First tests of rate proxy inside the NN


##########################################################################################

DISCOVERED A PROBLEM IN THE SAVING OF THE HDF5 FILES
The pickling of the dataframes has been 'creating' some TT that were actually
set to 0 by the emulation (those that should have been zero suppressed).
This might also include modification of values for good TTs and biases
in the training.

SOLVED THE ISSUE WITH THE PICKLING AND RESTARTED FROM SCRATCH WITH THE TRAININGS

##########################################################################################
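
A possible sanity check against this kind of problem: compare the tower energies before saving and after reloading, and list the towers that were zero before but are non-zero after. This is only a sketch with a hypothetical function name; it is not the check that was actually used to find or fix the issue.

```python
def resurrected_towers(before, after):
    """Return indices of towers that were 0 before saving but not after."""
    if len(before) != len(after):
        raise ValueError("tower lists differ in length")
    return [
        i for i, (b, a) in enumerate(zip(before, after))
        if b == 0 and a != 0
    ]


# Hypothetical energies: tower 2 was zero suppressed but comes back as 2.
suspicious = resurrected_towers([0, 3, 0], [0, 3, 2])
```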


2022_07_20_Ntuples23
Rerun the full machinery after having discovered the issue with the dataframes pickling
The setup is the one that was already used for V19
Latest and greatest with nFT weighting and rate proxy are to be investigated

2022_08_29_NtuplesV24
Same ntuples as 2022_07_20_Ntuples23. The folder is used for testing of different manual saturations

2022_09_05_NtuplesV25
Training with jets having at most 10 TTs

2022_09_05_NtuplesV26
Training with jets having at least 10 TTs

2023_01_16_NtuplesV27
First working version of rate proxy based on SingleNeutrino MC
All results presented at L1 DPG meeting on 6th of January

2023_02_10_NtuplesV28
First working version of calibration of ECAL and HCAL at the same time

2023_02_20_Ntuplesv29
First working version of calibration with data
Done both for split ECAL/HCAL training, and single ECAL+HCAL training
    --> from this one, it looks like keeping them separated might be the best option in order not to down-calibrate HCAL
    --> from the split version it might look like HF might also need to be calibrated separately

2023_02_22_NtuplesV30
test of ECAL/HCAL/HF separate calibration

2023_02_28_NtuplesV31
new calibration with data, with the "new" framework fully operational (done flattening the eta distribution and having HoE>0.9 for the jets)
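
One way a flat eta distribution could be obtained is to downsample every ieta bin to the size of the least populated one. The sketch below shows that approach on made-up jets; it is an assumption about the technique, not a description of what the framework does, and all names are hypothetical.

```python
import random
from collections import defaultdict


def flatten_in_eta(jets, seed=0):
    """Downsample so every ieta value keeps the same number of jets."""
    by_eta = defaultdict(list)
    for jet in jets:
        by_eta[jet["ieta"]].append(jet)
    # The least populated bin fixes how many jets each bin may keep.
    n_keep = min(len(group) for group in by_eta.values())
    rng = random.Random(seed)  # fixed seed for reproducibility
    flat = []
    for ieta in sorted(by_eta):
        flat.extend(rng.sample(by_eta[ieta], n_keep))
    return flat


# Hypothetical input: five jets at ieta=1, two jets at ieta=2.
jets = [{"ieta": 1, "pt": float(i)} for i in range(5)]
jets += [{"ieta": 2, "pt": float(i)} for i in range(2)]
flat = flatten_in_eta(jets)
```

The price of this approach is statistics: everything is limited by the smallest bin.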

2023_03_03_Ntuplesv32
Calibration on new MC samples: /SinglePionGun_E0p2to200/Run3Winter23Digi-NoPU_126X_mcRun3_2023_forPU65_v1-v2/GEN-SIM-RAW
Just for HCAL (issue with the eta distribution stopping at |eta| = 2)
The ECAL model used for the training is copied from 2023_01_16_NtuplesV27/ECALtraining_0pt500/ into 2023_03_03_NtuplesV32/ECALtrainingDataReco

2023_03_06_Ntuplesv33
Calibration on electron data (adding era E and F)
_normalOrder: calibrate ECAL first and HCAL later
_invertedOrder: calibrate HCAL first and ECAL later
_Rate0p8: target rate proxy 0.8*rate
_Rate1p2: target rate proxy 1.2*rate

2023_03_15_Ntuplesv34
Training on MC for jets but the stat is too small after applying pt and HoE cut.

2023_03_22_Ntuplesv35
Training on private MC data (1M events)
Target is defined as pt - ECAL(transverse)
The ECAL(transverse) deposit is computed from the eta value inside the L1Ntuplizer.

2023_03_24_Ntuplesv36
Training on private MC data (1M events)
Target is pt
We expect to have much lower stat than the v35 training (180314)

2023_03_25_Ntuplesv37
Training on private MC data (3M events)
Target is private MC data (1M events)
Target is defined as pt - ECAL(transverse)
The ECAL(transverse) deposit is computed from the eta value inside the L1Ntuplizer.

2023_03_26_Ntuplesv38
Training on private MC data (3M events)
Target is pt
We expect to have much lower stat than the v37 training but maybe as much as v35

2023_04_06_NtuplesV39
Training ECAL on data with study of the NN
- ECALtrainingDataReco
The training sample is copied from 2023_03_06_NtuplesV33/EGamma__Run2022E-ZElectron-PromptReco-v1__RAW-RECO__GT124XdataRun3Promptv10_CaloParams2022v06_noL1Calib_data_reco_json/tensors
(era E) in order to have enough statistics but not too long training
    - model_ECAL   : full loss      (regularization term with absolute value),  LR=10^-3,   new proxy rate (sigmoid)
    - model_ECALB  : full loss      (regularization term with square),          LR=10^-3,   new proxy rate (sigmoid)
    - model_ECALC  : regression term only                                       LR=10^-3,
    - model_ECALD  : regression term + rate                                     LR=10^-3,   new proxy rate (sigmoid)
    - model_ECALD1 : regression term + rate                                     LR=10^-3,   0.5*new proxy rate (sigmoid)
    - model_ECALD2 : regression term + rate                                     LR=10^-3,   1.5*new proxy rate (sigmoid)
    - model_ECALE  : regression term + rate                                     LR=10^-3,   new proxy rate (relu)
    - model_ECALF  : regression term + rate                                     LR=10^-3,   new proxy rate (cosh)
    - model_ECALG  : regression term only                                       LR=10^-5,
- ECALtrainingDataRecoFullStat
Few trainings with full stat only for interesting tests, including three hidden layers in the NN (version H)

2023_04_13_NtuplesV40
Training on HCAL with MC : apply ZeroSuppression, consider only HBHE (etacut 28), HoE 0.95, jetpt > 50

2023_04_18_NtuplesV41
Training of HCAL on data for the barrel only: ieta <= 15, pt > 30, apply ZeroSuppression 
Re-emulation considering raw energy
This version is bugged because of the ZS: we remove the ihad but not the one-hot encoded eta value
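
The bug can be pictured as follows: when zero suppression removes the ihad of a tower, the one-hot encoded eta of that tower has to be cleared as well, otherwise the network still sees an eta flag for an empty tower. A minimal sketch, assuming each tower is stored as [ihad, one-hot eta ...]; this layout, the threshold and the function name are illustrative, not the actual tensor format.

```python
def zero_suppress(tower, threshold):
    """Zero the whole tower (energy and one-hot eta) if ihad <= threshold."""
    ihad = tower[0]
    if ihad <= threshold:
        return [0] * len(tower)
    return list(tower)


tower = [1, 0, 0, 1, 0]                      # ihad = 1 plus a one-hot eta
bugged = [0] + tower[1:]                     # energy removed, eta flag kept
fixed = zero_suppress(tower, threshold=1)    # energy and eta flag removed
```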

# found bug of one hot encoding!
2023_04_29_NtuplesV42
Training of HCAL in data for barrel only: ieta <= 15, pt > 30, apply ZeroSuppression
- HCALtrainingDataReco
    - model_HCAL_A  : regression term only      LR=10^-3        Nodes_1=82      Nodes_2=256     epochs=20


2023_04_29_NtuplesV43
Training of HCAL in data for barrel only: ieta <= 15, pt > 30, NO ZeroSuppression
- HCALtrainingDataReco
    - model_HCAL_A  : regression term only      LR=10^-3        Nodes_1=82      Nodes_2=256     epochs=20

    - model_HCAL_C1 : regression term only      LR=10^-4        Nodes_1=82      Nodes_2=256     epochs=20
    - model_HCAL_C2 : regression term only      LR=10^-3        Nodes_1=82      Nodes_2=256     epochs=20
    - model_HCAL_C3 : regression term only      LR=10^-3        Nodes_1=82      Nodes_2=256     epochs=100
    - model_HCAL_C4 : regression term only      LR=10^-3        Nodes_1=82      Nodes_2=256     epochs=40       HiddenLayers
    - model_HCAL_C5 : regression term only      LR=10^-3        Nodes_1=82      Nodes_2=256     epochs=200      HiddenLayers
    - model_HCAL_C6 : regression term only      LR=10^-3        Nodes_1=82      Nodes_2=256     epochs=100      HiddenLayers