Instructions to run the validation code with the Spark cluster

Apache Spark is a framework for large-scale data processing, used here to process data stored in Hadoop. The cluster can be used interactively in two ways: through Swan (Jupyter notebooks) or from a shell. Swan projects are the easier option, since notebooks can be run there to develop and test code. For shell access, the connection is made through an lxplus machine.

In both cases, you need to request access to the Spark cluster and to the Hadoop space before starting. This can be done using the link.

Start a session with Swan

Select "Analytix" in the Spark cluster option when the session is initialized.

Start a session with lxplus

To connect to Spark and Hadoop from an lxplus machine, set up the environment first:

source /cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/setup.sh

source /cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-swan-setconf.sh <cluster name> spark3

Here, the cluster name can be one of hadoop-qa, analytix, lxhadoop, or nxcals-prod. For this work, use analytix:

source /cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-swan-setconf.sh analytix spark3
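If the setup line is generated from a script, a small check on the cluster name avoids sourcing the configuration with a typo. The following is a minimal Python sketch, not part of the repository: the function name `setconf_command` is hypothetical, and the list of valid names is simply the four clusters mentioned above.

```python
# Cluster names listed in the text above.
KNOWN_CLUSTERS = ("hadoop-qa", "analytix", "lxhadoop", "nxcals-prod")

# Path of the configuration script, copied from the command above.
SETCONF_SCRIPT = "/cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-swan-setconf.sh"


def setconf_command(cluster: str = "analytix") -> str:
    """Return the `source ...` line for the given cluster name.

    Hypothetical helper: it only builds the string, it does not run it.
    """
    if cluster not in KNOWN_CLUSTERS:
        raise ValueError(
            f"unknown cluster {cluster!r}; expected one of {KNOWN_CLUSTERS}"
        )
    return f"source {SETCONF_SCRIPT} {cluster} spark3"
```

Calling `setconf_command()` returns the analytix line shown above; any name outside the four listed raises `ValueError`.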

Then, use:

kinit

It will ask for your password. Finally, if you have permission to use Hadoop, you can test the connection with:

hdfs dfs -ls /

or

hdfs dfs -mkdir /hdfs/user/UserName/testFolder
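The same access test can be run from Python before starting a longer job. This is a minimal sketch under stated assumptions: `hdfs_listing` is a hypothetical helper, it only wraps the `hdfs dfs -ls` command shown above, and it treats any non-zero exit code (for example a missing Kerberos ticket) as "no access" without distinguishing the cause.

```python
import shutil
import subprocess


def hdfs_listing(path: str = "/", executable: str = "hdfs"):
    """Run `<executable> dfs -ls <path>` and return its output lines.

    Returns None when the executable is not on PATH or when the command
    exits with a non-zero status.
    """
    if shutil.which(executable) is None:
        # Environment not set up: the setup scripts were not sourced.
        return None
    result = subprocess.run(
        [executable, "dfs", "-ls", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.splitlines()
```

A return value of `None` means the environment or the permissions should be checked before going further.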

Connect with hadoop from lxplus

These commands connect directly to the /hdfs space, but this route is known to cause problems at times.

ssh it-hadoop-client

kinit

Connect through the Muon validation code (this is our main way)

The recommended way to set up the connection and install the code:

git clone https://gitlab.cern.ch/cms-muonPOG/spark_tnp.git

cd spark_tnp

source env.sh

kinit

How to generate distributions

The code is developed in the context of the Muon-POG spark code. As in the case of efficiencies, it reads the configuration file and produces plots, with and without the ratio, of the one-dimensional variables listed in the "binVariables" section of the configuration JSON file.
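To see which variables would be plotted, the configuration can be inspected before running. The sketch below is illustrative only and makes a loud assumption: that "binVariables" is a list whose entries are lists of variable names, with one-dimensional entries holding a single name. The actual schema should be checked against the configuration file in use; the function name and the example fragment are hypothetical.

```python
import json


def one_dimensional_variables(config: dict) -> list:
    """Return the variables that appear alone in "binVariables".

    ASSUMPTION: "binVariables" is a list of lists of variable names,
    e.g. [["pt"], ["abseta", "pt"]]. Entries with more than one name
    are skipped; duplicates are reported once, in first-seen order.
    """
    names = []
    for entry in config.get("binVariables", []):
        if len(entry) == 1 and entry[0] not in names:
            names.append(entry[0])
    return names


# Hypothetical configuration fragment, for illustration only.
example = json.loads('{"binVariables": [["pt"], ["abseta", "pt"], ["eta"]]}')
print(one_dimensional_variables(example))  # prints ['pt', 'eta']
```

For a real file, load it with `json.load(open(path))` and pass the resulting dictionary to the function.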

There are two ways to run the code:

Produce Data/MC distributions for a full era:

./tnp_fitter.py compare particle probe resonance era configs/muon_example.json --baseDir ./example

For example:

./tnp_fitter.py compare muon generalTracks Z Run2018_UL configs/muon_example.json --baseDir ./example

Produce distributions comparing two specific suberas:

There are two options: compare Data or MC datasets from the same era, or from different eras. In the first case:

./tnp_fitter.py compare particle probe resonance era configs/muon_example.json --baseDir ./example --subera1 SubEra1 --subera2 SubEra2

For example:

./tnp_fitter.py compare muon generalTracks Z Run2018_UL configs/muon_example.json --baseDir ./example --subera1 Run2018A --subera2 DY_madgraph

In the second case, from two different eras:

./tnp_fitter.py compare particle probe resonance era1 configs/muon_example.json --baseDir ./example --subera1 SubEra1 --subera2 SubEra2 --era2 Era2

For example:

./tnp_fitter.py compare muon generalTracks Z Run2018_UL configs/muon_example.json --baseDir ./example --subera1 Run2018A --subera2 DY_madgraph --era2 Run2016_UL

Submit work to condor

When plots are needed for many muon IDs or RECOs, the code can take a long time, because it processes the parquet files one by one for each efficiency. The same can happen when histograms are drawn for many variables. In that case, an option can be added to compute the histograms for each efficiency separately, i.e. the work is parallelized as a function of each muon type.

Add to the command line:

--condor_submit

For example:

./tnp_fitter.py compare particle probe resonance era configs/muon_example.json --baseDir ./example --condor_submit
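When several comparisons are launched from a script, the command variants described so far can be assembled programmatically. This is a minimal sketch, not part of the repository: `compare_command` is a hypothetical helper that only builds the argument list from the flags shown in this document (`--baseDir`, `--subera1`, `--subera2`, `--era2`, `--condor_submit`) and does not check which combinations the tool accepts.

```python
import shlex


def compare_command(particle, probe, resonance, era, config, base_dir,
                    subera1=None, subera2=None, era2=None,
                    condor_submit=False):
    """Build the argument list for `./tnp_fitter.py compare ...`.

    Optional flags are appended only when a value is given.
    """
    argv = ["./tnp_fitter.py", "compare", particle, probe, resonance, era,
            config, "--baseDir", base_dir]
    if subera1 is not None:
        argv += ["--subera1", subera1]
    if subera2 is not None:
        argv += ["--subera2", subera2]
    if era2 is not None:
        argv += ["--era2", era2]
    if condor_submit:
        argv.append("--condor_submit")
    return argv


# Reproduces the two-era example given earlier in this document.
cmd = compare_command("muon", "generalTracks", "Z", "Run2018_UL",
                      "configs/muon_example.json", "./example",
                      subera1="Run2018A", subera2="DY_madgraph",
                      era2="Run2016_UL")
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run` from inside the spark_tnp directory once `env.sh` has been sourced.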

Final steps

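Once the plots have been generated, copy them to the target directory on EOS (the general form is followed by a concrete example), then copy index.php into every subdirectory: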

cp -r ./BaseDir/plots/particle/probe/resonance/era/* /eos/user/u/username/www/some_directory/

cp -r ./example/plots/muon/generalTracks/Z/Run2018_UL/* /eos/user/u/username/www/some_directory/


cd /eos/user/u/username/www/some_directory/

find . -type d -exec cp index.php {} \;
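The last command can also be written in Python, which is convenient when the copy is part of a larger script. This is a minimal sketch assuming, as the `find` command does, that index.php already exists in the top directory; `copy_index_to_subdirs` is a hypothetical name.

```python
import shutil
from pathlib import Path


def copy_index_to_subdirs(root, index_name="index.php"):
    """Copy root/index_name into every directory below root.

    The root itself already holds the file, so it is skipped.
    Returns the list of directories that received a copy.
    """
    root = Path(root)
    index = root / index_name
    if not index.is_file():
        raise FileNotFoundError(f"{index} not found")
    copied = []
    # rglob("*") walks the whole tree; keep directories only.
    for directory in sorted(p for p in root.rglob("*") if p.is_dir()):
        shutil.copy(index, directory / index_name)
        copied.append(directory)
    return copied
```

For example, `copy_index_to_subdirs("/eos/user/u/username/www/some_directory")` has the same effect as the `find` command run from that directory.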
