RDataFrame-Totem

How to run the analysis on Helix Nebula

Log in to SWAN Helix Nebula
On the CERNBox tab, open a new terminal (the >_ icon in the top right corner)

Clone this repo:

git clone https://github.com/JavierCVilla/RDataFrame-Totem.git

Open the Python notebook (DistillDistibution-AllDatasets.ipynb) from the SWAN interface:
Start the Spark cluster connection. The default configuration is ready to run the analysis.
Once connected, execute cells 1 to 7. This should be fairly fast, since no computation is triggered yet.
Cell 8 initializes the Spark job and starts the event loop:
- Creating the ranges may take some minutes
- After a couple of minutes, the Spark monitoring display appears and shows the job progress
Once the job finishes, the remaining cells show some results and save them to disk

How to run `distill.py`

Requirements

This script reads Totem data from EOS, namely from the following path:

/eos/totem/data/cmstotem/2015/90m/Totem/Ntuple/version2/4495/

Therefore, the totem project needs to be mounted and accessible for the user.
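
Before launching the analysis, it can help to confirm that the data directory is actually visible from your session. The following is a minimal sketch, assuming EOS is exposed as a regular filesystem mount; the helper name `is_readable_dir` is illustrative and not part of this repository.

```python
import os

# Path quoted in this README; the helper below is only an illustration.
TOTEM_DATA_DIR = "/eos/totem/data/cmstotem/2015/90m/Totem/Ntuple/version2/4495/"


def is_readable_dir(path):
    """Return True if path is an existing directory the current user can list."""
    return os.path.isdir(path) and os.access(path, os.R_OK | os.X_OK)


if __name__ == "__main__":
    if is_readable_dir(TOTEM_DATA_DIR):
        print("Totem data directory is accessible:", TOTEM_DATA_DIR)
    else:
        print("Cannot read", TOTEM_DATA_DIR, "- is the totem EOS project mounted?")
```

If the check fails, the script would not be able to open its input files, so it is worth fixing the mount first.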

Using pure python from a terminal

Clone this repository:

git clone https://github.com/JavierCVilla/RDataFrame-Totem.git

Prepare the environment

The code requires ROOT 6.14.00 or later and Python.
The simplest way to satisfy these software dependencies is to use the LCG Releases available through CVMFS.
The following command sets up your environment with these packages ready to use:

source /cvmfs/sft.cern.ch/lcg/views/dev3python3/latest/x86_64-slc6-gcc62-opt/setup.sh

Alternatively, you can use your own ROOT and Python installation. In that case, make sure the Python ROOT module is properly configured in your environment so that it can be imported:

python
>>> import ROOT
>>>

If the previous import failed, your PYTHONPATH may not be set properly. The easiest way to configure the environment for ROOT is to source its own setup script:

source /your/path/to/root/bin/thisroot.sh
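
The import check and the version requirement can be combined into one small script. This is a sketch, not part of the repository: it assumes `ROOT.gROOT.GetVersion()` returns a string such as `6.14/00`, and the helper names `parse_root_version` and `meets_minimum` are made up for illustration.

```python
import re

MINIMUM_ROOT = (6, 14, 0)  # requirement stated in this README


def parse_root_version(text):
    """Turn a version string such as '6.14/00' into a tuple like (6, 14, 0)."""
    parts = [int(p) for p in re.findall(r"\d+", text)[:3]]
    parts += [0] * (3 - len(parts))  # pad short strings such as '6.14'
    return tuple(parts)


def meets_minimum(text, minimum=MINIMUM_ROOT):
    """Return True if the version string is at least the required version."""
    return parse_root_version(text) >= minimum


if __name__ == "__main__":
    try:
        import ROOT
    except ImportError:
        print("ROOT cannot be imported; check PYTHONPATH or source thisroot.sh")
    else:
        version = ROOT.gROOT.GetVersion()
        status = "OK" if meets_minimum(version) else "too old"
        print("ROOT", version, "->", status)
```

The parsing is kept separate from the import so the version comparison can be reused even on a machine where ROOT is not installed.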

Run the code:

python distill.py <diagonal> [threads number]

Valid diagonals: d45b_56t, d45t_56b, ad45b_56b, ad45t_56t
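
A wrapper that launches several runs may want to validate its arguments before calling the script. The sketch below shows one possible way to do that. It is not the parsing code used by distill.py: the function name `parse_args` is hypothetical, and returning `None` when the thread count is omitted is an assumption, since this README does not say what the script does in that case.

```python
VALID_DIAGONALS = ("d45b_56t", "d45t_56b", "ad45b_56b", "ad45t_56t")


def parse_args(argv):
    """Return (diagonal, threads) from a list such as ['d45b_56t', '4'].

    threads is None when the optional second argument is omitted.
    """
    if not 1 <= len(argv) <= 2:
        raise ValueError("expected: <diagonal> [threads]")
    diagonal = argv[0]
    if diagonal not in VALID_DIAGONALS:
        raise ValueError(
            "invalid diagonal %r; expected one of: %s"
            % (diagonal, ", ".join(VALID_DIAGONALS))
        )
    threads = None
    if len(argv) == 2:
        threads = int(argv[1])  # raises ValueError on non-numeric input
        if threads < 1:
            raise ValueError("threads must be a positive integer")
    return diagonal, threads


if __name__ == "__main__":
    print(parse_args(["d45b_56t", "4"]))
```

Rejecting an unknown diagonal up front avoids starting a long job that would only fail later.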

Using HelixNebula

Start a session in SWAN HelixNebula and select the bleeding edge software stack, which is currently the only one that provides ROOT 6.14.00.
Copy distill.ipynb to your CERNBox space or to your SWAN instance in HelixNebula.
Open the Python notebook and execute the cells.
HelixNebula already provides the required environment configuration as well as access to the EOS files.

Comparing results

ROOT files produced by this code and the original analysis can be compared using the rootcompare.c script to ensure the same results are produced.

After setting up the environment, compile the script using:

g++ -o rootcompare rootcompare.c `root-config --cflags --glibs`

This program takes four arguments:

./rootcompare fileA treenameA fileB treenameB

NOTE: This script is not meant to be generic enough to compare arbitrary pairs of ROOT files. It is currently intended only for comparing files produced by this analysis.
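
When several output files need checking, the comparison can be driven from Python. The following is a minimal sketch that only builds the command in the argument order shown above and runs it if the binary has been compiled. The file and tree names are placeholders, and the meaning of the program's exit code is not documented here, so the sketch just prints it.

```python
import os
import subprocess


def rootcompare_command(file_a, tree_a, file_b, tree_b, binary="./rootcompare"):
    """Build the argument list: fileA treenameA fileB treenameB."""
    return [binary, file_a, tree_a, file_b, tree_b]


if __name__ == "__main__":
    # Placeholder file and tree names; replace them with your own.
    cmd = rootcompare_command("new.root", "tree_name", "reference.root", "tree_name")
    if os.path.exists(cmd[0]):
        result = subprocess.run(cmd)
        print("rootcompare exit code:", result.returncode)
    else:
        print("Compile rootcompare first; would run:", " ".join(cmd))
```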
