Merge branch 'master' into update_hillas-reconstructor
HealthyPear authored Sep 27, 2021
2 parents 482f3b9 + ab2b44f commit 29aaed8
Showing 6 changed files with 263 additions and 51 deletions.
15 changes: 10 additions & 5 deletions README.rst
@@ -20,15 +20,20 @@ protopipe |CI| |codacy| |coverage| |documentation| |doilatest|
A pipeline prototype for the `Cherenkov Telescope Array (CTA) <https://www.cta-observatory.org>`_.

- based on the `ctapipe <https://cta-observatory.github.io/ctapipe/>`_ and
`pyirf <https://cta-observatory.github.io/pyirf/>`__ libraries plus original code,
- successfully tested code migrated and imported from each new release,
- allows for full-scale analyses on the `DIRAC <http://diracgrid.org/>`__ computing grid thanks to its `interface <https://github.com/HealthyPear/protopipe-grid-interface#readme>`__.

Resources
---------

- Source code (protopipe): `GitHub repository <https://github.com/cta-observatory/protopipe>`__
- Source code (DIRAC grid interface): `GitHub repository <https://github.com/HealthyPear/protopipe-grid-interface>`__
- Documentation:

- `GitHub Pages <https://cta-observatory.github.io/protopipe>`__ (only development version)
- `readthedocs <https://protopipe.readthedocs.io/en/latest/>`__ (also latest releases)

- Current performance: `RedMine <https://forge.in2p3.fr/projects/benchmarks-reference-analysis/wiki/Protopipe_performance_data>`__

- Slack channels:
18 changes: 18 additions & 0 deletions docs/index.rst
@@ -22,6 +22,24 @@ Current performance is stored internally at `this RedMine page <https://forge.in
.. warning::
This is not yet stable code, so expect large and rapid changes.

Resources
---------

- Source code (protopipe): `GitHub repository <https://github.com/cta-observatory/protopipe>`__
- Source code (DIRAC grid interface): `GitHub repository <https://github.com/HealthyPear/protopipe-grid-interface>`__
- Documentation:

- `GitHub Pages <https://cta-observatory.github.io/protopipe>`__ (only development version)
- `readthedocs <https://protopipe.readthedocs.io/en/latest/>`__ (also latest releases)

- Current performance: `RedMine <https://forge.in2p3.fr/projects/benchmarks-reference-analysis/wiki/Protopipe_performance_data>`__

- Slack channels:

- `#protopipe <https://cta-aswg.slack.com/archives/CPTN4U7U7>`__
- `#protopipe_github <https://cta-aswg.slack.com/archives/CPUSPPHST>`__
- `#protopipe-grid <https://cta-aswg.slack.com/archives/C01FWH8E0TT>`__

Citing this software
--------------------

203 changes: 191 additions & 12 deletions docs/install/grid.rst
@@ -1,18 +1,39 @@
.. _install-grid:

================
Grid environment
================

.. contents::
:local:

Requirements
************

DIRAC GRID certificate
======================

In order to access the GRID utilities you will need a certificate associated with an
account.

You can find all the necessary information at
`this Redmine wiki page <https://forge.in2p3.fr/projects/cta_dirac/wiki/CTA-DIRAC_Users_Guide#Prerequisites>`_.

Source code for the interface
=============================

.. warning::
Usage of the pipeline on an infrastructure different from the DIRAC grid has
not been fully tested.
This interface code is **highly** bound to DIRAC,
but the scripts which manage the download, merge, and upload of files
could easily be adapted to different infrastructures.

Getting a released version
--------------------------

The latest released versions are stored `at this GitHub repository <https://github.com/cta-observatory/protopipe/releases>`__.

.. list-table:: compatibility between *protopipe* and its interface
:widths: 25 25
@@ -37,15 +58,173 @@ This version is always compatible *only* with the development version of *protop

``git clone https://github.com/HealthyPear/protopipe-grid-interface.git``

Container and options for containerization
==========================================

.. note::
One of the following containerization options is required; choose the one matching your setup.

- **Single user working from a personal Linux machine**

CTADIRAC can be installed natively on Linux (see `here <https://forge.in2p3.fr/projects/cta_dirac/wiki/CTA-DIRAC_Users_Guide#Native-client-installation-SL6-CentOS67>`_).
In this case make sure that the protopipe-grid-interface source code
resides at the same path as protopipe.

- **Single user working from a personal macOS or Windows machine**

The *Docker* container should be enough.

- **User working on a shared environment (HPC machine or server)**

In case you are not allowed to use *Docker* for security reasons, another supported option is *Singularity*.

- on *Linux*, if you can't install natively, make sure that either *Singularity* or *Docker* is available and accessible to your user,
- on *Windows* or *macOS*, if you can't use *Docker* you will need to use *Singularity* via *Vagrant*.

Docker
------

The container used by the interface requires the
`installation of Docker <https://docs.docker.com/get-docker/>`_.

To enter the container (the image is downloaded the first time),

| ``docker run --rm -v $HOME/.globus:/home/dirac/.globus \``
| ``-v $PWD/shared_folder:/home/dirac/shared_folder \``
| ``-v [...]/protopipe-grid-interface:/home/dirac/protopipe-grid-interface \``
| ``-v [...]/protopipe:/home/dirac/protopipe \``
| ``-it ctadirac/client``

where ``[...]`` is the path of your source code on the host.
The ``--rm`` flag will erase the container at exit
to save disk space (the data stored in the ``shared_folder`` won't disappear).
Please refer to the Docker documentation for other use cases.

.. note::
In case you are using a released version of *protopipe*, there is no dedicated container
at the moment, and the GRID environment based on CTADIRAC still requires Python 2.
In this case you can link the source code folder from your Python environment
on the host, just as you would with the development version
(``import protopipe; protopipe.__path__``).
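The linking described in the note above can be sketched as follows; ``pkg_path`` is a small helper invented here (not part of protopipe or the interface), and the final ``docker`` line is only illustrative:

```shell
# pkg_path: print the directory where a Python package is installed on the host,
# so that folder can be bind-mounted into the container (helper invented here).
pkg_path() {
    python3 -c "import importlib, os; print(os.path.dirname(importlib.import_module('$1').__file__))"
}

# For a released protopipe installed on the host, the mount could then read:
#   docker run ... -v "$(pkg_path protopipe)":/home/dirac/protopipe/protopipe ...
```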

.. warning::
If you are using *macOS* you could encounter some disk space issues.
Please check `this page <https://docs.docker.com/docker-for-mac/space/>`_ and
`this other page <https://djs55.github.io/jekyll/update/2017/11/27/docker-for-mac-disk-space.html>`_
on how to manage disk space.

Vagrant
-------

.. note::
Only required for users who want to use a *Singularity*
container on a *macOS* or *Microsoft Windows* machine.

All users, regardless of their operating system, can use this interface via
`Vagrant <https://www.vagrantup.com/>`_.

The *VagrantFile* provided with the interface code allows you to download a virtual
machine, in the form of a *Vagrant box*, which will host the actual container.

The user needs to,

1. copy the ``VagrantFile`` from the interface
2. edit lines from 48 to 59 according to the local setup
3. enter the virtual machine with ``vagrant up && vagrant ssh``

The *VagrantFile* also automatically creates the ``shared_folder``
used by the interface to set up the analysis.

Singularity
-----------

.. warning::
Support for *Singularity* has been dropped by the maintainers of *CTADIRAC*.
The following solutions have not been tested in all possible cases.

- **macOS / Microsoft Windows**

`Singularity <https://sylabs.io/docs/>`_ is already installed and ready to use from the *Vagrant box*
obtained by using the *VagrantFile*.

- **Linux**

Users who do not want to use *Vagrant* will need to have *Singularity* installed
on their system and will need to adjust their environment accordingly.

For pure-*Singularity* users (i.e. on Linux machines without *Vagrant*),
bind mounts for *protopipe*, its grid interface, and the ``shared_folder``
will work in the same way: ``--bind path_on_host:path_on_container``.

The DIRAC grid certificate should already be available, since *Singularity*
mounts the user's home directory by default.
For more details, please check e.g.
`system-defined bind paths <https://sylabs.io/guides/3.8/user-guide/bind_paths_and_mounts.html#system-defined-bind-paths>`_.
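As an illustration of the ``--bind`` syntax above, the three Docker mounts shown earlier translate into a single bind specification; the host paths are illustrative, and the ``singularity`` invocation is left commented since it requires a working *Singularity* installation:

```shell
# Comma-separated list of host:container bind pairs (host paths illustrative)
BINDS="$PWD/protopipe:/home/dirac/protopipe"
BINDS="$BINDS,$PWD/protopipe-grid-interface:/home/dirac/protopipe-grid-interface"
BINDS="$BINDS,$PWD/shared_folder:/home/dirac/shared_folder"

# Enter the container with those mounts (requires Singularity):
#   singularity shell --bind "$BINDS" ctadirac_client_latest.sif
```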

Depending on the privileges granted on the host, there are two ways to get a working container.

Using the CTADIRAC Docker image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

**Method #1**

Provided you have at least *Singularity 3.3*, you can pull the CTADIRAC Docker image directly from *Docker Hub*,
but you will need to use the ``--fakeroot`` mode.
This mode grants you root privileges only *inside* the container.

``singularity build --fakeroot ctadirac_client_latest.sif docker://ctadirac/client``

``singularity shell --fakeroot ctadirac_client_latest.sif``

``. /home/dirac/dirac_env.sh``

**Method #2**

You shouldn't need root privileges for this to work (not thoroughly tested, though),

``singularity build --sandbox --fix-perms ctadirac_client_latest docker://ctadirac/client``

``singularity shell ctadirac_client_latest``

``. /home/dirac/dirac_env.sh``

Building the Singularity image
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Support for *Singularity* has been dropped by the maintainers of *CTADIRAC*,
but the recipe for the container has been saved here.

In this case you won't need to run ``. /home/dirac/dirac_env.sh``: the
commands will already be in your ``$PATH``.

.. warning::
The recipe ``CTADIRAC_singularity`` is maintained by the author; if any bug arises,
reverting to the methods described above (if possible) will provide you with a working environment.

If you have root privileges you can just build your own image with,

``singularity build ctadirac_client_latest.sif CTADIRAC_singularity``

otherwise you have to either,

- revert to the ``--fakeroot`` mode
(use it also to enter the container just like the methods above)

- build the image remotely at ``https://cloud.sylabs.io`` using the ``--remote`` flag
(for this you will need to interface with that service to generate an access token)

Setup the working environment
*****************************

The CTADIRAC container doesn't provide everything *protopipe* needs,
but this can be solved easily by issuing the following command inside the container's home directory,

``source protopipe-grid-interface/setup.sh``

This will not only install some missing Python packages,
but also define the convenient environment variables ``$GRID_INTERFACE`` and ``$PROTOPIPE``
pointing to the source code, and check that the DIRAC interface has been properly
installed and initialized.
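A quick way to verify those variables after sourcing the script is sketched below; ``check_env`` is a helper invented here, not something provided by the interface:

```shell
# check_env: fail if any of the named environment variables is unset or empty
# (helper invented here; run it inside the container after sourcing setup.sh)
check_env() {
    for v in "$@"; do
        if [ -z "${!v:-}" ]; then
            echo "missing: $v"
            return 1
        fi
    done
    echo "environment OK"
}

# Inside the container, after `source protopipe-grid-interface/setup.sh`:
#   check_env PROTOPIPE GRID_INTERFACE
```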

From here,

62 changes: 37 additions & 25 deletions docs/usage/use_grid.rst
@@ -23,22 +23,26 @@ Usage

You will work with two different virtual environments:

- protopipe (Python >=3.7)
- GRID interface (Python 2.7)

Their location and activation will depend on your installation choice
(see :ref:`install-grid`).

Open one terminal tab for each of these environments so you can work seamlessly between the two.

To monitor the jobs you can use the
`DIRAC Web Interface <https://ccdcta-web.in2p3.fr/DIRAC/?view=tabs&theme=Crisp&url_state=1|*DIRAC.JobMonitor.classes.JobMonitor:,>`_

1. **Setup analysis** (GRID environment)

After having entered the container, use the script

``python $GRID_INTERFACE/create_analysis_tree.py``

to create a complete analysis directory depending on your setup.
The script will store, and partially edit for you, all the necessary
configuration files under the ``configs`` folder, as well as the operational
scripts to download and upload data and model files under ``data`` and
``estimators`` respectively.

.. figure:: ./AnalysisTree.png
:width: 250
@@ -47,45 +51,46 @@ Usage
2. **Obtain training data for energy estimation** (GRID environment)

1. edit ``grid.yaml`` to use gammas without energy estimation
2. ``python $GRID_INTERFACE/submit_jobs.py --analysis_path=[...]/test_analysis --output_type=TRAINING``
3. edit and execute ``$ANALYSIS/data/download_and_merge.sh`` once the files are ready

3. **Build the model for energy estimation** (both environments)

1. switch to the ``protopipe environment``
2. edit the configuration file of your model of choice
3. use ``protopipe-MODEL`` with this configuration file
4. (development users) use the proper benchmarking notebooks under ``docs/contribute/benchmarks`` to check the performance of the generated models
5. return to the ``GRID environment`` to edit and execute ``upload_models.sh`` from the estimators folder

4. **Obtain training data for particle classification** (GRID environment)

1. edit ``grid.yaml`` to use gammas **with** energy estimation
2. ``python $GRID_INTERFACE/submit_jobs.py --analysis_path=[...]/test_analysis --output_type=TRAINING``
3. edit and execute ``$ANALYSIS/data/download_and_merge.sh`` once the files are ready
4. repeat the first 3 points for protons
5. (development users) use the proper benchmarking notebooks under ``docs/contribute/benchmarks`` to check the estimated energies

4. **Build a model for particle classification** (both environments)

1. switch to the ``protopipe environment``
2. edit ``RandomForestClassifier.yaml``
3. use ``protopipe-MODEL`` with this configuration file
4. (development users) use the proper benchmarking notebooks under ``docs/contribute/benchmarks`` to check the performance of the generated models
5. return to the ``GRID environment`` to edit and execute ``upload_models.sh`` from the ``estimators`` folder

5. **Get DL2 data** (GRID environment)

Execute points 1 and 2 for gammas, protons, and electrons separately.

1. ``python $GRID_INTERFACE/submit_jobs.py --analysis_path=[...]/test_analysis --output_type=DL2``
2. edit and execute ``download_and_merge.sh``
3. (development users) use the proper benchmarking notebooks under ``docs/contribute/benchmarks`` to check the quality of the generated DL2 data

6. **Estimate the performance** (protopipe environment)

1. edit ``performance.yaml``
2. launch the performance script with this configuration file and an observation time
3. (development users) use the proper benchmarking notebooks under ``docs/contribute/benchmarks`` to check the quality of the generated DL3 data


Troubleshooting
@@ -125,9 +130,16 @@ Something went wrong during the download phase, either because of your network
connection (check for possible instabilities) or because of a problem
on the server side (in which case the solution is out of your control).

First let the process finish and delete the incomplete merged file; then
the recommended approach is to use DIRAC's command,

``dirac-dms-directory-sync source destination``

where ``source`` is the LFN on DIRAC's FileCatalog and ``destination`` is the
target folder under your analysis directory tree.

If this doesn't work, a more manual approach is:

- let the process finish and eliminate the incomplete merged file,
- go to the GRID, copy the list of files and dump it into e.g. ``grid.list``,
- do the same with the local files into e.g. ``local.list``,
- do ``diff <(sort local.list) <(sort grid.list)``,
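The list-and-compare steps above can be sketched end to end; the file names are illustrative:

```shell
# Listing copied from the GRID (illustrative file names)
printf '%s\n' run1.h5 run2.h5 run3.h5 > grid.list
# Listing of the files actually downloaded locally
printf '%s\n' run1.h5 run3.h5 > local.list

# Lines prefixed with '>' exist on the grid but are missing locally;
# diff exits non-zero when the lists differ, hence the trailing '|| true'
diff <(sort local.list) <(sort grid.list) || true
```

Each ``>`` line is a file that still needs to be downloaded.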