Improve models generation #96

Merged
34 commits
0efd817
Refactor model script I/O
HealthyPear Jan 25, 2021
71f59ea
Improve input features and add missing CTAMARS ones
HealthyPear Jan 25, 2021
e5b64bb
Started improving model script (see complete commit message for details)
HealthyPear Jan 25, 2021
2f045ab
Update from master and solve conflicts
HealthyPear Apr 12, 2021
8819bde
clarify CLI help
HealthyPear Apr 12, 2021
e070b06
small format changes to protopipe.mva.utils.prepare_data
HealthyPear Apr 12, 2021
b554334
simplify a condition in TrainModel
HealthyPear Apr 12, 2021
07fcb9e
Test improvement of models initialization
HealthyPear Apr 12, 2021
03f8c97
allow fit of single model (no GridSearchCV)
HealthyPear Apr 12, 2021
22b264c
small formatting change
HealthyPear Apr 12, 2021
de38037
Add example configuration file for RandomForestRegressor
HealthyPear Apr 12, 2021
fb26a1c
Add example configuration file for RandomForestClassifier
HealthyPear Apr 12, 2021
a55c861
fix input signal file name key
HealthyPear Apr 12, 2021
1ab50d7
Add testing files for RandomForestClassifier and RandomForestRegressor
HealthyPear Apr 14, 2021
9fb476f
Add test configuration file for AdaBoostRegressor (replaces regressor)
HealthyPear Apr 14, 2021
861e61b
Add AdaBoostRegressor configuration file
HealthyPear Apr 14, 2021
52104a5
Update model output
HealthyPear Apr 14, 2021
beb5562
Update example config files for RandomForest-based algorithms
HealthyPear Apr 14, 2021
e3fecb5
Improve protopipe.mva.utils.prepare_data
HealthyPear Apr 14, 2021
608d0f5
Improve and simplify protopipe-MODEL
HealthyPear Apr 14, 2021
7fad104
Modify protopipe-TRAINING according to new version of protopipe-MODEL
HealthyPear Apr 14, 2021
dc92517
Modify protopipe-DL2 according to modification to protopipe-MODEL
HealthyPear Apr 14, 2021
b5539e4
Update test configuration files
HealthyPear Apr 14, 2021
44a9317
Update test pipeline
HealthyPear Apr 14, 2021
2e01972
Remove obsolete MVA example/test configuration files
HealthyPear Apr 14, 2021
c76eee6
Update documentation
HealthyPear Apr 14, 2021
a68cab6
Rename some regressor features
HealthyPear Apr 15, 2021
45bc7d9
Remove code leftovers from DL2 script
HealthyPear Apr 15, 2021
47c522b
Fix check for classification features
HealthyPear Apr 15, 2021
f589fc8
Improve check for model type
HealthyPear Apr 15, 2021
8411fe7
Remove old test configuration files for regressor and classifier
HealthyPear Apr 15, 2021
44368ec
Fix comment/description in configuration files
HealthyPear Apr 15, 2021
95cff80
Fix names of energy-releated features
HealthyPear Apr 15, 2021
d10dac3
Check if label is explicitly None because it can be also 0
HealthyPear Apr 15, 2021
26 changes: 18 additions & 8 deletions docs/mva/index.rst
@@ -10,20 +10,30 @@ Introduction
 classification problems. It is based on machine learning methods available in
 scikit-learn_. Internally, the tables are dealt with the Pandas_ Python module.

-For each type of camera a regressor/classifier should be trained. For both type of models
-an average of the image estimates is later computed to determine a global
-output for the event (energy or score/gammaness).
+For each type of camera a regressor/classifier should be trained.
+For both type of models an average of the image estimates is later computed to
+determine a global output for the event (energy or score/gammaness).

 Details
 -------

-Data is split in train and test subsamples by images.
+Data is split in train and test subsamples by single telescope images.

-The class `TrainModel` uses a training sample composed of gamma-rays for a
+The class ```TrainModel``` uses a training sample composed of gamma-rays for a
 regression model. In addition of a gamma-ray sample, a sample of
-protons is also used to build a classifier. The training of a model is done via
-the GridSearchCV_ algorithm which allows to find the best hyper-parameters of
-the models.
+protons is also used to build a classifier.
+
+The training of a model can be done also via the GridSearchCV_ algorithm which
+allows to find the best hyper-parameters of the models.
+
+Supported models:
+
+- ``sklearn.ensemble.RandomForestClassifier``
+- ``sklearn.ensemble.RandomForestRegressor``
+- ``sklearn.ensemble.AdaBoostRegressor``
+
+For details about the generation of each model type, please refer to
+:ref:`model_building`.

 Reference/API
 -------------
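
The documentation text in this hunk says that per-image estimates are averaged into one output per event (energy or score/gammaness). The snippet below is a minimal sketch of that idea only, not protopipe's implementation: the function name `average_per_event`, the `(event_id, estimate)` input layout, and the sample values are hypothetical, and a plain unweighted mean is assumed because the text only says "an average".

```python
from collections import defaultdict
from statistics import mean


def average_per_event(image_estimates):
    """Combine per-image estimates into one value per event.

    image_estimates: iterable of (event_id, estimate) pairs, one pair per
    telescope image. Hypothetical layout, chosen for illustration only.
    """
    grouped = defaultdict(list)
    for event_id, estimate in image_estimates:
        grouped[event_id].append(estimate)
    # Unweighted arithmetic mean (an assumption; the docs only say "average").
    return {event_id: mean(values) for event_id, values in grouped.items()}


# Hypothetical gammaness scores: two images for event 1, three for event 2.
images = [(1, 0.9), (1, 0.7), (2, 0.2), (2, 0.4), (2, 0.6)]
event_scores = average_per_event(images)
```

The same grouping would apply to per-image energy estimates from a regressor; only the meaning of the averaged quantity changes.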
23 changes: 11 additions & 12 deletions docs/scripts/DL2.rst
@@ -15,14 +15,11 @@ By invoking the help argument, you can get help about how the script works:

 .. code-block::

-usage: protopipe-DL2 [-h] --config_file CONFIG_FILE -o OUTFILE [-m MAX_EVENTS]
-[-i INDIR] [-f [INFILE_LIST [INFILE_LIST ...]]]
-[--cam_ids [CAM_IDS [CAM_IDS ...]]] [--wave_dir WAVE_DIR]
-[--wave_temp_dir WAVE_TEMP_DIR] [--wave | --tail]
-[--debug] [--regressor_dir REGRESSOR_DIR]
-[--classifier_dir CLASSIFIER_DIR]
-[--force_tailcut_for_extended_cleaning FORCE_TAILCUT_FOR_EXTENDED_CLEANING]
-[--save_images]
+usage: protopipe-DL2 [-h] --config_file CONFIG_FILE -o OUTFILE [-m MAX_EVENTS] [-i INDIR] [-f [INFILE_LIST [INFILE_LIST ...]]]
+[--cam_ids [CAM_IDS [CAM_IDS ...]]] [--wave_dir WAVE_DIR] [--wave_temp_dir WAVE_TEMP_DIR] [--wave | --tail] [--debug]
+[--regressor_dir REGRESSOR_DIR] [--classifier_dir CLASSIFIER_DIR]
+[--force_tailcut_for_extended_cleaning FORCE_TAILCUT_FOR_EXTENDED_CLEANING] [--save_images]
+[--regressor_config REGRESSOR_CONFIG] [--classifier_config CLASSIFIER_CONFIG]

 optional arguments:
 -h, --help show this help message and exit
@@ -35,11 +32,9 @@ By invoking the help argument, you can get help about how the script works:
 give a specific list of files to run on
 --cam_ids [CAM_IDS [CAM_IDS ...]]
 give the specific list of camera types to run on
---wave_dir WAVE_DIR directory where to find mr_filter. if not set look in
-$PATH
+--wave_dir WAVE_DIR directory where to find mr_filter. if not set look in $PATH
 --wave_temp_dir WAVE_TEMP_DIR
-directory where mr_filter to store the temporary fits
-files
+directory where mr_filter to store the temporary fits files
 --wave if set, use wavelet cleaning -- default
 --tail if set, use tail cleaning, otherwise wavelets
 --debug Print debugging information
@@ -50,3 +45,7 @@ By invoking the help argument, you can get help about how the script works:
 --force_tailcut_for_extended_cleaning FORCE_TAILCUT_FOR_EXTENDED_CLEANING
 For tailcut cleaning for energy/score estimation
 --save_images Save images in images.h5 (one file testing)
+--regressor_config REGRESSOR_CONFIG
+Configuration file used to produce regressor model
+--classifier_config CLASSIFIER_CONFIG
+Configuration file used to produce classification model
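
To show how the two options added in this diff sit next to the existing ones, here is a minimal `argparse` sketch that reproduces a subset of the usage line above. It is an illustration under assumptions, not the actual `protopipe-DL2` source: only the option names and help strings shown in the diff are taken from it, while the `build_parser` helper, the `dest` name for `-o`, and the file names in the example call are hypothetical.

```python
import argparse


def build_parser():
    """Build a parser mirroring part of the protopipe-DL2 usage text (sketch)."""
    parser = argparse.ArgumentParser(prog="protopipe-DL2")
    parser.add_argument("--config_file", required=True)
    # The usage text shows only the short form; dest="outfile" is an assumption.
    parser.add_argument("-o", dest="outfile", required=True)

    # The usage text shows "[--wave | --tail]", i.e. the flags are mutually exclusive.
    cleaning = parser.add_mutually_exclusive_group()
    cleaning.add_argument("--wave", action="store_true",
                          help="if set, use wavelet cleaning -- default")
    cleaning.add_argument("--tail", action="store_true",
                          help="if set, use tail cleaning, otherwise wavelets")

    parser.add_argument("--regressor_dir")
    parser.add_argument("--classifier_dir")
    # Options added by this pull request (help strings copied from the diff).
    parser.add_argument("--regressor_config",
                        help="Configuration file used to produce regressor model")
    parser.add_argument("--classifier_config",
                        help="Configuration file used to produce classification model")
    return parser


# Hypothetical invocation; all file names are placeholders.
args = build_parser().parse_args([
    "--config_file", "analysis.yaml",
    "-o", "dl2.h5",
    "--tail",
    "--regressor_config", "RandomForestRegressor.yaml",
    "--classifier_config", "RandomForestClassifier.yaml",
])
```

Passing both `--wave` and `--tail` to this sketch makes `argparse` exit with a usage error, which matches the `[--wave | --tail]` notation in the help output.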
24 changes: 9 additions & 15 deletions docs/scripts/data_training.rst
@@ -19,15 +19,10 @@ By invoking the help argument, you can get help about how the script works:

 .. code-block::

-usage: protopipe-TRAINING [-h] --config_file CONFIG_FILE -o OUTFILE
-[-m MAX_EVENTS] [-i INDIR]
-[-f [INFILE_LIST [INFILE_LIST ...]]]
-[--cam_ids [CAM_IDS [CAM_IDS ...]]]
-[--wave_dir WAVE_DIR]
-[--wave_temp_dir WAVE_TEMP_DIR] [--wave | --tail]
-[--debug] [--save_images]
-[--estimate_energy ESTIMATE_ENERGY]
-[--regressor_dir REGRESSOR_DIR]
+usage: protopipe-TRAINING [-h] --config_file CONFIG_FILE -o OUTFILE [-m MAX_EVENTS] [-i INDIR] [-f [INFILE_LIST [INFILE_LIST ...]]]
+[--cam_ids [CAM_IDS [CAM_IDS ...]]] [--wave_dir WAVE_DIR] [--wave_temp_dir WAVE_TEMP_DIR] [--wave | --tail]
+[--debug] [--save_images] [--estimate_energy ESTIMATE_ENERGY] [--regressor_dir REGRESSOR_DIR]
+[--regressor_config REGRESSOR_CONFIG]

 optional arguments:
 -h, --help show this help message and exit
@@ -40,20 +35,19 @@ By invoking the help argument, you can get help about how the script works:
 give a specific list of files to run on
 --cam_ids [CAM_IDS [CAM_IDS ...]]
 give the specific list of camera types to run on
---wave_dir WAVE_DIR directory where to find mr_filter. if not set look in
-$PATH
+--wave_dir WAVE_DIR directory where to find mr_filter. if not set look in $PATH
 --wave_temp_dir WAVE_TEMP_DIR
-directory where mr_filter to store the temporary fits
-files
+directory where mr_filter to store the temporary fits files
 --wave if set, use wavelet cleaning -- default
 --tail if set, use tail cleaning, otherwise wavelets
 --debug Print debugging information
 --save_images Save also all images
 --estimate_energy ESTIMATE_ENERGY
-Estimate the events' energy with a regressor from
-protopipe.scripts.build_model
+Estimate the events' energy with a regressor from protopipe.scripts.build_model
 --regressor_dir REGRESSOR_DIR
 regressors directory
+--regressor_config REGRESSOR_CONFIG
+Configuration file used to produce regressor model

 The configuration file used by this script is ``analysis.yaml``,
