Migration to Stage1 format #100

Merged
merged 93 commits into from
Dec 9, 2021


TjarkMiener
Member

Hi all,

This PR adds the migration to the stage1 format. The code can be tested with the stage1 notebook. I rebased PR #97 into this branch.

Changes w.r.t. the current (master) reader:

  • Parent class for shared code. Each format has its own subclass with different example identifiers, but the output of the reader (the return value of __getitem__) is generic.
  • Option to read/store the example identifiers in a pandas hdf5 file.
  • Different pointing modes supported.
  • Read images and/or image parameters.
  • Read cleaned images.
  • Construct the pyIRF simulated Table.
  • Only 'mono' and 'stereo' reading modes remain. The previous multi-stereo mode can be reproduced with the stereo mode by selecting several telescope types.
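
The parent-class layout from the list above can be pictured roughly as follows. This is only a minimal sketch: the names `BaseReader`, `ToyStage1Reader`, `_build_example_identifiers` and `_load_example` are hypothetical and are not taken from reader.py. It only illustrates that the identifier layout is subclass-specific while `__getitem__` stays generic.

```python
from abc import ABC, abstractmethod


class BaseReader(ABC):
    """Hypothetical parent class holding the code shared by all formats."""

    def __init__(self, mode="mono"):
        # Only the two reading modes named in this PR are accepted.
        if mode not in ("mono", "stereo"):
            raise ValueError(f"unsupported mode: {mode!r}")
        self.mode = mode
        # Format-specific tuples that locate one example in the input files.
        self.example_identifiers = self._build_example_identifiers()

    @abstractmethod
    def _build_example_identifiers(self):
        """Return the list of identifiers for this file format."""

    @abstractmethod
    def _load_example(self, identifier):
        """Return the data belonging to one identifier."""

    def __len__(self):
        return len(self.example_identifiers)

    def __getitem__(self, index):
        # Callers always go through this generic entry point, whatever
        # the identifier layout of the subclass looks like.
        return self._load_example(self.example_identifiers[index])


class ToyStage1Reader(BaseReader):
    """Toy subclass: identifiers are (filename, table_row, tel_id) tuples."""

    def _build_example_identifiers(self):
        return [("toy_file.h5", row, 1) for row in range(3)]

    def _load_example(self, identifier):
        filename, row, tel_id = identifier
        # Stand-in for reading an image or parameters from the file.
        return [row * 10.0, tel_id]


reader = ToyStage1Reader(mode="mono")
print(len(reader), reader[2])
```

A second format would add another subclass with its own identifier tuples, and code that consumes `reader[i]` would not need to change.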

LucaRomanato and others added 30 commits November 26, 2020 08:05
add "image_selection_from_file" parameter. It is a dict that can store one cut (at the moment).

If "image_selection_from_file" is present, "image_selection" must not be present: the former is for the new h5 file format, the latter for the old one.
from Unsigned Int to Signed Int
Use "image_selection" also for the new format, so that the intensity cut can be used without implementing the same filter for image_selection_from_file
delete the unused parameter algorithm in the filters and streamline duplicate code
stay consistent
read the selected parameters from the config file (ctlearn config) and append them to the example_identifiers.

Now we have:
('/mnt/lromanato/anaconda3/envs/LST37/CTA/Prod5h5/TJARK/gamma_20deg_180deg_run9___cta-prod5-lapalma_LST1_desert-2158m_LST1_mono_cone6.h5',
  708,
  306,
  1,
  [677.5469970703125, 0.0])

where example_identifiers.append((filename, nrow, image_index, tel_id, temp_list))
temp_list = the parameters required for this single image
(in this example selected_parameters is ['hillas_intensity', 'leakage_intensity_2'])
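
As a rough illustration of how such a tuple could be assembled, here is a minimal sketch. The helper `build_identifier`, the dict `parameter_row`, the file name `toy_file.h5` and the extra `hillas_width` entry are hypothetical stand-ins; only the tuple layout and the two selected parameter names come from the commit message.

```python
def build_identifier(filename, nrow, image_index, tel_id, parameter_row, selected_parameters):
    """Return one example identifier with the selected parameters attached."""
    # temp_list holds only the parameters requested for this single image,
    # in the order given by selected_parameters.
    temp_list = [parameter_row[name] for name in selected_parameters]
    return (filename, nrow, image_index, tel_id, temp_list)


selected_parameters = ["hillas_intensity", "leakage_intensity_2"]

# Toy stand-in for one row of a parameter table.
parameter_row = {
    "hillas_intensity": 677.5469970703125,
    "leakage_intensity_2": 0.0,
    "hillas_width": 0.1,  # toy value, not selected, so it is not copied
}

example_identifiers = []
example_identifiers.append(
    build_identifier("toy_file.h5", 708, 306, 1, parameter_row, selected_parameters)
)
print(example_identifiers[0])
```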
example:
[{'name': 'image',
  'tel_type': 'LST_LST_LSTCam',
  'base_name': 'image',
  'shape': (114, 114, 2),
  'dtype': dtype('float32')},
 {'name': 'particletype',
  'tel_type': None,
  'base_name': 'shower_primary_id',
  'shape': (),
  'dtype': dtype('int8')},
 {'name': 'parameter',
  'tel_type': None,
  'base_name': 'hillas_intensity',
  'shape': (),
  'dtype': dtype('float32')},
 {'name': 'parameter',
  'tel_type': None,
  'base_name': 'leakage_intensity_2',
  'shape': (),
  'dtype': dtype('float32')}]

renamed to training_parameters because they will be the parameters required for the NN training
In reader.py, the selected parameters (i.e. training_parameters) of the image are now read as well.

This fixes the ctlearn bug of the previous commit
…resent

also tested and fixed for all possible combinations (i.e. both present, neither present, one present and vice versa)
add the col_name to the parameter.
In this way, all the selected parameters are present in features in ctlearn.

example:
features:  {'image': <tf.Tensor 'IteratorGetNext:0' shape=<unknown> dtype=float32>, 'parameter_hillas_intensity': <tf.Tensor 'IteratorGetNext:2' shape=<unknown> dtype=float32>, 'parameter_leakage_intensity_2': <tf.Tensor 'IteratorGetNext:3' shape=<unknown> dtype=float32>}
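
The effect of adding the column name can be sketched as below. The function `feature_key` is hypothetical and only assumes that parameter entries are keyed as `name` + `_` + `base_name`, which matches the keys printed above. Without the suffix, both parameters would share the key 'parameter'.

```python
def feature_key(description):
    """Build a unique feature name from one example-description entry."""
    if description["name"] == "parameter":
        # Append the column name so that several parameters do not collide.
        return f"{description['name']}_{description['base_name']}"
    return description["name"]


# Entries reduced to the two fields that matter here.
example_description = [
    {"name": "image", "base_name": "image"},
    {"name": "particletype", "base_name": "shower_primary_id"},
    {"name": "parameter", "base_name": "hillas_intensity"},
    {"name": "parameter", "base_name": "leakage_intensity_2"},
]

keys = [feature_key(entry) for entry in example_description]
print(keys)
```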
change position attribute and event loop in __getitem__
replace pop() with a non-destructive method
Preparing reader for parameters in stereo mode.
Need to understand which value to pass to image_index at line 523
fix a little distraction
new notebook to show how the new reader works.
At the moment, image parameters can be successfully loaded only in mono mode and are selected in the ctlearn yml file.
added pyIRF simulation table; added pyirf to setup.py

removed ctapipe from setup.py

removed multi stereo mode and support this feature in the stereo mode by calling more than one telescope type

parameter selection cuts based on the parameter tables and not on the image itself

added cleaned images

read images and/or parameters

added multiplicity cut on the subarray
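
A multiplicity cut of this kind can be sketched as follows, assuming it keeps an event only if enough telescopes of the selected subarray took part in it. The function name, the event dicts and the telescope IDs are hypothetical and do not reflect the reader's actual data layout.

```python
def apply_multiplicity_cut(events, subarray_tel_ids, min_multiplicity):
    """Keep events seen by at least min_multiplicity telescopes of the subarray."""
    allowed = set(subarray_tel_ids)
    return [
        event
        for event in events
        # Telescopes outside the selected subarray do not count.
        if len(allowed.intersection(event["tel_ids"])) >= min_multiplicity
    ]


events = [
    {"event_id": 1, "tel_ids": [1]},
    {"event_id": 2, "tel_ids": [1, 2, 3]},
    {"event_id": 3, "tel_ids": [2, 4]},
]

selected = apply_multiplicity_cut(events, subarray_tel_ids=[1, 2, 3], min_multiplicity=2)
print([event["event_id"] for event in selected])
```

Event 3 is dropped in this toy case because telescope 4 is not part of the selected subarray, so only one of its telescopes counts.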
Speed ups due to astropy tables

Different pointing modes. Pointing over time is not working at the moment due to ctapipe issues 1484 & 1562

Update stage1 dl1 reading notebook
The stage1 tool compresses the image and peak_time columns to integer values. This commit converts them back to floating point values if compression was used.
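
The conversion back to floating point can be sketched as below, assuming a fixed-point scheme in which the stored integer equals round(value * scale) + offset. The function `to_float`, the `offset` argument and the numbers are assumptions for illustration. In the real files the scale would come from the column metadata, which is not reproduced here.

```python
import numpy as np


def to_float(stored, scale=None, offset=0):
    """Undo an assumed fixed-point compression, if one was applied."""
    stored = np.asarray(stored)
    # Floating point columns, or columns without a scale, were not compressed.
    if scale is None or not np.issubdtype(stored.dtype, np.integer):
        return stored.astype(np.float32)
    # Invert stored = round(value * scale) + offset.
    return ((stored.astype(np.float32) - offset) / scale).astype(np.float32)


compressed = np.array([1235, 0, 50], dtype=np.int32)  # toy pixel values
restored = to_float(compressed, scale=100.0)
print(restored)
```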
TjarkMiener and others added 21 commits April 9, 2021 14:18
Add ability to write non-MC MAGIC data
Take into account the north pointing

Take the num of showers for pyirf from the shower distribution table. Some stage1 files (very few though) are missing this metadata.
…ression in MAGIC

As discussed in the call, it is better to reconstruct the SrcPosX/Y in the camera for the arrival direction regression. This is a dirty fix for the time being here. We need to come up with a more appropriate solution in ctapipe_io_magic for this regression task (under discussion).
It automatically detects the split that was used in the ctapipe-stage1/ctapipe-merge tool and deals with the different 'Group' names. The tables themselves have the same structure.

get rid of some local variables that occupied some memory

store unshuffled example_identifiers to disk and shuffle them afterwards.
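
The ordering of the two steps can be sketched as follows. This sketch uses pickle as a stand-in for the pandas hdf5 file mentioned in the PR description, and the function name, file name and seed are hypothetical. Presumably the point is that the file on disk does not depend on the random seed, but that motivation is an assumption.

```python
import pickle
import random
import tempfile
from pathlib import Path


def store_then_shuffle(example_identifiers, path, seed):
    """Write the identifiers in their original order, then return a shuffled copy."""
    with open(path, "wb") as handle:
        pickle.dump(example_identifiers, handle)
    # Shuffle a copy so that the stored order and the input list stay untouched.
    shuffled = list(example_identifiers)
    random.Random(seed).shuffle(shuffled)
    return shuffled


identifiers = [("toy_file.h5", row, 1) for row in range(10)]

with tempfile.TemporaryDirectory() as tmp_dir:
    path = Path(tmp_dir) / "example_identifiers.pkl"
    shuffled = store_then_shuffle(identifiers, path, seed=1234)
    with open(path, "rb") as handle:
        on_disk = pickle.load(handle)

print(on_disk[:2], shuffled[:2])
```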

minor bug fix -> read *_TRANSFORM_SCALE from the first table and not from the table tel_001. It broke before when LST1 was not in the file.
Wrong index was used
This feature can be now used with several processes.
@TjarkMiener
Member Author

Hi @nietootein. This PR has been pending for a while and has already been used to produce the results for 2109.05809 and 2112.01828. For reproducibility purposes, can we merge this PR and release a new version with known issues (see #104)?

This PR requires CTLearn v0.5.1 #136

@nietootein nietootein merged commit 6fa5949 into master Dec 9, 2021
@nietootein
Member

Sorry for the latency, @TjarkMiener. PR merged.

@TjarkMiener TjarkMiener deleted the stage1 branch January 13, 2022 15:47

5 participants