
DL2 to DL3 step implementation #81

Merged
merged 26 commits into from
Jan 19, 2022

Conversation

morcuended
Member

@morcuended morcuended commented Jan 17, 2022

Usage: dl3_stage -d 2021_08_08 -c cfg/sequencer.cfg LST1

It makes use of the metadata extracted from the TCU database (source name and RADec coordinates). (WARNING: a run catalog is still to be implemented. It would ease access to the metadata needed by the DL3 tool.)

It pipes the lstchain scripts:

  • lstchain_create_irf_files (once per set of selection cuts)
  • lstchain_create_dl3_file (run-wise)
  • lstchain_create_dl3_index_files (source-wise)
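The run-wise and source-wise steps above can be sketched as command construction. This is a minimal sketch: the flag names below are illustrative assumptions, not a copy of the actual CLI options of the lstchain tools, and the helper function is ours.

```python
from pathlib import Path

def build_dl3_commands(dl2_files, irf_file, out_dir, source_name, ra_deg, dec_deg):
    """Return the command lines for the DL3 creation and indexing steps.

    Flag names are hypothetical placeholders for the real lstchain options.
    """
    cmds = []
    # One lstchain_create_dl3_file call per run (run-wise step).
    for dl2 in dl2_files:
        cmds.append([
            "lstchain_create_dl3_file",
            "--input-dl2", str(dl2),
            "--input-irf", str(irf_file),
            "--output-dl3-path", str(out_dir),
            "--source-name", source_name,
            "--source-ra", f"{ra_deg}deg",
            "--source-dec", f"{dec_deg}deg",
        ])
    # One indexing call per source directory (source-wise step).
    cmds.append([
        "lstchain_create_dl3_index_files",
        "--input-dl3-dir", str(out_dir),
    ])
    return cmds

cmds = build_dl3_commands(
    [Path("dl2_LST-1.Run00001.h5")],
    Path("irf_std_cuts.fits"),
    Path("std_cuts/source_name1"),
    "Crab", 83.633, 22.014,
)
```

Each inner list could then be passed to subprocess.run or submitted as a SLURM job by the workflow.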

The idea is to end up with the following structure:

/fefs/aswg/data/real
├── monitoring
│   ├── RunSummary
│   ├── DrivePositioning
│   └── PixelCalibration
├── DL1
│   └── YYYYMMDD
│       └── vX.Y.Z
│           ├── muons.fits
│           └── tailcut84
│               ├── dl1.h5
│               └── datacheck_dl1.h5
├── DL2
│   └── YYYYMMDD
│       └── vX.Y.Z
│           └── tailcut84
│               └── dl2.h5
└── DL3
    └── YYYYMMDD
        └── vX.Y.Z
            └── tailcut84
                ├── std_cuts
                │   ├── irf_std_cuts.fits
                │   ├── source_name1
                │   │   ├── dl3_LST-1_Run00001.fits
                │   │   ├── dl3_LST-1_Run00002.fits
                │   │   ├── hdu-index.fits.gz
                │   │   └── obs-index.fits.gz
                │   └── source_name2
                │       ├── dl3_LST-1_Run00003.fits
                │       ├── dl3_LST-1_Run00004.fits
                │       ├── hdu-index.fits.gz
                │       └── obs-index.fits.gz
                └── other_cuts

Right now this script is intended to be run separately, once closer has been launched and files have been moved to their final destinations.

TODO in further PRs:

  • Refactoring.
  • DL3 stage should run right after the merging of DL2 files without having to depend on the closer. Sequencer should take care of this analysis step as well through the datasequence.
  • Implement unit tests.
  • Next-day high-level analysis.

@codecov

codecov bot commented Jan 17, 2022

Codecov Report

Merging #81 (797e31e) into main (cdc88ca) will increase coverage by 0.00%.
The diff coverage is 85.71%.


@@           Coverage Diff           @@
##             main      #81   +/-   ##
=======================================
  Coverage   81.39%   81.40%           
=======================================
  Files          41       41           
  Lines        4294     4296    +2     
=======================================
+ Hits         3495     3497    +2     
  Misses        799      799           
Impacted Files | Coverage Δ
osa/scripts/sequencer.py | 86.80% <ø> (ø)
osa/nightsummary/extract.py | 78.19% <66.66%> (ø)
osa/configs/datamodel.py | 86.79% <100.00%> (+0.25%) ⬆️
osa/utils/utils.py | 67.13% <100.00%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@morcuended
Member Author

morcuended commented Jan 19, 2022

Currently, we may have problems with IERS-A data from astropy (see astropy/astropy#10494). We should download the IERS data separately (keep them up to date on a weekly basis?) and cache that data without trying to download anything when running in the cluster.

Just for reference, I leave here a link on how to proceed with the astropy cache when working in clusters (https://docs.astropy.org/en/stable/utils/data.html#astropy-data-and-clusters)
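Following the linked astropy documentation, disabling runtime IERS downloads on cluster nodes (and relying on a cache that is refreshed separately, e.g. weekly) is a small configuration fragment:

```python
# Configuration sketch for cluster jobs: never fetch IERS-A at runtime,
# rely on a pre-populated astropy cache instead (refreshed out of band).
from astropy.utils import iers

iers.conf.auto_download = False  # do not contact IERS servers from jobs
iers.conf.auto_max_age = None    # do not error out on stale cached tables
```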

@morcuended
Member Author

morcuended commented Jan 19, 2022

@moralejo @rlopezcoto @chaimain how does this directory structure scheme for post-DL2 analysis steps sound to you?

@morcuended morcuended merged commit d13d192 into main Jan 19, 2022
@morcuended morcuended deleted the dl3 branch January 19, 2022 19:19
@rlopezcoto

Sorry @morcuended, we have been very busy this week with the school. The structure looks overall good, thanks for taking care of proposing it and for this PR. I just have a few questions:

  • std_cuts -> have we already defined any set of std_cuts for processing? It may be a good idea to define a few different ones.
  • In this structure you are introducing for the first time source_name, where are you expecting to get it from?
  • do we really need to include irf_std_cuts.fits per source? or will they be produced by lstmcpipe and you will only link them there (how are you planning to know which one?)
  • I guess all this structure will be source-independent analysis, right?

@chaimain

chaimain commented Jan 21, 2022

Sorry @morcuended, we have been very busy this week with the school. The structure looks overall good, thanks for taking care of proposing it and for this PR. I just have a few questions:

  • std_cuts -> have we already defined any set of std_cuts for processing? It may be a good idea to define a few different ones.

We indeed have to have a discussion on the definition of std_cuts, and probably store this in lstchain first.

  • In this structure you are introducing for the first time source_name, where are you expecting to get it from?

It seems to come from a database query on DriveControl_SourceName. We should also discuss this, along with our discussion on creating and maintaining a standard catalog of the sources observed by LST-1.
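Such a run catalog could be as simple as a per-run record of source name and target coordinates, so that the DL3 steps no longer query TCU directly. A hypothetical sketch (the field names are assumptions, not an agreed schema):

```python
# Hypothetical run catalog: run_id -> source metadata.
# Field names and values are placeholders, not an agreed schema.
run_catalog = {
    1234: {"source_name": "Crab", "ra_deg": 83.633, "dec_deg": 22.014},
    1235: {"source_name": "Crab", "ra_deg": 83.633, "dec_deg": 22.014},
}

def sources_observed(catalog):
    """Distinct source names in the catalog, for source-wise DL3 indexing."""
    return sorted({rec["source_name"] for rec in catalog.values()})
```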

  • do we really need to include irf_std_cuts.fits per source? or will they be produced by lstmcpipe and you will only link them there (how are you planning to know which one?)

For IRFs, we are still awaiting the completion of various tasks: producing the 'all-sky' MC list, creating the new RF model and DL2 files, merging the IRF interpolation, and using energy-dependent cuts in the IRF/DL3 Tools. I think it would only make sense to talk about 'standard' IRF types once these tasks are done.

@morcuended
Member Author

Hi @rlopezcoto

After talking with @chaimain I realized that there are still some open issues that are not reflected in this very simplistic scheme. The main one is the selection of proper IRFs for each observation. I'll try to answer your questions below:

  • First, all this stuff refers to source-independent analysis. This is the only stream lstosa currently does. We may want to indicate this somewhere in the data tree.
  • No standard cuts defined yet. Just wanted to sketch how it could be in the future. We indeed could go for several sets of cuts (e.g. tight, standard, soft cuts). This is to be discussed and agreed upon within lstchain first as @chaimain says.
  • The source name associated with a given run_id (as well as the source coordinates) is fetched from the TCU database in this scheme (see https://github.com/cta-observatory/lstosa/blob/main/osa/nightsummary/database.py). This is a preliminary version and needs to be discussed too. I'd say that querying this database works for runs taken from about Nov 2020 (the information is not consistently there for earlier runs). Also, the source RADec information seems to reflect not the actual target coordinates but the coordinates plus the wobble offset, so we might want to get this information from the drive logs instead. I think all this information should be centralized, though: from now on, TCU will be writing a Run Catalog containing it. For previously taken runs, however, we'll have to figure this out some other way (e.g. using Create script to merge run summaries with drive logs into a single file cta-lstchain#880). This will need discussion among the analysis, TCU and drive teams.
  • do we really need to include irf_std_cuts.fits per source? or will they be produced by lstmcpipe and you will only link them there (how are you planning to know which one?)

Here I was foreseeing an IRF file per set of cuts, not per source. This could rather be done by lstmcpipe, and then we would just look for the files there. As @chaimain says, there are several open points to be considered before we go further in lstosa.

For the moment we could produce IRF-less DL3 files and store them in the same night/date directory (without sorting them by source). Nor would we run the observation-indexing script for the moment. Or we could produce no DL3 files at all until the previous issues are worked out.

I just wanted to move this forward so we could have automatic & fast next-day high-level results. But I guess that, for now, it only makes sense to produce theta2 & significance results from DL2 files or from these DL3 files without IRFs incorporated. Do you think this makes sense @rlopezcoto?

@rlopezcoto

thanks @morcuended and @chaimain, this sounds good for the time being. It would, however, be great if DL3 files could at least be produced for a few sets of cuts (which can be discussed as @chaimain was suggesting).

For the moment we could produce IRF-less DL3 files and store them in the same night/date directory (without sorting them by source). We would not run either the observation indexing script for the moment. Or we could not produce DL3 files at all for the time being until previous issues are worked out.

what is the problem with running the observation indexing script?

@chaimain

chaimain commented Jan 24, 2022

what is the problem with running the observation indexing script?

Sorry, I just checked again, and there should be no problem with running the indexing Tool for IRF-less DL3 files.

Also, we will try and merge the PR in lstchain for using energy-dependent cuts, so we can have a better definition on the types of cuts we apply, based on the gamma efficiency for each energy bin we define.

@morcuended
Member Author

Sorry, I just checked again, and there should be no problem with running the indexing Tool for IRF-less DL3 files.

Then we will do it like this. No IRFs but we do index the files.

For the time being, I will test the production of DL3 with fixed cuts then we will move to energy-dependent ones.
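The difference between the two cut schemes under discussion could be captured in a configuration like the sketch below. The values and key names are placeholders, not agreed "standard" cuts: a fixed scheme applies one global cut, while the energy-dependent scheme derives a cut per energy bin from a target gamma efficiency.

```python
# Placeholder cut configurations (illustrative values, not agreed standards).

# Fixed cuts: one global gammaness and theta cut for all energies.
fixed_cuts = {"gh_cut": 0.7, "theta_cut_deg": 0.2}

# Energy-dependent cuts: keep a fixed fraction of gammas in each energy bin,
# letting the gammaness cut vary from bin to bin.
energy_dependent_cuts = {
    "gh_efficiency": 0.9,  # keep 90% of gammas per bin
    "energy_bins_tev": [0.02, 0.2, 2.0, 20.0, 200.0],
}
```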

@rlopezcoto

Great, thanks guys!

@chaimain

Hi @morcuended, after discussing with @maxnoe regarding IRF-less DL3 production: we should not create DL3 files without IRFs, as they provide no additional information over the DL2 files.
It would be better to just create DL3 files with the existing IRFs (maybe point-like IRFs for now) daily with lstosa.

The lstchain DL3 Tool will be fixed in #709 to require IRFs and to take the gammaness cut information from the provided IRFs only, be it a global cut or energy-dependent cuts. It will be available in the upcoming lstchain release.

So, maybe you should close #94

@morcuended
Member Author

Hi @chaimain. Thanks for letting me know. I will close #94. Have you discussed the "standard" creation of DL3 files? Shall we go for point-like IRFs and several sets of cuts (fixed or energy-dependent)?

@chaimain

No, I have not yet discussed the "standard" cuts and the type of IRFs to be produced. I will do it on Slack now and, if need be, open a GitHub issue.
