Reduce RAM usage for reading ctapipe-stage1 files #104

Open · TjarkMiener opened this issue Sep 1, 2021 · 5 comments

@TjarkMiener (Member)

Following up on the discussion in ctapipe PR 1730.

I noticed a significant difference in RAM usage during training between the ctapipe-stage1 DL1 files split by tel_id and those split by tel_type.

The problem can be narrowed down to lines L792-L800 and L827-L837 of the DL1DataReaderSTAGE1 in the stage1 branch. The core difference is that for the DL1 files split by tel_type, we call TABLE._f_get_child() only once in L830, outside the for loop over the telescope ids, so only one child is in memory. My (probably poor) pytables understanding here is that getting a child from a node and then using pytables indexing is different from reading the whole table into memory and retrieving the image by an index. @aribrill, can you please comment?
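For illustration, a minimal sketch of the two access patterns (the group paths, telescope names and column names here are assumptions, not the actual DL1DataReaderSTAGE1 code):

```python
import tables

with tables.open_file("gamma_dl1.h5", mode="r") as f:

    # Split-by-tel_id layout: one table per telescope, so a child node is
    # fetched inside the loop for every telescope participating in the event.
    images_by_tel = []
    for tel_id in (1, 2, 3, 4):
        child = f.root.dl1.event.telescope.images._f_get_child(f"tel_{tel_id:03d}")
        # pytables indexing reads only the requested row from disk,
        # it does not load the whole table into memory
        images_by_tel.append(child[42]["image"])

    # Split-by-tel_type layout: a single table per telescope type, so the
    # child is fetched once, outside any loop over telescope ids.
    child = f.root.dl1.event.telescope.images._f_get_child("LST_LST_LSTCam")
    image = child[42]["image"]
```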

As suggested by @maxnoe, removing the for loop when retrieving event images would be more efficient! I would need to check whether multi-indexing on a child is supported by pytables (see the sketch below). This is only applicable to the DL1 files split by tel_type, because for the DL1 files split by tel_id a for loop is required, since we have multiple children/tables. The purpose of this function call (__getitem__(self, idx)) is to retrieve the information (charges, Hillas parameters, simulation properties etc.) for a single event for a short time (forward pass of the network). This function is called continually during the training process on the whole DL1 production to build batches of event images. The batches are shuffled in the sense that gamma and proton events from different runs (in particular, different DL1 files) are mixed within each batch.
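For reference, a rough sketch of what such multi-indexing could look like with pytables, using Table.read_coordinates() to fetch several rows from the single child table in one call (file path, node name and column name are placeholders):

```python
import numpy as np
import tables

with tables.open_file("gamma_dl1.h5", mode="r") as f:
    child = f.root.dl1.event.telescope.images._f_get_child("LST_LST_LSTCam")

    # e.g. shuffled event indices for one batch
    batch_indices = np.array([3, 17, 256, 1024])

    # read_coordinates() accepts an arbitrary sequence of row indices and
    # returns a structured array with one row per index, so no Python-level
    # loop over events is needed
    rows = child.read_coordinates(batch_indices)
    batch_images = rows["image"]
```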

@maxnoe (Member) commented Sep 1, 2021

> for a single event for a short time (forward pass of the network).

I don't think you should read single events from file for that. Load tables with a largish number of events that fit into RAM (configurable chunksize, like 10k to 50k events) and yield them one by one (without going back to the file) when the network needs a new batch. Read a new chunk when the current one is exhausted.

We are planning to add such chunked reading to the table reader API, and since the last release ctapipe.io.read_table already supports start, stop and step arguments.
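A minimal sketch of this chunked pattern built on read_table (the table path, chunk size and usage below are placeholders; this is not the planned table reader API):

```python
import tables
from ctapipe.io import read_table

def iter_events(filename, table_path, chunksize=20_000):
    """Read a largish slice into RAM and yield events one by one,
    only going back to the file when the current chunk is exhausted."""
    with tables.open_file(filename, mode="r") as f:
        n_rows = f.get_node(table_path).nrows

    for start in range(0, n_rows, chunksize):
        chunk = read_table(filename, table_path, start=start, stop=start + chunksize)
        for row in chunk:
            yield row

# usage (node path is an assumption about the DL1 layout):
# for event in iter_events("gamma_dl1.h5", "/dl1/event/telescope/images/tel_001"):
#     ...
```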

@TjarkMiener (Member, Author)

> I don't think you should read single events from file for that. Load tables with a largish number of events that fit into RAM (configurable chunksize, like 10k to 50k events) and yield them one by one (without going back to the file) when the network needs a new batch. Read a new chunk when the current one is exhausted.

Interesting point!

> We are planning to add such chunked reading to the table reader API, and since the last release ctapipe.io.read_table already supports start, stop and step arguments.

When I implemented the DL1DataReaderSTAGE1 this feature was not available. I don't think it makes much sense to change the current implementation and replace the "_f_get_child() indexing" with ctapipe.io.read_table using the start and stop arguments for single events; judging from the read_table implementation (L71-L75), the RAM issue should persist either way. However, chunked reading with shuffled data files would be very interesting for us to adopt!

@aribrill (Collaborator) commented Sep 9, 2021

> I don't think you should read single events from file for that. Load tables with a largish number of events that fit into RAM (configurable chunksize, like 10k to 50k events) and yield them one by one (without going back to the file) when the network needs a new batch. Read a new chunk when the current one is exhausted.

DL1DH is designed to load single events for the use case of training on simulated data files. The reason is randomization. First, as @TjarkMiener mentioned, events of different particle types are contained in separate files, so every batch of events needs (in general) to contain events from different files. Second, we need to select events from across the dataset (as opposed to contiguous events from the same file) because contiguous simulated events can be correlated, due to multiple events being generated from the same simulated shower placed at different locations.

Reading chunks into RAM as @maxnoe described would be fine for prediction, and would presumably be much more efficient for that use case. Right now, since DL is in the R&D phase, the computing time is dominated by training (although a non-negligible fraction of that time is spent predicting on the validation set), and it's not clear to me whether it would currently be worth complicating the code with separate data-reading methods for training and prediction. If we want DL to eventually become a useful tool for analyzing real data, though, efficient prediction will be required at some point.

Chunk-based reading could be used for training if the data were pre-randomized at write time. Some issues/caveats I can think of with this (other than the obvious, that the files we have aren't in fact randomized) are:

  1. We would lose the flexibility of keeping different particle types separate, e.g. choosing whether to include electrons in training or looking only at gammas for an energy/angular reconstruction task. We would have to choose this at write time.
  2. The traceability of which simulation files went into which DL1 files would be lost.
  3. We would have to take care during the pre-randomization process to avoid any mixing between the simulation files used for training and testing. Otherwise, you could get data leakage (correlated events in the training and test sets).

> My (probably poor) pytables understanding here is that getting a child from a node and then using pytables indexing is different from reading the whole table into memory and retrieving the image by an index.

This is my understanding as well. Do we know the chunk size of the files? If it's very large, could that be the issue?
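For checking that, a quick pytables snippet to inspect the HDF5 chunk size of an image table (file and node paths are placeholders):

```python
import tables

with tables.open_file("gamma_dl1.h5", mode="r") as f:
    table = f.get_node("/dl1/event/telescope/images/tel_001")
    # rows per HDF5 chunk; every single-row read decompresses at least one chunk
    print(table.chunkshape)
    # bytes per row, so chunkshape[0] * rowsize is roughly the I/O per access
    print(table.rowsize)
```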

@kosack commented Sep 10, 2021

@aribrill thanks, that is much clearer: random single-event reading was not a use case I had considered, but it should in fact work OK with the current format.

One thing you might want to look into is adding pytables indices to the files to see if that improves your performance. It should speed up random event lookup by event_id, but if you are looking events up by row index only, there may be no strong effect. You can enable the DataWriter.write_index_tables option in ctapipe-process when writing the data to automatically add index tables, or just open the files and use the pytables methods to add them afterwards manually (e.g. table.cols.var1.create_index()).
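A small sketch of the manual, after-the-fact variant (the file path and the choice of event_id as the indexed column are assumptions about the DL1 layout):

```python
import tables

with tables.open_file("gamma_dl1.h5", mode="a") as f:
    table = f.get_node("/dl1/event/telescope/images/tel_001")
    # create a pytables index on the event_id column if it does not exist yet
    if not table.cols.event_id.is_indexed:
        table.cols.event_id.create_index()
```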

@TjarkMiener added this to the v0.11.0 milestone Dec 10, 2021
@TjarkMiener (Member, Author)

Thanks for the suggestions @kosack! I will try writing the index tables afterwards (for the merged files I always had that boolean set to True).

@maxnoe As of CTLearn v0.6.1 we support pseudo-chunkwise processing in the prediction phase. CTLearn v0.6.1 also converts the trained Keras model into ONNX format, which makes interfacing with ctapipe much easier.
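For context, a minimal sketch of such a Keras-to-ONNX conversion using tf2onnx (this is not necessarily the tool or the code CTLearn uses; the model path and opset are placeholders):

```python
import tensorflow as tf
import tf2onnx

# load a trained Keras model and export it to ONNX
model = tf.keras.models.load_model("ctlearn_model")
tf2onnx.convert.from_keras(model, opset=13, output_path="ctlearn_model.onnx")
```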
