Change gzip compression level from 9 to 6 in mergetrigs #3428
Conversation
@josh-willis I have no objections to this. I think I only chose 9 because, at the time, the computing cost wasn't significant, but I was worried about getting the file size as small as possible, so I just took it to the extreme.
I'll give this approval, but I'm also happy for this to be exposed as a command-line argument for better control!
I ran a few tests with different compression levels; this can be merged.
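For reference, a comparison along these lines can be sketched as follows. This is only an illustration of the kind of test described, not the actual test that was run; the input file path and dataset key are placeholders:

```python
# Time the write of one dataset at several gzip levels and record file size.
# 'merged_triggers.hdf' and 'H1/snr' are placeholder names, not real paths.
import os
import time
import h5py

with h5py.File('merged_triggers.hdf', 'r') as f:
    data = f['H1/snr'][:]  # load one column of triggers into memory

for level in (1, 6, 9):
    outname = 'test_level_%d.hdf' % level
    start = time.time()
    with h5py.File(outname, 'w') as f:
        f.create_dataset('snr', data=data, compression='gzip',
                         compression_opts=level)
    elapsed = time.time() - start
    size = os.path.getsize(outname)
    print('level %d: %.1f s, %d bytes' % (level, elapsed, size))
```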
Have you tried zstd? You'll get good compression ratios much faster.
@lpsinger That's a good compressor, but not supported by default in HDF5 as far as I can tell (h5py/h5py#611), i.e. it requires rebuilding HDF5 from source. Is that included in most HDF5 packages?
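For anyone who wants to experiment, one route that avoids rebuilding HDF5 is the third-party hdf5plugin package, which registers a Zstandard filter with h5py at runtime. This is only a sketch under that assumption, and readers of the resulting file would also need the plugin installed, which is exactly the portability concern raised above:

```python
# Sketch: write a zstd-compressed dataset using the hdf5plugin filter.
# Requires the hdf5plugin package; file and dataset names are illustrative.
import h5py
import hdf5plugin
import numpy

data = numpy.random.normal(size=10**6)

with h5py.File('zstd_test.hdf', 'w') as f:
    # hdf5plugin.Zstd expands to the compression/compression_opts kwargs.
    f.create_dataset('snr', data=data, **hdf5plugin.Zstd(clevel=3))
```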
* Change default gzip compression level from 9 to 6 in mergetrigs
* Expose compression as CLI option

Co-authored-by: Tito Dal Canton <dalcanton@lal.in2p3.fr>
We have recently had several workflows that spend an inordinate amount of time in the `hdf_mergetrigs` jobs on full data. In some cases they ran for 24 hours and got evicted. @titodalcanton noticed that these jobs were spending much more time writing some of the outputs than others, and that if he `condor_ssh`'d to the job, it was actually in the `R` state using 100% of the CPU, so it seemed that the time was spent in actual computation and not in file-system issues. On further investigation of an example file, if we ignore the datasets which correspond to statistics not actually calculated in a given workflow (e.g., bank chi-squared), then the greater the compression factor achieved, the longer it took to write that dataset. This strongly suggests that for datasets which were compressible (but not trivially compressible like bank chi-squared, which ran quickly) the time was being spent in the compression.

Some reading around on the web suggested that our current compression level of 9 in `pycbc_coinc_mergetrigs` is very aggressive. A more typical default is 6, which is what you get if you call `gzip` from the command line without specifying a compression level. I ran a test last night with that change, on the largest merged trigger file from several recent workflows I'd run. With a level of 6, its size on disk was 24 GB, whereas with 9 it was 23 GB; however, the job completed in an order of magnitude less time.

This is marked as work in progress because, after some discussion with Tito, he's going to add a follow-up commit to actually create a command-line option to specify this level and default it to 6. But we want to write it up so @ahnitz has a chance to give us feedback.
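As a rough illustration of the change being discussed, a gzip-compressed dataset write with the level exposed as a command-line option might look like the sketch below. The option name, file name, and dataset name are illustrative only, not the actual `pycbc_coinc_mergetrigs` code:

```python
# Minimal sketch of exposing the gzip level as a CLI option when writing
# an HDF5 dataset with h5py. Names here are placeholders for illustration.
import argparse
import h5py
import numpy

parser = argparse.ArgumentParser()
parser.add_argument('--compression-level', type=int, default=6,
                    help='gzip compression level for output datasets (1-9)')
args = parser.parse_args()

data = numpy.random.normal(size=10**6)  # stand-in for a merged trigger column

with h5py.File('merged_triggers.hdf', 'w') as f:
    # The equivalent call previously hard-coded compression_opts=9; level 6
    # gives nearly the same file size at a fraction of the CPU cost.
    f.create_dataset('H1/snr', data=data,
                     compression='gzip',
                     compression_opts=args.compression_level)
```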