Commit d935be5

authored
add section on writing mzML.gz files
1 parent d393670 commit d935be5

File tree

1 file changed

+12
-6
lines changed

docs/getting-started/types-of-topp-tools/file-handling.md

Lines changed: 12 additions & 6 deletions
@@ -35,18 +35,24 @@ formats of the input and output file can be given explicitly.
3535

3636
## Compression of mzML files
3737

38-
TOPP tools now support writing compressed .mzML.gz files for efficient storage. For example, PeakPickerHiRes can output compressed files:
38+
39+
OpenMS has supported **reading** of compressed mzML, mzXML, and mzData for a long time.
40+
41+
Since OpenMS 3.5, TOPP tools that produce mzML output files also support **writing** gzip-compressed `.mzML.gz` files.
42+
To enable compression, use `.mzML.gz` instead of `.mzML` as the extension of the output filename.
43+
44+
For example, PeakPickerHiRes can output compressed files like this:
3945

4046
`PeakPickerHiRes -in input.mzML -out output.mzML.gz -threads 8`
4147

42-
Compression uses pigz (parallel gzip) if installed for faster performance, falling back to OpenMS's internal compression mechanism otherwise. When using pigz, OpenMS limits threads to the user-specified value (e.g., -threads 8) via omp_get_max_threads(), ensuring compatibility with cluster schedulers. Install pigz for optimal speed.
48+
Compression uses the `pigz` (parallel gzip) tool if it is installed and falls back to OpenMS's internal compression mechanism otherwise. `pigz` compresses faster, even when using only one thread. The number of threads used for compression is set by the usual `-threads <n>` flag of the TOPP tool.
49+
Without `pigz`, the internal gzip compressor is used, which supports only a single thread, irrespective of the value given in `-threads <n>`.
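
The "`pigz` if installed, otherwise single-threaded gzip" behaviour can be illustrated outside of OpenMS with a few lines of shell. This is only a sketch of the idea, not OpenMS's actual implementation: the `compress_gz` helper and the `demo.mzML` stand-in file are hypothetical names, and the plain `gzip` tool stands in for OpenMS's internal compressor.

```shell
#!/bin/sh
# Illustration only (not OpenMS code): prefer pigz when it is on PATH,
# otherwise fall back to single-threaded gzip.

# compress_gz FILE [THREADS]: writes FILE.gz next to FILE and keeps FILE.
compress_gz() {
    file=$1
    threads=${2:-1}
    if command -v pigz >/dev/null 2>&1; then
        # pigz: -p sets the number of threads, -k keeps the input, -f overwrites
        pigz -k -f -p "$threads" "$file"
    else
        # gzip is single-threaded, so the thread count is ignored here
        gzip -c "$file" > "$file.gz"
    fi
}

# Demo on a tiny stand-in file (assumption: not a real mzML file)
tmpdir=$(mktemp -d)
printf '<mzML>demo</mzML>\n' > "$tmpdir/demo.mzML"
compress_gz "$tmpdir/demo.mzML" 8

# Both tools write the same gzip format, so gzip can verify either result
gzip -t "$tmpdir/demo.mzML.gz" && echo "valid gzip stream"
```

Because both branches produce a standard gzip stream, the resulting file can be read the same way regardless of which tool wrote it. Only the compression speed differs.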
50+
4351

44-
Trade-offs:
52+
compression efficiency: `.mzML.gz` files are typically 2-3x smaller than the uncompressed `.mzML` files.
53+
compression speed: `pigz` is significantly faster than the internal compression. Install `pigz` if possible (it is available via the usual package managers).
4554

46-
Efficiency: .mzML.gz files are 2-3x smaller; pigz is significantly faster but CPU-intensive.
47-
Compatibility: Ensure downstream tools support .mzML.gz.
4855

49-
This feature supports indexed mzML and enhances data management.
5056

5157
## Converting between DTA and mzML
5258
