idencomp (Japanese: 遺伝コンプレッサー (idenkonpuressa), "genetic compressor") is an attempt at building a compression tool for genetic data (specifically, for FASTQ files). The goal is to beat the performance of the most commonly used tools while maintaining a decent compression ratio.
It is based on several building blocks:
- context binning and k-means model clustering
- rANS entropy coder
- Deflate and Brotli (for compressing sequence names)
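To give a feel for the rANS building block, here is a minimal sketch of a byte-oriented rANS encoder and decoder with a static model. It is not idencomp's implementation: the constants, the four-symbol frequency table, and the `encode`/`decode` function names are all assumptions made for illustration.

```rust
// Hypothetical parameters, chosen only for this sketch.
const SCALE_BITS: u32 = 12; // frequencies must sum to 1 << SCALE_BITS
const RANS_L: u32 = 1 << 23; // lower bound of the normalized state interval

// Static model over symbols 0..4 (for example A, C, G, T); made-up frequencies.
const FREQ: [u32; 4] = [2048, 1024, 512, 512];
const START: [u32; 4] = [0, 2048, 3072, 3584]; // cumulative frequencies

/// Encodes `symbols` (each in 0..4) into a byte stream.
fn encode(symbols: &[u8]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut x = RANS_L;
    // rANS works like a stack, so symbols are encoded in reverse order.
    for &s in symbols.iter().rev() {
        let (freq, start) = (FREQ[s as usize], START[s as usize]);
        // Renormalize: shift out low bytes until the state is small enough.
        let x_max = ((RANS_L >> SCALE_BITS) << 8) * freq;
        while x >= x_max {
            out.push((x & 0xff) as u8);
            x >>= 8;
        }
        x = ((x / freq) << SCALE_BITS) + (x % freq) + start;
    }
    // Flush the final state, then reverse so the decoder can read forwards.
    out.extend_from_slice(&x.to_le_bytes());
    out.reverse();
    out
}

/// Decodes `count` symbols from a stream produced by `encode`.
fn decode(data: &[u8], count: usize) -> Vec<u8> {
    let mut x = u32::from_be_bytes([data[0], data[1], data[2], data[3]]);
    let mut pos = 4;
    let mut out = Vec::with_capacity(count);
    for _ in 0..count {
        // The low SCALE_BITS bits of the state identify the symbol.
        let slot = x & ((1 << SCALE_BITS) - 1);
        let s = (0..4usize).find(|&i| slot < START[i] + FREQ[i]).unwrap();
        out.push(s as u8);
        x = FREQ[s] * (x >> SCALE_BITS) + slot - START[s];
        // Renormalize: pull in bytes until the state is back in range.
        while x < RANS_L {
            x = (x << 8) | data[pos] as u32;
            pos += 1;
        }
    }
    out
}
```

Calling `decode(&encode(&symbols), symbols.len())` should return the original symbols. A real compressor would derive the frequency tables from the data or from a trained model rather than hard-coding them as done here.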
The compressor has been built with modern multicore CPUs in mind and can use multiple cores/threads for all the critical parts. It includes a command-line interface (CLI) and an accompanying Rust library.
The project is licensed under the MIT license.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the project by you shall be licensed as MIT, without any additional terms or conditions.
We encourage contributors to use the predefined pre-commit hooks. To install them in your local repo, make sure you have pre-commit installed and run:
pre-commit install