[version] Bump rapidgzip version to 0.14.0
mxmlnkn committed May 21, 2024
1 parent 6e8fb1c commit 8c33da0
Showing 3 changed files with 44 additions and 3 deletions.
41 changes: 41 additions & 0 deletions python/rapidgzip/CHANGELOG.md
@@ -1,4 +1,45 @@

# Version 0.14.0 built on 2024-05-21

## Added

- Add support for reading and writing indexes created with gztool.
- Add `--ranges` option to output specified byte and line ranges.
- Add command line options `--sparse-windows` and `--no-sparse-windows` to explicitly control window sparsity.

## Performance

- Avoid creating unnecessary windows during bzip2 decompression. This vastly reduces memory usage
and also increases decompression speed by ~10% from 344 MB/s to 388 MB/s.
- Do not store windows for BGZF files as they are not needed. This reduces memory usage from 32 KiB
down to ~16 B per seek point.
- Optimize reading many very small (several bytes) gzip streams to be on par with igzip: 0.5 - 7.0 MB/s.
- Use sparsity information of the window to increase compression. This reduces the memory size of the
in-memory index and also of exported gztool indexes. For wikidata, the index size is reduced by a factor of 3.
- Reduce memory footprint further by compressing the windows in a zlib stream instead of a gzip stream.
- Also compress and apply sparseness for windows at chunk borders to further reduce the memory footprint.
- Disable custom vector implementation that was meant to skip the overhead of initializing the contents
that are to be overwritten anyway because it led to higher memory usage (peak RSS) for as yet unknown reasons.
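
To put the BGZF figures above into perspective, here is a small arithmetic sketch. The two per-seek-point sizes come from the changelog entry; the helper `indexSizeInBytes` and the seek point count of 10,000 are hypothetical and only serve as an illustration, they are not part of rapidgzip.

```cpp
#include <cstdint>

// Per-seek-point sizes as stated in the changelog entry.
constexpr uint64_t WINDOW_BYTES_PER_SEEK_POINT = 32 * 1024;  // 32 KiB window
constexpr uint64_t BGZF_BYTES_PER_SEEK_POINT = 16;           // ~16 B without a window

// Hypothetical helper: total index memory for a given number of seek points.
constexpr uint64_t
indexSizeInBytes( uint64_t seekPointCount, uint64_t bytesPerSeekPoint )
{
    return seekPointCount * bytesPerSeekPoint;
}

// Assumed example: an index with 10,000 seek points.
static_assert( indexSizeInBytes( 10000, WINDOW_BYTES_PER_SEEK_POINT ) == 327680000,
               "312.5 MiB when every seek point stores a window" );
static_assert( indexSizeInBytes( 10000, BGZF_BYTES_PER_SEEK_POINT ) == 160000,
               "about 156 KiB when no windows are stored" );
```

Under these assumptions, the per-seek-point saving is a factor of 32768 / 16 = 2048, independent of the number of seek points.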

## API

- Add `FileReader::seekTo` method to reduce narrowing warnings for the offset.
- Add option to configure the `BitReader` byte buffer size in the constructor.

## Fixes

- Fix lots of style checker warnings and CI issues.
- Detection for seeking inside the `BitReader` byte buffer only worked for the first 12.5% of the buffer
because of a missing byte-to-bit conversion.
- Seeking inside the `BitReader` byte buffer after reading directly from the underlying file resulted
in a wrong seek position. This was very hard to trigger in earlier versions because of the above bug
and because this call combination was rarely done.
- Multiple exceptions were not actually thrown, only constructed. Found via clang-tidy after fixing all
the false positives that hid these actual bugs.
- Do not erroneously warn about useless index import when specifying an index export path and parallelization is 1.
- Specifying an empty file path showed a seeking error instead of a helpful message.
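
The 12.5% figure in the `BitReader` fix above follows from comparing an offset in bits against a size in bytes: only the first eighth of the buffer passes such a check. The sketch below illustrates this class of bug. Both functions and their parameter names are hypothetical and are not taken from rapidgzip's actual `BitReader` code.

```cpp
#include <cstddef>

// Hypothetical check, assuming bitOffset is measured in bits from the buffer start
// and bufferSizeInBytes is the size of the byte buffer.

// Buggy variant: compares a bit offset against a byte count.
constexpr bool
isInsideBufferBuggy( std::size_t bitOffset, std::size_t bufferSizeInBytes )
{
    return bitOffset < bufferSizeInBytes;
}

// Fixed variant: converts the buffer size to bits before comparing.
constexpr bool
isInsideBufferFixed( std::size_t bitOffset, std::size_t bufferSizeInBytes )
{
    return bitOffset < bufferSizeInBytes * 8U;
}

// A 1024-byte buffer holds 8192 bits. The buggy check only accepts offsets below 1024,
// which is 1024 / 8192 = 12.5% of the buffer.
static_assert( isInsideBufferBuggy( 1023, 1024 ), "first eighth is detected by both variants" );
static_assert( !isInsideBufferBuggy( 1024, 1024 ), "buggy variant rejects a valid in-buffer offset" );
static_assert( isInsideBufferFixed( 1024, 1024 ), "fixed variant accepts it" );
static_assert( !isInsideBufferFixed( 8192, 1024 ), "offsets past the buffer end are still rejected" );
```

A missed detection of this kind would presumably fall back to a slower seek path rather than produce wrong data, which may explain why it went unnoticed; the changelog itself does not state this.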


# Version 0.13.3 built on 2024-04-27

## Fixes
2 changes: 1 addition & 1 deletion python/rapidgzip/setup.cfg
@@ -1,6 +1,6 @@
[metadata]
name = rapidgzip
-version = 0.13.3
+version = 0.14.0

description = Parallel random access to gzip files
url = https://github.com/mxmlnkn/rapidgzip
4 changes: 2 additions & 2 deletions src/rapidgzip/rapidgzip.hpp
@@ -12,8 +12,8 @@


static constexpr uint32_t RAPIDGZIP_VERSION_MAJOR{ 0 };
-static constexpr uint32_t RAPIDGZIP_VERSION_MINOR{ 13 };
-static constexpr uint32_t RAPIDGZIP_VERSION_PATCH{ 3 };
+static constexpr uint32_t RAPIDGZIP_VERSION_MINOR{ 14 };
+static constexpr uint32_t RAPIDGZIP_VERSION_PATCH{ 0 };
static constexpr uint32_t RAPIDGZIP_VERSION{
RAPIDGZIP_VERSION_MAJOR * 0x10000UL + RAPIDGZIP_VERSION_MINOR * 0x100UL + RAPIDGZIP_VERSION_PATCH
};
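
The combined `RAPIDGZIP_VERSION` value packs the three components into one integer so that versions can be compared numerically. The sketch below reproduces the formula from the header in a standalone helper. The function name `encodeVersion` is hypothetical and does not exist in rapidgzip.

```cpp
#include <cstdint>

// Hypothetical helper that mirrors the formula used for RAPIDGZIP_VERSION.
// It assumes that minor and patch each stay below 256 so that the fields do not overlap.
constexpr uint32_t
encodeVersion( uint32_t major, uint32_t minor, uint32_t patch )
{
    return static_cast<uint32_t>( major * 0x10000UL + minor * 0x100UL + patch );
}

static_assert( encodeVersion( 0, 13, 3 ) == 0x000D03, "0.13.3 before this commit" );
static_assert( encodeVersion( 0, 14, 0 ) == 0x000E00, "0.14.0 after this commit" );
static_assert( encodeVersion( 0, 14, 0 ) > encodeVersion( 0, 13, 3 ), "newer versions compare greater" );
```

With this encoding, code that depends on a feature from this release could guard on `RAPIDGZIP_VERSION >= 0x000E00`.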