[performance] Update initially non-compressed inserted end-of-chunk windows with compressed sparse versions

m rapidgzip && src/tools/rapidgzip -P 1 --index-format gztool \
    --export-index base64-512MiB.gz{.index,}
stat -c %s base64-512MiB.gz.index

Before: 2463522 B
After :   88724 B
mxmlnkn committed May 19, 2024
1 parent 735ff97 commit e99c6d2
Showing 3 changed files with 18 additions and 12 deletions.
3 changes: 2 additions & 1 deletion src/rapidgzip/ChunkData.hpp
@@ -315,7 +315,8 @@ struct ChunkData :
 }

 /* Replace markers in and compress the resulting fully-resolved window provided by each subchunk,
-  * i.e., at the end of each subchunk. */
+  * i.e., at the end of each subchunk. In benchmarks with random base64 data and ISA-L, this takes
+  * roughly 0.5 ms per 32 KiB window (0.048s for 97 compressed windows). */
 const auto tWindowCompressionStart = now();
 size_t decodedOffsetInBlock{ 0 };
 for ( auto& subchunk : m_subchunks ) {
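
The per-window figure quoted in the new comment follows from the two measured numbers (0.048 s for 97 windows). A minimal sketch of that arithmetic, using a hypothetical helper that is not part of rapidgzip:

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of rapidgzip: average time per window in
// milliseconds, given the total compression time in seconds.
double
millisecondsPerWindow( double totalSeconds, std::size_t windowCount )
{
    assert( windowCount > 0 );  // Dividing by zero windows makes no sense.
    return totalSeconds * 1000.0 / static_cast<double>( windowCount );
}
```

Called as `millisecondsPerWindow( 0.048, 97 )`, this yields slightly less than 0.5 ms, which matches the "roughly 0.5 ms per 32 KiB window" stated in the comment.
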
23 changes: 14 additions & 9 deletions src/rapidgzip/GzipChunkFetcher.hpp
@@ -423,17 +423,22 @@ class GzipChunkFetcher final :
 for ( const auto& subchunk : subchunks ) {
     /* Compute offset of the window >provided< by this subchunk, not the window >required< by this subchunk. */
     const auto windowOffset = subchunk.encodedOffset + subchunk.encodedSize;
-    /* Avoid recalculating what we already emplaced in waitForReplacedMarkers when calling getLastWindow. */
-    if ( !m_windowMap->get( windowOffset ) ) {
-        if ( subchunk.window ) {
+    /* Explicitly reinsert what we already emplaced in waitForReplacedMarkers when calling getLastWindow,
+     * but now the window should be compressed with sparsity applied! Thanks to the WindowMap being locked
+     * and the windows being shared pointers, this should lead to no bugs, and the consistency check in
+     * the WindowMap is also long gone, i.e., overwriting windows is allowed and now a required feature. */
+    const auto existingWindow = m_windowMap->get( windowOffset );
+    if ( subchunk.window ) {
+        /* Do not overwrite empty windows signaling windows that are not required at all. */
+        if ( !existingWindow || !existingWindow->empty() ) {
             m_windowMap->emplaceShared( windowOffset, subchunk.window );
-        } else {
-            const auto nextDecodedWindowOffset = subchunk.decodedOffset + subchunk.decodedSize;
-            m_windowMap->emplace( windowOffset, chunkData->getWindowAt( lastWindow, nextDecodedWindowOffset ),
-                                  chunkData->windowCompressionType() );
-            std::cerr << "[Info] The subchunk window for offset " << windowOffset << " is not compressed yet. "
-                      << "Compressing it now might slow down the program.\n";
         }
+    } else if ( !existingWindow ) {
+        const auto nextDecodedWindowOffset = subchunk.decodedOffset + subchunk.decodedSize;
+        m_windowMap->emplace( windowOffset, chunkData->getWindowAt( lastWindow, nextDecodedWindowOffset ),
+                              chunkData->windowCompressionType() );
+        std::cerr << "[Info] The subchunk window for offset " << windowOffset << " is not compressed yet. "
+                  << "Compressing it now might slow down the program.\n";
     }
 }

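
The decision logic of this hunk can be sketched in isolation. The following is a simplified illustration under stated assumptions, not rapidgzip's implementation: `Window`, `SimpleWindowMap`, and `updateWindow` are hypothetical stand-ins, the window is a plain byte vector instead of a compressed sparse buffer, and the fallback branch that recomputes a missing window is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-ins; rapidgzip's real window and map classes differ.
using Window = std::vector<std::uint8_t>;
using SharedWindow = std::shared_ptr<const Window>;

class SimpleWindowMap
{
public:
    // Insert the window for the given offset or overwrite the stored one.
    void
    emplaceShared( std::size_t encodedOffset, SharedWindow window )
    {
        assert( window );
        const std::lock_guard<std::mutex> lock( m_mutex );
        m_windows.insert_or_assign( encodedOffset, std::move( window ) );
    }

    // Return the stored window, or a null pointer if there is none.
    SharedWindow
    get( std::size_t encodedOffset ) const
    {
        const std::lock_guard<std::mutex> lock( m_mutex );
        const auto match = m_windows.find( encodedOffset );
        if ( match == m_windows.end() ) {
            return nullptr;
        }
        return match->second;
    }

private:
    mutable std::mutex m_mutex;
    std::map<std::size_t, SharedWindow> m_windows;
};

// Replace whatever is stored at windowOffset with the post-processed window,
// except when the stored entry is an empty window, which signals that no
// window is required at that offset. Returns true if the map was modified.
bool
updateWindow( SimpleWindowMap& windowMap, std::size_t windowOffset, const SharedWindow& processedWindow )
{
    if ( !processedWindow ) {
        return false;  // The fallback path from the diff is omitted in this sketch.
    }
    const auto existingWindow = windowMap.get( windowOffset );
    if ( !existingWindow || !existingWindow->empty() ) {
        windowMap.emplaceShared( windowOffset, processedWindow );
        return true;
    }
    return false;
}
```

Because the entries are shared pointers, a reader that fetched the old window before the overwrite keeps a valid object; only later lookups see the replacement. Note that `get` and `emplaceShared` each take the lock separately in this sketch, so the check and the overwrite are not one atomic step.
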
4 changes: 2 additions & 2 deletions src/rapidgzip/WindowMap.hpp
@@ -66,8 +66,8 @@ class WindowMap
  * index!
  * Further windows might also be inserted if the file is opened in a buffered manner, which could
  * insert windows up to the buffer size without having read anything yet.
- * Comparing the decompressed contents might also fail in the future when support for sparse windows
- * is added.
+ * Comparing the decompressed contents will also fail when overwriting non-compressed windows
+ * with asynchronously compressed and made-sparse windows.
  * I am not even sure anymore why I did want to test for changes. I guess it was a consistency check,
  * but it becomes too complex and error-prone now. */
 m_windows.insert_or_assign( m_windows.end(), encodedBlockOffset, std::move( sharedWindow ) );
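
The overwrite behavior depends on `insert_or_assign`, which replaces an existing value, whereas `emplace` or `insert` would silently keep the old one. A minimal sketch of the difference, assuming a `std::map`-like container; `SharedPayload`, `insertIfAbsent`, and `insertOrOverwrite` are hypothetical names used only for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical payload type standing in for a shared window.
using SharedPayload = std::shared_ptr<const std::string>;
using PayloadMap = std::map<std::size_t, SharedPayload>;

// Keeps an already stored value untouched.
// Returns true only if a new entry was created.
bool
insertIfAbsent( PayloadMap& windows, std::size_t offset, SharedPayload payload )
{
    assert( payload );
    return windows.emplace( offset, std::move( payload ) ).second;
}

// Creates the entry or replaces the stored value. The end() hint only speeds up
// insertion of keys larger than all existing ones; the result is correct for any key.
void
insertOrOverwrite( PayloadMap& windows, std::size_t offset, SharedPayload payload )
{
    assert( payload );
    windows.insert_or_assign( windows.end(), offset, std::move( payload ) );
}
```

With `insertIfAbsent`, a second insertion at the same offset leaves the first value in place, which is the behavior a consistency check would have guarded. With `insertOrOverwrite`, the later value wins, which is what replacing a non-compressed window with its compressed sparse version requires.
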
