[performance] Update initially non-compressed inserted end-of-chunk windows with compressed sparse versions

m rapidgzip && src/tools/rapidgzip -P 1 --index-format gztool \
    --export-index base64-512MiB.gz{.index,}
stat -c %s base64-512MiB.gz.inde

Before: 2463522 B
After :   88724 B
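For scale, these two sizes correspond to an index about 27.8× smaller (2463522 B / 88724 B ≈ 27.8), i.e., roughly 96.4 % less. The commit title attributes this to storing compressed sparse windows. The commit itself does not define "sparse", so the sketch below assumes that window bytes never read by later back-references are zeroed, which leaves long zero runs that compress far better than random base64 data. `makeSparseWindow` and `countNonZeroBytes` are hypothetical helpers, not rapidgzip functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (assumed meaning of "sparse", not rapidgzip's API):
// keep only the window bytes that later back-references actually read and
// zero everything else so that the window compresses much better.
std::vector<std::uint8_t>
makeSparseWindow( const std::vector<std::uint8_t>& window,
                  const std::vector<bool>&         isReferenced )
{
    assert( window.size() == isReferenced.size() );
    std::vector<std::uint8_t> sparse( window.size(), 0 );
    for ( std::size_t i = 0; i < window.size(); ++i ) {
        if ( isReferenced[i] ) {
            sparse[i] = window[i];
        }
    }
    return sparse;
}

// Counts the non-zero bytes, i.e., a rough measure of what is left to store.
std::size_t
countNonZeroBytes( const std::vector<std::uint8_t>& data )
{
    std::size_t count = 0;
    for ( const auto byte : data ) {
        if ( byte != 0 ) {
            ++count;
        }
    }
    return count;
}
```

For example, if only the last 1 KiB of a 32 KiB window is referenced, `countNonZeroBytes( makeSparseWindow( window, isReferenced ) )` drops from 32768 to at most 1024.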
mxmlnkn committed May 20, 2024
1 parent 3b41f6e commit d7d4bcd
Showing 3 changed files with 19 additions and 11 deletions.
5 changes: 4 additions & 1 deletion src/rapidgzip/ChunkData.hpp
@@ -312,7 +312,9 @@ struct ChunkData :
statistics.computeChecksumDuration += duration( tApplyEnd );
}

- /* Replace markers in subchunk windows and compress the resulting fully-resolved window. */
+ /* Replace markers in and compress the resulting fully-resolved window provided by each subchunk,
+ * i.e., at the end of each subchunk. In benchmarks with random base64 data and ISA-L, this takes
+ * roughly 0.5 ms per 32 KiB window (0.048s for 97 compressed windows). */
const auto tWindowCompressionStart = now();
size_t decodedOffsetInBlock{ 0 };
for ( auto& subchunk : m_subchunks ) {
@@ -421,6 +423,7 @@ struct ChunkData :
crc32s.emplace_back();
crc32s.back().setEnabled( wasEnabled );
}

void
setCRC32Enabled( bool enabled )
{
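The new comment in ChunkData.hpp describes replacing the markers in each subchunk's end window before compressing it. The commit does not show how markers are encoded, so this sketch assumes a common scheme for decoding deflate data without its preceding window: every decoded symbol is 16 bits wide, values 0 to 255 are literal bytes, and values from 32768 upward point to byte (value - 32768) of the still-unknown preceding 32 KiB window. `replaceMarkers` is a hypothetical stand-in, not rapidgzip's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Assumed marker encoding for this sketch (not taken from the commit):
//  - symbols 0..255 are literal bytes,
//  - symbols >= 32768 refer to byte (symbol - 32768) of the preceding window,
//  - anything in between is invalid.
constexpr std::size_t WINDOW_SIZE = 32 * 1024;

// Hypothetical helper: resolve all markers of one subchunk window once the
// preceding 32 KiB window is known, yielding plain bytes ready for compression.
std::vector<std::uint8_t>
replaceMarkers( const std::vector<std::uint16_t>& symbols,
                const std::vector<std::uint8_t>&  previousWindow )
{
    if ( previousWindow.size() != WINDOW_SIZE ) {
        throw std::invalid_argument( "The preceding window must be exactly 32 KiB." );
    }

    std::vector<std::uint8_t> resolved;
    resolved.reserve( symbols.size() );
    for ( const auto symbol : symbols ) {
        const auto value = static_cast<std::size_t>( symbol );
        if ( value <= 0xFFU ) {
            resolved.push_back( static_cast<std::uint8_t>( value ) );
        } else if ( value >= WINDOW_SIZE ) {
            resolved.push_back( previousWindow[value - WINDOW_SIZE] );
        } else {
            throw std::invalid_argument( "Symbol is neither a literal nor a marker." );
        }
    }
    return resolved;
}
```

Per the comment in the hunk, the measured cost of roughly 0.5 ms per window (0.048 s / 97 ≈ 0.49 ms) covers both this kind of resolution step and the subsequent compression.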
21 changes: 13 additions & 8 deletions src/rapidgzip/GzipChunkFetcher.hpp
@@ -431,16 +431,21 @@ class GzipChunkFetcher final :
for ( const auto& subchunk : subchunks ) {
decodedOffsetInBlock += subchunk.decodedSize;
const auto windowOffset = subchunk.encodedOffset + subchunk.encodedSize;
- /* Avoid recalculating what we already emplaced in waitForReplacedMarkers when calling getLastWindow. */
- if ( !m_windowMap->get( windowOffset ) ) {
- if ( subchunk.window ) {
+ /* Explicitly reinsert what we already emplaced in waitForReplacedMarkers when calling getLastWindow,
+ * but now the window should be compressed with sparsity applied! Thanks to the WindowMap being locked
+ * and the windows being shared pointers, this should lead to no bugs, and the consistency check in
+ * the WindowMap is also long gone, i.e., overwriting windows is allowed and now a required feature. */
+ const auto existingWindow = m_windowMap->get( windowOffset );
+ if ( subchunk.window ) {
+ /* Do not overwrite empty windows signaling windows that are not required at all. */
+ if ( !existingWindow || !existingWindow->empty() ) {
m_windowMap->emplaceShared( windowOffset, subchunk.window );
- } else {
- m_windowMap->emplace( windowOffset, chunkData->getWindowAt( lastWindow, decodedOffsetInBlock ),
- chunkData->windowCompressionType() );
- std::cerr << "[Info] The subchunk window for offset " << windowOffset << " is not compressed yet. "
- << "Compressing it now might slow down the program.\n";
}
+ } else if ( !existingWindow ) {
+ m_windowMap->emplace( windowOffset, chunkData->getWindowAt( lastWindow, decodedOffsetInBlock ),
+ chunkData->windowCompressionType() );
+ std::cerr << "[Info] The subchunk window for offset " << windowOffset << " is not compressed yet. "
+ << "Compressing it now might slow down the program.\n";
}
}

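The reworked loop in GzipChunkFetcher.hpp boils down to three cases. A subchunk that already carries a (compressed, sparse) window replaces whatever is stored at that offset, unless the stored window is empty, which marks an offset where no window is needed at all. A subchunk without a window only triggers the slower on-demand computation when nothing is stored yet. The sketch below models that decision with a hypothetical, mutex-protected `SimpleWindowMap` and a `computeWindow` callback standing in for `chunkData->getWindowAt(...)` plus compression; none of these names are rapidgzip's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Simplified stand-ins (hypothetical): a window is a byte vector shared via
// shared_ptr, and the map is keyed by the encoded offset.
using Window = std::vector<std::uint8_t>;
using SharedWindow = std::shared_ptr<const Window>;

class SimpleWindowMap
{
public:
    [[nodiscard]] SharedWindow
    get( std::size_t offset ) const
    {
        const std::scoped_lock lock( m_mutex );
        const auto match = m_windows.find( offset );
        return match == m_windows.end() ? nullptr : match->second;
    }

    // Inserts or overwrites; overwriting is deliberately allowed.
    void
    emplaceShared( std::size_t offset, SharedWindow window )
    {
        const std::scoped_lock lock( m_mutex );
        m_windows.insert_or_assign( offset, std::move( window ) );
    }

private:
    mutable std::mutex m_mutex;
    std::map<std::size_t, SharedWindow> m_windows;
};

// Mirrors the three cases of the new loop body.
void
updateWindow( SimpleWindowMap&                     windowMap,
              std::size_t                          windowOffset,
              const SharedWindow&                  subchunkWindow,
              const std::function<SharedWindow()>& computeWindow )
{
    const auto existingWindow = windowMap.get( windowOffset );
    if ( subchunkWindow ) {
        // An existing empty window means "not required"; keep that marker.
        if ( !existingWindow || !existingWindow->empty() ) {
            windowMap.emplaceShared( windowOffset, subchunkWindow );
        }
    } else if ( !existingWindow ) {
        // Slow path: nothing stored and nothing precomputed.
        windowMap.emplaceShared( windowOffset, computeWindow() );
    }
}
```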
4 changes: 2 additions & 2 deletions src/rapidgzip/WindowMap.hpp
@@ -66,8 +66,8 @@ class WindowMap
* index!
* Further windows might also be inserted if the file is opened in a buffered manner, which could
* insert windows up to the buffer size without having read anything yet.
- * Comparing the decompressed contents might also fail in the future when support for sparse windows
- * is added.
+ * Comparing the decompressed contents will also fail when overwriting non-compressed windows
+ * with asynchronously compressed and made-sparse windows.
* I am not even sure anymore why I did want to test for changes. I guess it was a consistency check,
* but it becomes too complex and error-prone now. */
m_windows.insert_or_assign( m_windows.end(), encodedBlockOffset, std::move( sharedWindow ) );
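The WindowMap change keeps `insert_or_assign` with an `end()` hint, which now intentionally overwrites an existing entry. Assuming `m_windows` is a `std::map` (the hunk does not show its type), the standard library guarantees that `insert_or_assign` replaces the mapped value for an existing key, and that the hinted form is amortized constant when the new element belongs directly before the hint, which holds when offsets arrive in increasing order. In this sketch `std::string` is a hypothetical stand-in for window data and `storeWindow` is a hypothetical wrapper.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Assumption for this sketch: windows live in a std::map keyed by the encoded
// block offset and are held through shared pointers.
using SharedWindow = std::shared_ptr<const std::string>;
using Windows = std::map<std::size_t, SharedWindow>;

// Inserts or overwrites the window at the given offset. The end() hint makes
// appending increasing offsets amortized O(1); other offsets still work
// correctly, just in O(log n).
Windows::iterator
storeWindow( Windows& windows, std::size_t encodedBlockOffset, SharedWindow window )
{
    return windows.insert_or_assign( windows.end(), encodedBlockOffset, std::move( window ) );
}
```

Because the map stores shared pointers, a reader that fetched the old window before the overwrite keeps a valid object; this is the property the comment in GzipChunkFetcher.hpp relies on.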
