faster duplicate_overlapping #69

Merged: 1 commit, Jan 31, 2023
1 change: 1 addition & 0 deletions .gitignore
@@ -1,4 +1,5 @@
/target
benchmarks/target
Cargo.lock
my-prof.profile
Session.vim
10 changes: 5 additions & 5 deletions benches/crit_bench.rs
@@ -17,12 +17,12 @@ const COMPRESSION10MB: &[u8] = include_bytes!("dickens.txt");
 const COMPRESSION95K_VERY_GOOD_LOGO: &[u8] = include_bytes!("../logo.jpg");

 const ALL: &[&[u8]] = &[
-    //COMPRESSION1K as &[u8],
-    //COMPRESSION34K as &[u8],
-    //COMPRESSION65K as &[u8],
-    //COMPRESSION66K as &[u8],
+    COMPRESSION1K as &[u8],
+    COMPRESSION34K as &[u8],
+    COMPRESSION65K as &[u8],
+    COMPRESSION66K as &[u8],
     COMPRESSION10MB as &[u8],
-    // COMPRESSION95K_VERY_GOOD_LOGO as &[u8],
+    COMPRESSION95K_VERY_GOOD_LOGO as &[u8],
 ];

 fn compress_lz4_fear(input: &[u8]) -> Vec<u8> {
12 changes: 7 additions & 5 deletions src/block/decompress.rs
@@ -57,12 +57,14 @@ unsafe fn duplicate_overlapping(
     // To prevent that we write a dummy zero to output, which will zero out output in such cases.
     // This is the same strategy used by the reference C implementation https://github.com/lz4/lz4/pull/772
     output_ptr.write(0u8);
-    // Note: this looks like a harmless loop but is unrolled/auto-vectorized by the compiler
-    for _ in 0..match_length {
-        let curr = start.read();
-        output_ptr.write(curr);
-        *output_ptr = output_ptr.add(1);
+    let dst_ptr_end = output_ptr.add(match_length);
+    while (*output_ptr as usize) < dst_ptr_end as usize {
+        // Note that we copy 4 bytes, instead of one.
+        // Without that the compiler will unroll/auto-vectorize the copy with a lot of branches.
+        // This is not what we want, as large overlapping copies are not that common.
+        core::ptr::copy(start, *output_ptr, 4);
         start = start.add(1);
+        *output_ptr = output_ptr.add(1);
     }
 }
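
The 4-byte copy in the new loop can look wrong at first glance, since source and destination overlap and only one byte per iteration is "real". A minimal sketch in safe Rust, not the crate's code, may help show why the result matches a byte-by-byte copy. The function names `copy_match_bytewise` and `copy_match_wide` are hypothetical, and the sketch assumes the buffer has at least 3 bytes of slack after the match, which the pointer version would also need from its caller. `slice::copy_within` stands in for `core::ptr::copy`; both have memmove semantics.

```rust
/// Reference behaviour: append `match_length` bytes, reading from `offset`
/// bytes behind the current end, one byte at a time.
fn copy_match_bytewise(buf: &mut Vec<u8>, offset: usize, match_length: usize) {
    let start = buf.len() - offset;
    for i in 0..match_length {
        let b = buf[start + i];
        buf.push(b);
    }
}

/// Sketch of the wide variant: every iteration copies 4 bytes but advances
/// by only 1. Only the first copied byte is final; the other three are
/// overwritten by later iterations or land in the slack region.
/// Requires `dst + match_length + 3 <= buf.len()`.
/// Returns the position just past the copied match.
fn copy_match_wide(buf: &mut [u8], mut dst: usize, offset: usize, match_length: usize) -> usize {
    let end = dst + match_length;
    let mut src = dst - offset;
    while dst < end {
        // memmove-style copy, so overlapping ranges are fine
        buf.copy_within(src..src + 4, dst);
        src += 1;
        dst += 1;
    }
    end
}
```

With the prefix `ab`, `offset = 2` and `match_length = 5`, both functions yield `abababa`; the wide variant additionally scribbles on up to 3 bytes past the end, which is why the slack is required.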
