Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added support for the legacy compressions (shrink/reduce/implode). #120

Open
wants to merge 67 commits into
base: master
Choose a base branch
from

Conversation

mkrueger
Copy link

For a project that will encounter old .zip files I need a rust library that supports shrink/reduce and implode.
I didn't find any so I ported the hw-zip 2.2 implementations to rust: https://www.hanshq.net/zip.html

Copy link
Member

@Pr0methean Pr0methean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the idea behind this PR, but the implementation isn't idiomatic Rust and wouldn't be very inefficient. I may have more changes to request once these are done.

src/compression.rs Show resolved Hide resolved
src/legacy/bitstream.rs Outdated Show resolved Hide resolved
src/legacy/bitstream.rs Outdated Show resolved Hide resolved
src/legacy/bitstream.rs Outdated Show resolved Hide resolved
src/legacy/bitstream.rs Outdated Show resolved Hide resolved
src/legacy/reduce.rs Outdated Show resolved Hide resolved
src/legacy/shrink.rs Outdated Show resolved Hide resolved
src/legacy/shrink.rs Outdated Show resolved Hide resolved
Comment on lines 67 to 69
struct Codetab {
prefix_code: u16, // INVALID_CODE means the entry is invalid.
ext_byte: u8,
len: u16,
last_dst_pos: usize,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
struct Codetab {
prefix_code: u16, // INVALID_CODE means the entry is invalid.
ext_byte: u8,
len: u16,
last_dst_pos: usize,
}
#[repr(packed)]
struct Codetab {
last_dst_pos: usize,
prefix_code: Option<u16>,
ext_byte: u8,
len: u16
}

This should fit into 128 bits on 64-bit platforms and 96 bits on 32-bit platforms, all with no penalty for an unaligned access.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rust doesn't like that - I get errors. Why repr(packed) ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prevents any alignment padding. Usually scalar values are aligned on their size, and arrays on their element size.

src/legacy/shrink.rs Outdated Show resolved Hide resolved
@Pr0methean Pr0methean added Please address review comments Some review comments are still open. Please fix failing tests Tests are failing with this change; please fix them. labels May 13, 2024
@mkrueger
Copy link
Author

mkrueger commented May 13, 2024

Wow - thanks for the code review :). Will look into it.

So I tried to go through the comments. Maybe I've forgotton some - there were many.

For short: I've looked into bitstream-io a bit. I'm not sure if it's worth switching to it. It just pulls in another dependency for just ~60 LOC or so. A different thing is the huffman implementation IMO.

I'll try to search for a huffman crate that can be used. bitstream-io contains some huffman stuff as well so it may be worth using - but haven't looked really into it.
My main goal for that was compatibility - I've some shrink/implode .zips in my bbs archive huge test case and I was able to extract those. Not sure if it's really worth putting way more time into exploring to use 3rd party libraries for that.
Esp. given in the amount of code required.
Code cleanup & use rust features - for sure. C code is quite ugly - even if I've a very good implementation from hwzip.

@mkrueger mkrueger force-pushed the legacy-zip branch 2 times, most recently from e35d857 to a2f1236 Compare May 13, 2024 08:45
src/legacy/reduce.rs Outdated Show resolved Hide resolved
src/legacy/reduce.rs Outdated Show resolved Hide resolved
Comment on lines 67 to 69
struct Codetab {
prefix_code: u16, // INVALID_CODE means the entry is invalid.
ext_byte: u8,
len: u16,
last_dst_pos: usize,
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It prevents any alignment padding. Usually scalar values are aligned on their size, and arrays on their element size.

src/compression.rs Outdated Show resolved Hide resolved
src/legacy/bitstream.rs Outdated Show resolved Hide resolved
src/legacy/reduce.rs Outdated Show resolved Hide resolved
src/legacy/reduce.rs Show resolved Hide resolved
@Pr0methean Pr0methean added the major This >~1000 SLOC PR is likely to cause hard-to-resolve conflicts with other open PRs once merged. label May 14, 2024
Copy link

@zmodem zmodem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey, hwzip author here :)

This is great to see! I'm only a Rust newbie, so probably can't help much with the review, but am happy to answer any questions about the original code.

src/legacy/huffman.rs Outdated Show resolved Hide resolved
Copy link
Member

@Pr0methean Pr0methean left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I managed to get through over half of the PR this time!

src/legacy/bitstream.rs Outdated Show resolved Hide resolved
src/legacy/bitstream.rs Outdated Show resolved Hide resolved
src/legacy/huffman.rs Outdated Show resolved Hide resolved
self.sentinel_bits[len] = s;
debug_assert!(self.sentinel_bits[len] >= code[len] as u32, "No overflow!");
sym_idx[len] = sym_idx[len - 1] + count[len - 1];
self.offset_first_sym_idx[len] = sym_idx[len].wrapping_sub(code[len]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this wrapped around, would it indicate a bug? If so, use - instead of wrapping_sub (behaves the same in release builds but is checked for overflow in debug builds).

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know for sure at least unit tests fail.

src/legacy/huffman.rs Outdated Show resolved Hide resolved
src/legacy/huffman.rs Outdated Show resolved Hide resolved
ImplodeDecoder {
compressed_reader: inner,
uncompressed_size,
stream_read: false,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's usually not the custom Read impl that decides whether or not to chunk -- those who want chunking will wrap this in a BufReader, and it's best to be prepared for that eventuality even if it's unlikely.

src/legacy/implode.rs Show resolved Hide resolved
src/legacy/reduce.rs Outdated Show resolved Hide resolved
src/legacy/reduce.rs Outdated Show resolved Hide resolved
mkrueger and others added 17 commits May 14, 2024 21:41
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
Pr0methean and others added 11 commits May 19, 2024 21:04
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
src/legacy/shrink.rs Outdated Show resolved Hide resolved
mkrueger and others added 4 commits May 22, 2024 12:47
Co-authored-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Mike Krüger <mikkrg@microsoft.com>
@Pr0methean
Copy link
Member

cargo clippy --all-targets --all-features is failing; please fix.

Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
Signed-off-by: Chris Hennick <4961925+Pr0methean@users.noreply.github.com>
@Pr0methean
Copy link
Member

Pr0methean commented Jul 20, 2024

This PR probably needs to have its rebase manually adjusted due to #210.

@Pr0methean Pr0methean added Please fix rebase conflicts Please rebase this PR against master and fix the conflicts. and removed Please fix failing tests Tests are failing with this change; please fix them. labels Jul 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
major This >~1000 SLOC PR is likely to cause hard-to-resolve conflicts with other open PRs once merged. Please address review comments Some review comments are still open. Please fix rebase conflicts Please rebase this PR against master and fix the conflicts.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants