Remove the check on dictionary capacity #52

bodgit · 2022-12-11T20:39:59Z

I'm using your xz package in my 7-zip reader package to decompress LZMA and LZMA2 streams. I've had some reports of some archives being unreadable with a lzma: dictionary capacity too small error.

I thought I could override the default ReaderConfig and set a bigger dictionary capacity however it seems the check that triggers this error fires before the capacity is overridden by the ReaderConfig.DictCap value. The default DictCap value is much bigger than MinDictCap anyway, (8 MB vs 4 KB IIRC).

Removing this check allows the ReaderConfig.DictCap value to override whatever was in the stream header and it's guaranteed to be at least MinDictCap thanks to ReaderConfig.Verify().

The unreadable LZMA streams were then readable.

The `ReaderConfig.DictCap` value is already range checked and guaranteed to be at least `MinDictCap` so the archive will always have a dictionary capacity of `MinDictCap` if a smaller value was in the header.

ulikunitz · 2022-12-11T22:10:07Z

Thank you for bringing that to my attention. Returning an error is indeed wrong, because the LZMA specification clearly states that a value smaller than 4096 byte should be set to 4096 byte. Since I want to do what the spec says I will not apply your pull request, but make sure that your approach of overriding the dictionary size works. I will also add a test case. You can expect a v0.5.11 in the next week. (I will also change the logic in my newer rewrite branch.)

I have one question though, would the 4096 dictionary size value work with your LZMA streams or do you require higher dictionary sizes? I would like to understand the real life situation.

bodgit · 2022-12-12T10:21:57Z

I have one question though, would the 4096 dictionary size value work with your LZMA streams or do you require higher dictionary sizes? I would like to understand the real life situation.

I'll try and find out what the value is actually set to. It's either going to be somewhere between 0 and 4096 or maybe its just 0. The archives in question are apparently quite old so it might just be an old/buggy 7-zip version that didn't encode the stream header correctly. I've got a test case made with a current version of 7-zip that uses LZMA which contains two streams, one for the archive metadata and another for the file contents and the dictionary is set to 4096 and 49152 respectively.

I'm not really familiar with the LZMA algorithm so I don't know what the behaviour is if the size is set too small. If the streams are setting a value that is non-zero, but less than 4096, then I suspect rounding up to 4096 would work fine. If it's unset, i.e. 0 because it wasn't encoded properly, then the library default of 8 MB seems to work fine but I'm not sure how to find an optimal value automatically.

ulikunitz · 2022-12-12T19:21:52Z

The LZ method supports copies of segments of the file preceding the current position of the file to be decompressed. The dictionary size describes the size of the sliding window from which the segments can be copied. If the window is larger than the dictionary size required, no issue will occur, if it is too small, the file will not be successful decoded. The window doesn't need to be larger than the stream to be decompressed. The archive metadata may be shorter than 4096 byte and so the data size might be used as window size, which would then be smaller than 4096 byte.

I suggest that you simply use the new implementation. If it's works without overriding "dictCap" than there is no issue, if you still need to override the value, your example stream is non-compliant with the LZMA specification.

@bodgit

As Matt Dainty (@bodgit) reported there is an issue if the header of the LZMA stream is less than the minimum dictionary size of 4096 byte. The specification of the LZMA format says that in that case a dictionary size of 4096 byte should be used, our code returns an error. This commit changes the behavior and adds a simple test case to test for the right behavior. Fixes [#52](#52)

ulikunitz · 2022-12-12T20:13:28Z

I released v0.5.11. The version should fix your issue.

ulikunitz · 2022-12-14T20:39:58Z

Hi, I would be interested to know whether the release works for you and whether it works without setting DictCap in the config parameter.

ulikunitz · 2022-12-18T07:23:17Z

Since it has been confirmed that 0.5.11 solves the issue, I close this pull request.

bodgit · 2022-12-18T23:08:13Z

It seems the streams with a dictionary capacity less than 4096 had the capacity set to 1024 or 2048, so ensuring it's at least 4096 seems to work fine.

Thanks for the bug fix!

Remove the check on dictionary capacity

9897685

The `ReaderConfig.DictCap` value is already range checked and guaranteed to be at least `MinDictCap` so the archive will always have a dictionary capacity of `MinDictCap` if a smaller value was in the header.

bodgit mentioned this pull request Dec 11, 2022

Return list of volume archive files. bodgit/sevenzip#54

Closed

ulikunitz self-assigned this Dec 11, 2022

ulikunitz closed this Dec 18, 2022

bodgit deleted the dictcap branch December 18, 2022 23:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove the check on dictionary capacity #52

Remove the check on dictionary capacity #52

bodgit commented Dec 11, 2022

ulikunitz commented Dec 11, 2022

bodgit commented Dec 12, 2022

ulikunitz commented Dec 12, 2022

ulikunitz commented Dec 12, 2022

ulikunitz commented Dec 14, 2022

ulikunitz commented Dec 18, 2022

bodgit commented Dec 18, 2022

Remove the check on dictionary capacity #52

Remove the check on dictionary capacity #52

Conversation

bodgit commented Dec 11, 2022

ulikunitz commented Dec 11, 2022

bodgit commented Dec 12, 2022

ulikunitz commented Dec 12, 2022

ulikunitz commented Dec 12, 2022

ulikunitz commented Dec 14, 2022

ulikunitz commented Dec 18, 2022

bodgit commented Dec 18, 2022