Memory leaks when uncompressing multi-volume archives #575
Comments
Checking this a bit more, and it seems to me that the issue might reside here: Line 588 in 6b253d1
Filling the dict in the loop inside the … Note that my code is running fine with …
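Schematically, the suspected pattern is something like the following sketch (hypothetical names, not the actual py7zr source): every decompressed payload stays referenced in the dict, so peak memory grows with the total decompressed size rather than with the largest single file.

```python
import io

def read_folder(entries):
    # Hypothetical illustration only, not py7zr's code: `entries` stands for the
    # decompressed streams of one folder. Everything read is kept in `result`,
    # so nothing is released until the caller drops the whole dict.
    result = {}
    for name, reader in entries:
        result[name] = io.BytesIO(reader.read())
    return result
```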
Duplicate of #579
#620 removes the API which returns the dictionary. It will solve the problem here.
You are welcome to implement an API which uses …
Describe the bug
It seems decompressing a multi-volume archive with a relatively large number of files (321 "outer" volumes, 8000 compressed files, 400 KB each, so about 3 GB in total) produces memory leaks.
The basic code which is failing is:
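A minimal sketch of this kind of per-file loop, assuming the volumes are named multi.7z.0001, multi.7z.0002, … and using py7zr's dict-returning read() together with the multivolumefile helper (the attached file contains the actual function):

```python
import multivolumefile
import py7zr

# Sketch only: the archive name and volume layout are assumptions; see
# uncompress.py.txt for the real function. multivolumefile presents the
# .0001, .0002, ... parts to py7zr as one seekable file object.
with multivolumefile.open('multi.7z', mode='rb') as volumes:
    with py7zr.SevenZipFile(volumes, mode='r') as archive:
        for name in archive.getnames():
            data = archive.read(targets=[name])  # returns {name: BytesIO}
            del data                             # discarded immediately...
            archive.reset()                      # ...yet the process keeps growing
```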
See complete function: uncompress.py.txt
The corresponding archive is a multi-volume archive of 8000 files, 400 KB per file, filled with random data, and split every 10 MB. No filters, specific headers, encryption, or password have been set. The compression options are the defaults.
A copy of the archive is available here: multi.zip. Please note that the first level needs to be uncompressed manually before the test. The actual archive to be tested is the folder with the 321 "7z" volumes.
Better still, it is possible to reproduce this archive (modulo the random data) using the following code: compress.py.txt. Several tests indicate that the behavior is not related to the random content, only to the size of the files.
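For reference, a sketch of such a generator, with the counts and sizes taken from the description above (the volume= keyword of multivolumefile and the writestr() call are assumptions; compress.py.txt is the authoritative script):

```python
import os
import multivolumefile
import py7zr

NUM_FILES = 8000
FILE_SIZE = 400 * 1024           # 400 KB of random data per file
VOLUME_SIZE = 10 * 1024 * 1024   # split every 10 MB

# Sketch only: parameter names and the archive name are assumptions.
with multivolumefile.open('multi.7z', mode='wb', volume=VOLUME_SIZE) as volumes:
    with py7zr.SevenZipFile(volumes, mode='w') as archive:
        for i in range(NUM_FILES):
            archive.writestr(os.urandom(FILE_SIZE), f'file_{i:05d}.bin')
```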
If enough memory is available, the archive can be uncompressed without any issue. The process still uses a lot of memory (about 3.3 GB), which is not expected, since each compressed file is quite small and the decompression script immediately discards the data on the fly.
If not enough memory is available, the decompression script crashes with a CRC error (see log below) or a Bad7zFile: invalid header data exception. Actually, it seems the CRC error is only a consequence of the lack of memory, as the archive itself looks perfectly fine: 7z-crc-error.log
We can see the archive is error-free:
Note that for testing purposes, it is possible to deliberately fill the memory using commands such as:
head -c 5G /dev/zero | tail
Related issues
These issues might be related to this one, but none of the existing tickets mention multi-volume and OOM at the same time:
To Reproduce
Run the decompression script above on the provided archive, then run ps up <pid> in another terminal to see how the memory usage increases.
Expected behavior
Even if the archive has a total size of 3 GB, it is not expected that uncompressing it file by file, where each file is 400 KB, fills the memory. Uncompressing a multi-volume archive should have a very low memory footprint, as it should be possible to write the bytes directly to disk, whatever the size of the archive, the size of the individual files, the number of volumes, or the number of compressed files in the archive.
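For comparison, a low-memory loop could look like the sketch below, using py7zr's extract(targets=...) so that decompressed bytes go to disk instead of into an in-memory dict; whether the current implementation actually keeps the footprint low in this case is exactly what this issue is about.

```python
import multivolumefile
import py7zr

# Sketch only: archive and output directory names are assumptions.
with multivolumefile.open('multi.7z', mode='rb') as volumes:
    with py7zr.SevenZipFile(volumes, mode='r') as archive:
        for name in archive.getnames():
            archive.extract(path='extracted', targets=[name])  # written to disk
            archive.reset()
```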
Environment (please complete the following information):
Test data (please attach in the report):
See provided archive or script to generate it above.
Additional context