Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ZipException while extracting in version 2.1.x, but works fine in 1.x #39

Closed
JensSchliesser opened this issue Aug 11, 2019 · 27 comments
Closed
Assignees
Labels
new-feature New feature or request

Comments

@JensSchliesser
Copy link

JensSchliesser commented Aug 11, 2019

Hello,

I´m getting the following exception while extracting using 2.1.2. The same file works fine in 1.3.3. I also checked the file with unzip -t and with WinZIP and 7z -> no problems there.

 Caused by: net.lingala.zip4j.exception.ZipException: Reached end of entry, but crc verification failed for {--file--}
 	at net.lingala.zip4j.io.inputstream.ZipInputStream.verifyCrc(ZipInputStream.java:240) ~[zip4j-2.1.1.jar!/:na]
 	at net.lingala.zip4j.io.inputstream.ZipInputStream.endOfCompressedDataReached(ZipInputStream.java:154) ~[zip4j-2.1.1.jar!/:na]
 	at net.lingala.zip4j.io.inputstream.ZipInputStream.read(ZipInputStream.java:122) ~[zip4j-2.1.1.jar!/:na]
 	at java.base/sun.nio.cs.StreamDecoder.readBytes(Unknown Source) ~[na:na]
 	at java.base/sun.nio.cs.StreamDecoder.implRead(Unknown Source) ~[na:na]
 	at java.base/sun.nio.cs.StreamDecoder.read(Unknown Source) ~[na:na]
 	at java.base/java.io.InputStreamReader.read(Unknown Source) ~[na:na]
 	at java.base/java.io.Reader.read(Unknown Source) ~[na:na]
 	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:2001) ~[commons-io-2.4.jar!/:2.4]
 	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1980) ~[commons-io-2.4.jar!/:2.4]
 	at org.apache.commons.io.IOUtils.copy(IOUtils.java:1957) ~[commons-io-2.4.jar!/:2.4]
 	at org.apache.commons.io.IOUtils.copy(IOUtils.java:1907) ~[commons-io-2.4.jar!/:2.4]
 	at org.apache.commons.io.IOUtils.toString(IOUtils.java:778) ~[commons-io-2.4.jar!/:2.4]
 	at de.xxx.xxx.helper.ExtractImport.content(ExtractImport.java:90) ~[classes!/:1.5.0-SNAPSHOT]
 	... 14 common frames omitted
final Map<String, String> data = new HashMap<>();
final Charset charset = Charset.forName("Cp1252");

tempFile = createTempFile(bin);
ZipFile zipFile = new ZipFile(tempFile, password.toCharArray());
ZipInputStream zis;

final Map<String,FileHeader> fileheaders = zipFile.getFileHeaders()
        .stream()
        .collect(toMap(FileHeader::getFileName, Function.identity()));
...
for(Map.Entry<String,IBaseModel> file: toParseFiles.entrySet()) {
    final String filename = file.getKey();
    final FileHeader fileHeader = fileheaders.get(filename);

    if(fileHeader == null) {
        ...
    } else {
        zis = zipFile.getInputStream(fileHeader);
        data.put(filename, IOUtils.toString(zis, charset));
        zis.close();
    }
}

and..Thanks for your great work!

@srikanth-lingala
Copy link
Owner

Can you please also attach the zip file? It will greatly help me in investigating the issue. Thanks

@JensSchliesser
Copy link
Author

sry I cannot give you the zip because it contains a lot private customer data. :(

Find attached some screenshots from the 1.3.3 fileheader, the 2.1.2 fileheader and the decompressed crc value of the 2.1.2 extraction (the crc in the 2.1.2 extraction matches the crc in 1.3.3 but not in 2.1.2), maybe this might help.

I don´t know if this is might be the problem: but in the 1.3.3 version you are reading the CRC value of the LocalFileHeader as little Endian int, but in the 2.1.x version as long little Endian. (Line 169 in HeaderReader)

133_fileheader

212_decompress

![212_fileheader](https://user-images.githubusercontent.com/12368111/62848461-186af300-bcdc-11e9-9c75-3d933a79668f.jpg)

@srikanth-lingala
Copy link
Owner

It will be difficult to investigate the issue without the zip file. So please bear with my questions.

Is this the only zip file for which you have this issue or are you able to reproduce this issue with any other zip file as well? If yes, can you please create a dummy zip (without sensitive data) and send it to me?
Which tool have to used to generate the zip file?

@azhao-2019
Copy link

i use the 'net.lingala.zip4j:zip4j:2.1.2' .
it zip and unzip is ok,
but when i want to get the "zipFile.getInputStream(fileHeader)", the "zipInputStream.available() = 0"

@azhao-2019
Copy link

`public static String zip(Context context, String password) {

	try {
		String cache = context.getFilesDir().getPath() + File.separator + "cache.zip";
		Log.e("cache", "cache=" + cache);
		
		InputStream assetInputStream = context.getAssets().open("demo.p12");
		
		ZipFile zipFile1 = new ZipFile(cache, password.toCharArray());
		
		ZipParameters zipParameters = new ZipParameters();
		int available = assetInputStream.available();
		//available  = 2543
		Log.e("cache ", " available = " + available);
		zipParameters.setEntrySize(available);
		
		zipParameters.setEncryptFiles(true);
		zipParameters.setEncryptionMethod(EncryptionMethod.AES);
		// Below line is optional. AES 256 is used by default. You can override it to use AES 128. AES 192 is supported only for extracting.
		zipParameters.setAesKeyStrength(AesKeyStrength.KEY_STRENGTH_256);
		
		zipParameters.setFileNameInZip("demo01.p12");
		zipParameters.setCompressionMethod(CompressionMethod.STORE);
		zipFile1.addStream(assetInputStream, zipParameters);
		// I can get the correct zip file.
		//I can open this zip file on my computer.
	} catch (Exception e) {
		e.printStackTrace();
	}
	return null;
}

public static InputStream getCerInputStream(Context context, String name, String pass) throws Exception {
	String cache = context.getFilesDir().getPath() + File.separator + "cache.zip";
	Log.e("cache ", cache);
	
	ZipFile zipFile = new ZipFile(cache, pass.toCharArray());
	List<FileHeader> fileHeaders = zipFile.getFileHeaders();
	int size = fileHeaders.size();
	//size = 1
	Log.e("cache ", "size = " + size);
	for (int i = 0; i < size; i++) {
		FileHeader fileHeader = fileHeaders.get(i);
		
		//name = demo01.p12
		Log.e("cache ", "name = " + fileHeader.getFileName());
		ZipInputStream zipInputStream = zipFile.getInputStream(fileHeader);
		
		//false
		Log.e("cache ", " zipInputStream = " + (zipInputStream == null));
		
		// TODO: size = 0 , There's a problem. zipInputStream shouldn't be zero.
		Log.e("cache ", " zipInputStream size= " + zipInputStream.available());
		
		//unzip test
		String cachePath = context.getFilesDir().getPath();
		zipFile.extractFile(fileHeader.getFileName(), cachePath);
		
		String cachePath33 = context.getFilesDir().getPath() + File.separator + fileHeader.getFileName();
		
		FileInputStream inputStream = new FileInputStream(cachePath33);
		//size = 2543
		Log.e("cache ", "inputStream size = " + inputStream.available());
		return zipFile.getInputStream(fileHeader);
	}
	
	return null;
}`

@srikanth-lingala
Copy link
Owner

@azhao-2019 It would have been good if you have opened a new issue for your case. Your issue has nothing to do with the issue reported in this issue.

available() is quite hard (if not impossible) to implement correctly on a ZipInputStream. And this is the reason even Java's ZipInputStream returns either a 0 or 1 for available(), which is quite understandable (0 when end of stream is reached, and 1 when not). Zip4j did not implement available method so far because of this inconsistency. But I will implement something like Java's 0 or 1 when available method is called. Will include this in next release.

For any further communication on this issue, please open a new issue. Thanks.

@srikanth-lingala
Copy link
Owner

@JensSchliesser

in the 1.3.3 version you are reading the CRC value of the LocalFileHeader as little Endian int, but in the 2.1.x version as long little Endian.

This is an intended fix. It was an issue in v1.x. If you look at your screenshot with 1.x, crc value is negative. This should not be the case. This was an issue because crc was defined as an int in 1.x which should be a long and the negative value was long rounding off to an int.

And regarding this issue:

I tried a lot to reproduce this issue. I tried with several zip files of varying sizes and content and used a code similar to yours (although a much cut down version). Below is the code I used:

    final Charset charset = Charset.forName("Cp1252");
    ZipFile zipFile = new ZipFile("/Users/slingala/Downloads/zip4j_src_1.3.2-patched.zip");

    for (FileHeader fileHeader : zipFile.getFileHeaders()) {
      ZipInputStream zipInputStream = zipFile.getInputStream(fileHeader);
      System.out.println(IOUtils.toString(zipInputStream, charset));
      zipInputStream.close();
    }

Each of the zip files I used successfully printed out the content successfully and I was not able to reproduce this issue. Unfortunately without a zip file it will be pretty hard to investigate what is going wrong. If you are able to generate a sample zip file with which you can reproduce this issue, and one that you can share with me, it would be great if you can send it to me. I am very interested to investigate and fix this issue, but without being able to reproduce it, unfortunately, I am in an unhelpful situation.

@JensSchliesser
Copy link
Author

JensSchliesser commented Aug 18, 2019

Hello,

thank you for all your efforts. I will really try to get such a ZIP from our dev System, which contains obfuscated private data.

This is an intended fix. It was an issue in v1.x. If you look at your screenshot with 1.x, crc value is negative. This should not be the case. This was an issue because crc was defined as an int in 1.x which should be a long and the negative value was long rounding off to an int.

Okay but the CRC Checksum returned from crc32.update in the decompression method in 2.1.2 is indeed negative. (see screenshot below) . Which should NOT be the case as you said, but its the same as in the 1.x Version.

212_decompress

@srikanth-lingala
Copy link
Owner

srikanth-lingala commented Aug 18, 2019

Okay but the CRC Checksum returned from crc32.update in the decompression method in 2.1.2 is indeed negative.

That is because, Java's CRC32 internally stores the crc as an int. But the value we should be looking at is Crc32.getValue() which returns a long. The negative value I was referring to was within zip4j code which should not be the case in v2.x. This crc value should be the same as the value stored in the zip file headers. I just printed out the value I see in your debug screen -270911946 as a long (-270911946 & 0xffffffffL) and I get the value 4024055350, which is the value stored in the header as I see from one of your earlier screenshots. So, you should not be getting this error.

Can you please debug and let me know both the crc's on line 250 in ZipInputStream?

@JensSchliesser
Copy link
Author

Can you send me a mail to jens.schliesser@gmail.com ? I can send you a link to a non working file

@srikanth-lingala
Copy link
Owner

Mail sent

@srikanth-lingala
Copy link
Owner

The issue here is not with the extraction of the zip file, but rather with the tool that created the zip file. To explain this issue, I first have to explain a bit about the zip file format. Below is a copy-paste of this explanation I made in relation to another issue here:

An entry (file) in a zip file has the following format:

[LOCAL_FILE_HEADER][....DATA....][DATA_DESCRIPTOR]

And the above format repeats for the number of files present in the zip file, and at the end of the file, there are some additional headers containing information of the overall zip file (number of entries, and a lot of other info). (Note: the actual entry structure can get a lot more complicated with differing scenarios, the above entry structure is a very simplified version, just enough to make my point here).

DATA_DESCRIPTOR in the above structure is an optional header block. This header block contains the crc, compressed size and uncompressed size of the entry. This header block can be used if these sizes are not known upfront. Imagine working with streams, in this case, you do not know the size of compressed data upfront. You only know that information after you finish writing the data. But since we are working with streams, we cannot go back to the point where localfileheaders were written to update that data (compressed and uncompressed sizes). It is in this scenario, that DATA_DESCRIPTOR can be used. In such cases, crc, compressed size and uncompressed size are marked as 0 in localfilheader and the actual data is available in either DATA_DESCRIPTOR or the fileheader. This is totally as per the zip specification. And all zip tools should handle that scenario.

The issue with your zip file is that the entry header flag in local file header says that there is a data descriptor present at the end of the entry, but there is no such entry defined at the end of the data. This flag in the local file header should not be set for these files. Can you please let me know which tool you have used to create the zip file? If it was zip4j, please mention the version and also the code you used to create the zip file.

@srikanth-lingala
Copy link
Owner

srikanth-lingala commented Aug 18, 2019

And also, the reason your code works with zip4j v1.x and other zip tools is that they do not rely on the local file header of the zip file, and rather rely more on the file header (which is a file meta data at the end of the zip file). Zip4j v2.x relies more on the local file header, which had to be done to make it support streams, and since the local file header data is corrupt, you get this exception with zip4j v2.x.

@JensSchliesser
Copy link
Author

JensSchliesser commented Aug 18, 2019

Thank you for explanations! The file was (is) created by an older .NET application. Despite your analysis it´s really hard to explain someone why this is an error of their application, if all tested zip tools (unzip, WinZIP, 7z, unarchiver etc) can extract the archive and are not reporting any checksum errors. Maybe relying that hard on that header entry and the missing descriptor and terminating with an exception is a little hard (in comparison with all the other zip tools which seems to ignore that).

Btw: the strange thing is: the other file I sent you (the working one) was created by the same .NET application

@srikanth-lingala
Copy link
Owner

srikanth-lingala commented Aug 18, 2019

All other tools have the luxury of not having to deal with streams. They only work with files. As long as the tools work with files, they can skip relying on local header entry and only rely on the header entries from the zip file metadata. Even zip4j v1.x, which lacked support for streams, relied on file metadata headers. But one of the most requested features for zip4j since v1.x was the support of streams. And with introduction of streams, I have to rely on local file header.

If they still don't understand after explaining this, then they do not follow the zip specification, and or do not understand the zip specification. This is a very simple mistake that they are doing, and nothing too complicated. When they set the data descriptor flag in headers, then they actually have to define a data descriptor in the zip file. If they do not intend to write data descriptor, then the flag should not be set in the file headers. They don't have to dig deep into zip specification to understand this. This is a very basic zip concept. Please refer them to section 4.3.9 of the zip specification, and section 4.3.6 describes the structure of a zip file.

That said, I will see what I can do to rely at least partially on file metadata. I personally am not a big fan for a fix for this in zip4j because this will be like introducing fixes in zip4j for bugs in other zip tools/libraries, which will bring additional complexity to zip4j and might be difficult to maintain over the long run. I cannot promise a "fix" for this, but I have to first analyse the complexity it brings to zip4j and see if it will be feasible. I think it is easiest (depends on who owns the .NET application) if they can fix it.

@srikanth-lingala
Copy link
Owner

Btw: the strange thing is: the other file I sent you (the working one) was created by the same .NET application

Probably because this bug occurs in that .NET application in some special cases only.

@srikanth-lingala
Copy link
Owner

srikanth-lingala commented Aug 21, 2019

After a lot of thoughts and playing around with the code, I have decided not to include any code change for this in zip4j. As mentioned earlier, this is not a bug in zip4j. This is a bug in the tool that created the zip file. A code change for this in zip4j would mean reading from zip metadata when available (reading from files and not from streams). But what if the zip metadata is not properly defined because of a bug in another tool or the same tool. And moreover even if I make necessary code changes, this will not be fixed when reading streams and thereby making it inconsistent. The couple of main reasons for me not to do any code changes for this are that doing a code change for this means patching zip4j for bugs in other tools and also that there is no guarantee that some other tool generates the zip metadata correctly. The right thing to do in this case, in my opinion, is to fix the tool that generated the zip file. Sorry that I can't be of more help, but I hope you understand my point.

@jochenr
Copy link

jochenr commented Sep 13, 2019

Hello,
we have exactly the same issue as JensSchliesser.
The company that sends us the passsword protected ZIP files feels confident that their ZIPs files are following the spec.
Thay say that they use the C1Zip library:
https://www.grapecity.com/controls/studio/zip

Using the command "zipinfo -v " I get this output:
`Archive: XXXXX.zip
There is no zipfile comment.

End-of-central-directory record:

Zip archive file size: 23995033 (00000000016E2299h)
Actual end-cent-dir record offset: 23995011 (00000000016E2283h)
Expected end-cent-dir record offset: 23995011 (00000000016E2283h)
(based on the length of the central directory and its expected offset)

This zipfile constitutes the sole disk of a single-part archive; its
central directory contains 2 entries.
The central directory is 138 (000000000000008Ah) bytes long,
and its (expected) offset in bytes from the beginning of the zipfile
is 23994873 (00000000016E21F9h).

Central directory entry #1:

XXXX1.txt

offset of local header from start of archive: 0
(0000000000000000h) bytes
file system or operating system of origin: MS-DOS, OS/2 or NT FAT
version of encoding software: 2.0
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: encrypted
extended local header: yes
file last modified on (DOS date/time): 2019 Sep 4 15:01:56
32-bit CRC value (hex): a4ab40d8
compressed size: 241 bytes
uncompressed size: 871 bytes
length of filename: 16 characters
length of extra field: 0 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: binary
non-MSDOS external file attributes: 000000 hex
MS-DOS file attributes (20 hex): arc

There is no file comment.

Central directory entry #2:

XXXX2.xml

offset of local header from start of archive: 287
(000000000000011Fh) bytes
file system or operating system of origin: MS-DOS, OS/2 or NT FAT
version of encoding software: 2.0
minimum file system compatibility required: MS-DOS, OS/2 or NT FAT
minimum software version required to extract: 2.0
compression method: deflated
compression sub-type (deflation): normal
file security status: encrypted
extended local header: yes
file last modified on (DOS date/time): 2019 Sep 4 15:01:56
32-bit CRC value (hex): ae99fc07
compressed size: 23994526 bytes
uncompressed size: 156580261 bytes
length of filename: 30 characters
length of extra field: 0 bytes
length of file comment: 0 characters
disk number on which file begins: disk 1
apparent file type: binary
non-MSDOS external file attributes: 000000 hex
MS-DOS file attributes (20 hex): arc

There is no file comment.
`

@srikanth-lingala
Copy link
Owner

@jochenr Would it be possible to send me a sample zip file with which you can reproduce this issue? Unfortunately the most important data I need is missing in the output above.

If in fact the issue here is the same as explained above (extended local header flag present, but header missing), then the issue is with the tool that created the zip file. But to pinpoint if this is the same root cause, having a reproducible zip file would be of great help. It would be hard to say the root cause based on the exception because there can be several reasons for it.

If you are unable to send a zip file, can you please set a breakpoint here and let me know the value of localFileHeader.isDataDescriptorExists() after localFileHeader has been populated. And also, please set a breakpoint here, and let me know the content of the intBuff. If the value of localFileHeader.isDataDescriptorExists() is true and the content of intBuff is [80, 75, 3, 4], then this is the same issue as originally reported here. If not, then unfortunately, I cannot say more and I will need the zip file to investigate the issue. When you are debugging, can you also please let me know the value of fileHeader.isDataDescriptorExists() in the first breakpoint.

@srikanth-lingala
Copy link
Owner

BTW, for the last entry in the zip file (in your case 2nd file as there are 2 files in your zip), the value of intBuff might not be [80, 75, 3, 4] but might be [80, 75, 1, 2]. So, if a zip file that you are testing only has 1 file in it, you might see [80, 75, 1, 2] and not [80, 75, 3, 4] for intBuff

@jochenr
Copy link

jochenr commented Sep 13, 2019

Thank you very much for the fast response.

Unfortunately I am not allowed to give you the original file. But my colleagues are asking for a test ZIP from the company that sends us the files.
As you proposed I debugged the failing extraction:
When the first breakpoint is reached it has a value of dataDescriptorExists=true for the localFileHeader and dataDescriptorExists=true for the fileHeader.
The second breakpoint shows this value: intBuff=[80, 75, 3, 4]

So it seems to be the same issue.

grafik

@srikanth-lingala
Copy link
Owner

I confirm that it is the same issue. I don't think I will be needing a sample zip file.

The zip file which JensSchliesser had also was generated by some .NET application/library. Maybe they both are using the same library and the root cause is the same. I can for sure say that the library has a bug in generating zip files. In short, the library in one of the headers says that certain data will be present in the zip file with a flag, but that block of data is not present. Zip4j tries to read it and fails when it cannot find that data. Unfortunately it is not possible for me to ignore it if it does not exist. I have to get into lengthy zip specifications to explain why, but in short, it will be impossible for me to skip if the data is not present. There is a brief discussion in the posts above, just enough to get the root cause of the problem in case anyone is interested to know more. Of course, I can also gladly explain the issue in more detail to anyone interested.

@srikanth-lingala
Copy link
Owner

On some more thoughts, I think I can do something in zip4j to overcome this. Although I will still say that the root cause of this issue is in the .NET application, but I think a small workaround can be done in zip4j without causing much harm to the valid zip files. I will update this thread again soon.

@srikanth-lingala
Copy link
Owner

@JensSchliesser @jochenr
Fix for this issue is now released (v2.1.4). Please note that this will work only when using ZipFile api. It will still fail for zip files with such missing header blocks when using ZipInputStream api. In this case, unfortunately, there is nothing I can do to work around this issue.

@jochenr
Copy link

jochenr commented Sep 14, 2019

thank you very very much:-)
I'll immediately give it a try Monday when I'm back in office.
Still I'll try to convince the guys who build the ZIPs to fix their library, because we want to switch to your streaming api.

Again, thank you & have a nice weekend

@jochenr
Copy link

jochenr commented Sep 16, 2019

I just verified that the work around (in version 2.1.4) for the ZipFile mode works with the corrupt files we get from our data provider.
thank you

@demonbug
Copy link

When i zip a whole folder with a sub-folder contains a single file with zip4j 1.3.1 and unzip it with zip4j 2.1.2, an exception is threw:

Exception in thread "main" net.lingala.zip4j.exception.ZipException: Could not read corresponding local file header for file header: folderB/folderC/fileD
	at net.lingala.zip4j.tasks.AbstractExtractFileTask.verifyNextEntry(AbstractExtractFileTask.java:85)
	at net.lingala.zip4j.tasks.AbstractExtractFileTask.extractFile(AbstractExtractFileTask.java:47)
	at net.lingala.zip4j.tasks.ExtractAllFilesTask.executeTask(ExtractAllFilesTask.java:36)
	at net.lingala.zip4j.tasks.ExtractAllFilesTask.executeTask(ExtractAllFilesTask.java:12)
	at net.lingala.zip4j.tasks.AsyncZipTask.performTaskWithErrorHandling(AsyncZipTask.java:42)
	at net.lingala.zip4j.tasks.AsyncZipTask.execute(AsyncZipTask.java:36)
	at net.lingala.zip4j.ZipFile.extractAll(ZipFile.java:431)
	at demo.DemoApplication.testUnzip(DemoApplication.java:21)
	at demo.DemoApplication.main(DemoApplication.java:33)

The zip file structure is like this:

a.zip
   |-- folderB
         |-- folderC
              |-- fileD

I read the changelog and found this issus, it's not only a bug of .NET app, 1.3.1 will create a EXTRA_DATA_RECORD structure, cause zip4j deal wrong on net.lingala.zip4j.headers.HeaderReader.readLocalFileHeader(InputStream), line 513

int sig = rawIO.readIntLittleEndian(inputStream);
if (sig != HeaderSignature.LOCAL_FILE_HEADER.getValue()) {
  return null;
}

If i create a zip with only one folder ( a.zip/folferB/fileC ), it's ok to unzip.
And thank you very much for fixing this issus! Hope this case will help you.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
new-feature New feature or request
Projects
None yet
Development

No branches or pull requests

5 participants