Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CompressionMethod.STORE requires ZipParameters EntrySize to be set in advance #10

Closed
fkirc opened this issue Jul 2, 2019 · 7 comments
Closed
Assignees
Labels
bug Something isn't working

Comments

@fkirc
Copy link

fkirc commented Jul 2, 2019

ZipOutputStream.java contains the following piece of code, which I do not understand:

private void verifyZipParameters(ZipParameters zipParameters) {
    if (zipParameters.getCompressionMethod() == CompressionMethod.STORE
        && zipParameters.getEntrySize() < 0
        && !isEntryDirectory(zipParameters.getFileNameInZip())) {
      throw new IllegalArgumentException("uncompressed size should be set for zip entries of compression type store");
    }
  }

This IllegalArgumentException is thrown when I call zipFile.addStream(myInputStream, myParameters);

Note that the default entry size is set to -1 (hence the above exception is thrown).
I can avoid this exception by calling myParameters.setEntrySize(?) with a nonnegative value.

My question is: Why do I need to set the entry size ahead of time? Why only for CompressionMethod.STORE? This was not necessary for 1.X versions of Zip4j.
Is this a bug?
If it is not a bug: Please document this change.
It seems strange to set the entry size even before adding the input stream to the zip file.

I would expect that the entry size is determined automatically once the InputStream has reached its end.

@srikanth-lingala
Copy link
Owner

This was a tough decision to make, but there was no other option left other than throwing an exception when entry size is not set in case os STORE compression. I again have to get a bit detailed into the zip file format. Please make sure you read my post from the other issue before you read any further.

This check had to be done to support streams. Zip4j 2.x is built around streams. Zip4j 1.x was not. So, I had the luxury to skip this check in v1.x. But with the added support for streams in 2.x, there was no other option left than to add this check.

As described in the linked post, if a data descriptor is present in the header, the file sizes are set to 0 in local file header. Actual file sizes are in data descriptor and file headers. Note that file headers are at the end of the zip file. Now, imagine reading a stream of zip file. The first thing we read is the local file header. And this has the size of 0. Zip4j has no way of knowing the compressed size of the entry. And therefore, Zip4j does not know where and when to stop reading content for this particular entry, because the actual file sizes are at the end of the entry, and without knowing the end, we can never know where to get this information from. In case of Deflate compression, we can determine the end of compressed data even without knowing the size of the compressed data. Unfortunately, we do not have this luxury with STORE compression. And that is the reason I had to add this check, so that a zip file created by zipoutpustream can then be read by zipinputstream, and not just the ZipFile api.

In fact, even jdk's zip utility has this check. It is quite understandable that they have it as well, because there is no way around it, especially if you want to support streams.

Options you have are:

  • Use deflate when adding content to a zip file via streams if entry sizes are unknown upfront. In this case, you do not have to set entry size.
  • Determine the entry size upfront and set it in the zipparameters.

I will now close this issue, as it is not really a bug in zip4j, but a check that we cannot avoid for the reasons mentioned above.

@fkirc
Copy link
Author

fkirc commented Jul 5, 2019

Perhaps it would be a good idea to implement a special case for the ZipFile API:

We could seek back to the local fileheader within the close method of ZipOutputStream, and then just set the correct sizes within the local fileheader.

@fkirc
Copy link
Author

fkirc commented Jul 5, 2019

Beside making the API easier to use, this would also have to added advantage of avoiding a DATA_DESCRIPTOR in case of the ZipFile API in conjunction with ZipOutputStream.

@srikanth-lingala
Copy link
Owner

With ZipFile API (except for addStream) in v2.x, file sizes are always added in the local file header and data descriptor is always avoided. Data descriptor is only written when using ZipOutputStream directly or when using ZipFile.addStream(). Because in both these cases, it is impossible to traverse back to the location of file header.

Zip4j 1.3.3 used to add data descriptors even with ZipFile api, which is not the case anymore with v2.x

@fkirc
Copy link
Author

fkirc commented Jul 5, 2019

Data descriptor is only written when using ZipOutputStream directly or when using ZipFile.addStream(). Because in both these cases, it is impossible to traverse back to the location of file header.

I doubt that this is impossible in the second case.
If you call myZipFile.addStream(), then this looks like the streams API, but you are actually still operating on a conventional Zip file that is seekable.

@srikanth-lingala
Copy link
Owner

Yes, you are right. I mixed things up. I correct myself: My comment above applies only for ZipOutputStream and not for ZipFile.addStream(). I will fix it. Reopening issue.

@srikanth-lingala
Copy link
Owner

Fixed in v2.1.0

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants