Language: 日本語 | English

About content properties [1]

1. What are content properties?

1.1. Content properties overview

Content properties are the byte data that some codecs supported by 7-zip need in order to decode their encoded data correctly.

Application developers only need to know that content properties are a byte sequence of the length the decoder requires; they do not need to understand what the bytes mean. However, the application must manage content properties correctly, because they are not included in the data encoded by the 7-zip codec, yet they are required for decoding.

1.2. How are content properties used in 7-zip?

Specifically, content properties are the byte string handled by the following interfaces in the 7-zip source code.

  • ICompressWriteCoderProperties
  • ICompressSetDecoderProperties2

The ICompressWriteCoderProperties interface contains a method that tells the encoder to write its content properties to the specified stream.

Likewise, the ICompressSetDecoderProperties2 interface contains a method that sets the content properties on the decoder.

If your application uses codecs that implement these interfaces, it must manage the content properties properly. In 7-zip version 21.07, the relevant codecs are as follows.

  • LZMA
  • LZMA2
  • PPMd7 (PPMd version H)
  • Rar1
  • Rar2
  • Rar3
  • Rar5

2. Why does an application have to manage content properties?

Ideally, at least from the application's point of view, data such as content properties would be contained in the encoded data and hidden from the application. I'm guessing that this isn't the case because the same codec has to support multiple file formats.

Take LZMA as an example.

According to the document lzma.txt and the sample implementation LzmaAlone.cs included in the LZMA SDK, the LZMA file format is specified as follows:

| Offset | Length | Description |
|---|---|---|
| 0 | 5 bytes | Content properties |
| 5 | 8 bytes | Data length before encoding (little endian) |
| 13 | (Length of compressed data) | Compressed data |
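
The layout above can be split into its fields with a few lines of C#. This is only a minimal sketch: the `LzmaSdkHeader` class and its `Parse` method are hypothetical names introduced for illustration and are not part of 7-zip or the LZMA SDK. The content properties are treated as an opaque 5-byte block, which is all the application needs.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only (not part of 7-zip or the LZMA SDK).
public static class LzmaSdkHeader
{
    public const int ContentPropertiesSize = 5;                           // offset 0, 5 bytes
    public const int HeaderSize = ContentPropertiesSize + sizeof(ulong);  // 13 bytes in total

    // Splits the 13-byte header into its two fields.
    // The content properties are copied as-is and never interpreted.
    public static (byte[] ContentProperties, ulong UncompressedLength) Parse(ReadOnlySpan<byte> header)
    {
        if (header.Length < HeaderSize)
            throw new ArgumentException("The header must be at least 13 bytes long.", nameof(header));

        byte[] contentProperties = header.Slice(0, ContentPropertiesSize).ToArray();

        // The length field is little endian regardless of the host CPU.
        ulong uncompressedLength =
            BinaryPrimitives.ReadUInt64LittleEndian(header.Slice(ContentPropertiesSize, sizeof(ulong)));

        return (contentProperties, uncompressedLength);
    }
}
```

The returned `ContentProperties` array is what would be handed to the decoder, and the compressed data starts at offset 13.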

On the other hand, according to the ZIP File Format Specification (APPNOTE.TXT), when a file stored in a ZIP archive is compressed with LZMA, its data is laid out as follows. (The ZIP archive implementation in 7-zip also follows this specification.)

| Offset | Length | Description |
|---|---|---|
| 0 | 1 byte | Major version of LZMA SDK (major version of 7-zip in the 7-zip implementation) |
| 1 | 1 byte | Minor version of LZMA SDK (minor version of 7-zip in the 7-zip implementation) |
| 2 | 2 bytes | Length of content properties (little endian); always 0x0005 |
| 4 | 5 bytes | Content properties |
| 9 | (Length of compressed data) | Compressed data |
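
As a rough illustration of this layout, the sketch below builds the 9-byte header and reads the content properties back out of it. The `ZipLzmaHeader` class and its method names are hypothetical and exist only for this example; they do not come from 7-zip or the LZMA SDK. As before, the content properties are handled as an opaque byte block.

```csharp
using System;
using System.Buffers.Binary;

// Hypothetical helper for illustration only (not part of 7-zip or the LZMA SDK).
public static class ZipLzmaHeader
{
    public const int ContentPropertiesSize = 5;
    public const int HeaderSize = 1 + 1 + sizeof(ushort) + ContentPropertiesSize; // 9 bytes

    // Builds the header that precedes the compressed data of an LZMA-compressed ZIP entry.
    public static byte[] Build(byte majorVersion, byte minorVersion, ReadOnlySpan<byte> contentProperties)
    {
        if (contentProperties.Length != ContentPropertiesSize)
            throw new ArgumentException("Content properties must be 5 bytes long.", nameof(contentProperties));

        byte[] header = new byte[HeaderSize];
        header[0] = majorVersion;
        header[1] = minorVersion;
        // The length field is little endian and always 0x0005.
        BinaryPrimitives.WriteUInt16LittleEndian(header.AsSpan(2, sizeof(ushort)), (ushort)ContentPropertiesSize);
        contentProperties.CopyTo(header.AsSpan(4));
        return header;
    }

    // Validates the length field and returns a copy of the content properties.
    public static byte[] ExtractContentProperties(ReadOnlySpan<byte> header)
    {
        if (header.Length < HeaderSize)
            throw new ArgumentException("The header must be at least 9 bytes long.", nameof(header));

        ushort length = BinaryPrimitives.ReadUInt16LittleEndian(header.Slice(2, sizeof(ushort)));
        if (length != ContentPropertiesSize)
            throw new FormatException("Unexpected content properties length.");

        return header.Slice(4, ContentPropertiesSize).ToArray();
    }
}
```

Comparing this with the LZMA SDK layout shows that only the framing around the 5 content property bytes differs; the bytes themselves are passed through unchanged in both cases.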

In 7-zip, the "compressed data" part of both file formats is encoded and decoded by the same codec, so that part has a common format. However, as you can see, the header parts of these file formats are incompatible.

My guess is that, in order to support both of these file formats with one codec, the 7-zip developer separated the header part from the compressed data body and left the processing of the header part to the application. [2]

Whatever the actual history, the result is that for LZMA (and LZMA2, PPMd7, etc.) reading and writing the header part is the responsibility of the application.

3. How should an application manage content properties?

As mentioned earlier, at least for the LZMA (and LZMA2, PPMd7, etc.) codecs, the application must read and write the header part of the encoded data. The format of the header part differs depending on the codec and the file format, so each combination needs its own handling.

The following is sample code for a typical LZMA encoder / decoder application, for your reference. Similar code should work for other codecs, but you need to understand the header part of each file format and modify the code accordingly.

It is assumed that the ReadBytes function is defined as follows.

// Read data from inStream until buffer is filled.
// If inStream reaches the end before buffer is filled, an EndOfStreamException is thrown.
private static void ReadBytes(Stream inStream, Span<Byte> buffer)
{
    while (buffer.Length > 0)
    {
        Int32 length = inStream.Read(buffer);
        if (length <= 0)
            throw new EndOfStreamException("Unexpected end of stream");
        buffer = buffer.Slice(length);
    }
}

3.1. For the file format specified by the LZMA SDK

3.1.1. For decoding

using SevenZip.Compression.Lzma;
using System;
using System.Buffers.Binary;
using System.IO;

...

Stream inStream = ... ; // Set the input stream
Stream outStream = ... ; // Set the output stream
Byte[] headerData = new Byte[LzmaDecoder.CONTENT_PROPERTY_SIZE + sizeof(UInt64)];
ReadBytes(inStream, headerData); // Read the header part
Span<Byte> contentProperty = new Span<Byte>(headerData, 0, LzmaDecoder.CONTENT_PROPERTY_SIZE); // Get the content properties part
UInt64 uncompressedDataLength = BinaryPrimitives.ReadUInt64LittleEndian(headerData.AsSpan(LzmaDecoder.CONTENT_PROPERTY_SIZE)); // Get the size of the data before compression (little endian regardless of the host CPU)
using (LzmaDecoder decoder = LzmaDecoder.CreateDecoder(new LzmaDecoderProperties { FinishMode = true }, contentProperty)) // Create a decoder with content properties
{
    decoder.Code(inStream, outStream, null, null, null); // Decode the body of the data
}

3.1.2. For encoding

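using SevenZip.Compression.Lzma;
using System;
using System.Buffers.Binary;
using System.IO;

...

Stream inStream = ... ; // Set the input stream
Stream outStream = ... ; // Set the output stream
UInt64 uncompressedDataLength = ... ; // Set the length of the data before compression
Byte[] lengthBytes = new Byte[sizeof(UInt64)];
BinaryPrimitives.WriteUInt64LittleEndian(lengthBytes, uncompressedDataLength); // Encode the length as little endian regardless of the host CPU
using (LzmaEncoder encoder = LzmaEncoder.CreateEncoder(new LzmaEncoderProperties { Level = CompressionLevel.Normal })) // Create an encoder
{
    encoder.WriteCoderProperties(outStream); // Write content properties
    outStream.Write(lengthBytes, 0, lengthBytes.Length); // Write the length of the data before compression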
    encoder.Code(inStream, outStream, null, null, null); // Encode the body of the data
}

3.2. For the file format specified by ZIP

3.2.1. For decoding

using SevenZip.Compression.Lzma;
using System;
using System.Buffers.Binary;
using System.IO;

...

Stream inStream = ... ; // Set the input stream
Stream outStream = ... ; // Set the output stream
Byte[] headerData = new Byte[sizeof(Byte) + sizeof(Byte) + sizeof(UInt16) + LzmaDecoder.CONTENT_PROPERTY_SIZE];
ReadBytes(inStream, headerData); // Read the header part
Byte majorVersion = headerData[0]; // The major version is not used.
Byte minorVersion = headerData[1]; // The minor version is not used.
UInt16 contentPropertyLength = BinaryPrimitives.ReadUInt16LittleEndian(headerData.AsSpan(2)); // Get the length of content properties (little endian regardless of the host CPU)
if (contentPropertyLength != LzmaDecoder.CONTENT_PROPERTY_SIZE) // Check the length of content properties
    throw new Exception("Illegal LZMA format");
Span<Byte> contentProperty = new Span<Byte>(headerData, 4, LzmaDecoder.CONTENT_PROPERTY_SIZE); // Get the content properties part.
using (LzmaDecoder decoder = LzmaDecoder.CreateDecoder(new LzmaDecoderProperties { FinishMode = true }, contentProperty)) // Create a decoder with content properties
{
    decoder.Code(inStream, outStream, null, null, null); // Decode the body of the data
}

3.2.2. For encoding

using SevenZip.Compression.Lzma;
using System;
using System.IO;

...

Byte majorVersion = ... ; // Set the major version of the LZMA SDK.
Byte minorVersion = ... ; // Set the minor version of the LZMA SDK.
Stream inStream = ... ; // Set the input stream
Stream outStream = ... ; // Set the output stream
using (LzmaEncoder encoder = LzmaEncoder.CreateEncoder(new LzmaEncoderProperties { Level = CompressionLevel.Normal })) // Create an encoder
{
    outStream.WriteByte(majorVersion); // Write the major version of the LZMA SDK
    outStream.WriteByte(minorVersion); // Write the minor version of the LZMA SDK
    outStream.WriteByte((Byte)(LzmaDecoder.CONTENT_PROPERTY_SIZE >> 0)); // Write the low-order byte of the content properties length
    outStream.WriteByte((Byte)(LzmaDecoder.CONTENT_PROPERTY_SIZE >> 8)); // Write the high-order byte of the content properties length
    encoder.WriteCoderProperties(outStream); // Write content properties
    encoder.Code(inStream, outStream, null, null, null); // Encode the body of the data
}

4. Cautions

  • This document is based on the specifications and source code of 7-zip 21.07 and LZMA SDK 19.00.
  • This document contains not only facts but also my own opinions and guesses. Keep in mind that they may not match the 7-zip developer's view.
  • I have tried to avoid erroneous descriptions and misleading expressions as much as possible. However, please note that neither I nor the 7-zip developers are responsible for any damage caused by relying on this document.

Footnotes

  1. The term "content properties" was coined by the author of this software. In the 7-zip file format specification and source code, it is simply called "property". However, "property" is easily confused with similar terms used in the ICompressSetCoderProperties interface and elsewhere, so I chose to call it content properties in this software.

  2. I don't know exactly, but as far as LZMA is concerned, I presume that the 7-zip developer had the following background.

    1. First, the file format in the LZMA SDK was specified.
    2. Next, support for LZMA compression in ZIP files was considered.
    3. A ZIP file already holds the "length of data before compression" in its own header. Therefore, the 8-byte "data length before encoding" field of the LZMA SDK file format was judged to be completely unnecessary.
    4. The unnecessary 8-byte field was removed when the LZMA compression format for ZIP files was defined. At the same time, a field for the version of the LZMA SDK used by the compressing codec was added to accommodate possible future format changes. (In practice, the LZMA SDK version field seems to be ignored.)
    5. The 7-zip developer wanted the same codec to process both the LZMA SDK format and the ZIP format. However, since those formats have different header parts, the code that reads and writes the header part was written at the application level, separately from the LZMA codec.
    6. However, the header part of the file format contains the content properties, which are essential for decoding. Therefore, the 7-zip developer provided the ICompressWriteCoderProperties interface and the ICompressSetDecoderProperties2 interface. Their purpose is to let the application read and write headers while hiding the meaning of the content properties from the application.