Skip to content

Latest commit

 

History

History
145 lines (117 loc) · 5.98 KB

README.md

File metadata and controls

145 lines (117 loc) · 5.98 KB

go-lzop

A wrapper for LZO compression that works with go-lzo libraries to create lzop-compatible files.

Installation

To get:

go get github.com/MediaMath/go-lzop

Usage

You can use the library with either your own compiled LZO lib from source, or use https://github.com/rasky/go-lzo.

In this example I'm using the rasky lzoCompress1X. It's important you only pass our lzop function some iteration of 1x1 compression. If you want to work out the values for 1x999 or some other in-between, you will have to fork the repo, and use different constants for the binary values written to the header packer. Writing a level 9 compression while header packing level 1 compression obviously isn't a great idea.

import "github.com/MediaMath/go-lzop"

fileCreationTime := time.Now().Unix()
filename := "file.txt"
someString := "Hello I am a string"
data, err := CompressData(fileCreationTime, filename, []byte(someString), lzo.Compress1X)

data in this example should be able to be directly written to disk, and decompressed via

lzop -d -f file.txt.lzo

Hexdump

On a xbG7k1TvFZ.txt which contains "data"

hexdump -C xbG7k1TvFZ.txt
00000000  64 61 74 61                                       |data|
00000004
lzop -1 -f xbG7k1TvFZ.txt
hexdump -C xbG7k1TvFZ.txt.lzo
00000000  89 4c 5a 4f 00 0d 0a 1a  0a 10 30 20 80 09 40 02  |.LZO......0 ..@.|
00000010  01 03 00 00 01 00 00 81  b4 57 9a 4a 84 00 00 00  |.........W.J....|
00000020  00 0e 78 62 47 37 6b 31  54 76 46 5a 2e 74 78 74  |..xbG7k1TvFZ.txt|
00000030  92 81 09 1f 00 00 00 04  00 00 00 04 04 00 01 9b  |................|
00000040  64 61 74 61 00 00 00 00                           |data....|
00000048

Hexdump result of the same file after running through go-lzop and compressed using the LZO 1x1 libs

hexdump -C xbG7k1TvFZ.txt.lzo
00000000  89 4c 5a 4f 00 0d 0a 1a  0a 10 30 20 80 09 40 02  |.LZO......0 ..@.|
00000010  01 03 00 00 01 00 00 81  b4 57 9a 4a 84 00 00 00  |.........W.J....|
00000020  00 0e 78 62 47 37 6b 31  54 76 46 5a 2e 74 78 74  |..xbG7k1TvFZ.txt|
00000030  92 81 09 1f 00 00 00 04  00 00 00 04 04 00 01 9b  |................|
00000040  64 61 74 61 00 00 00 00                           |data....|
00000048

All the way back

lzop -d -f xbG7k1TvFZ.txt.lzo
hexdump -C xbG7k1TvFZ.txt
00000000  64 61 74 61                                       |data|
00000004

Warning

Data that's created with this packer seems to slightly differ as files grow in size. Files compressed to 229392890 bytes (219MB) with lzop might be 229392000 bytes with this lib. LZOP is adding or moving bytes on large files that I haven't found, but it doesn't seem to impact the overall outcome of the data. If you lzop with built LZO source, and LZO compress with this lib using the same built source, the .lzo produced might differ slightly in size, but contains the same data during decompression. Diff will return no result on the files.

Go Test

Tests assume you have lzop installed; if not:

sudo apt-get install lzop

Then

go get github.com/dchest/uniuri
go get github.com/rasky/go-lzo
cd lzop
go test

Background information on LZOP header packing which go-lzop follows

//packingOrder Is the exact order and type length of everything needed to make a LZOP looking header.
//This struct defines the layout of the data and the types they represent, but isn't used directly
//since each value is constant or generated during compression
//All of these values must be written with some form of
//err = binary.Write(buff, binary.BigEndian (system dependant), <TYPE>(VALUE))
//After writing the complete header you will have to write, in a block size manner, the rest of the compressed data
//Then to mark the end of the file. LZOP was looking for \x00 4 times.
//So make sure you do a f.Write([]byte("\x00\x00\x00\x00"))
type packingOrder struct {
	LZOPMagic               []byte //The LZOP magic byte header []byte{0x89, 0x4c, 0x5a, 0x4f, 0x00, 0x0d, 0x0a, 0x1a, 0x0a}
	Version                 uint16 // Using 0x1030
	LibVersion              uint16 // Using 0x2080
	VersionNeededForExtract uint16 // Using 0x0940
	Method                  uint8  // using 2 because it gets us to -1 compression
	Level                   uint8  // 1 (-1 from command line)
	Flags                   uint32 //Flags is some magic number I dunno. It looks like they are using it as a way to choose between adler32 and CRC checksum. 0x3000001 works for us
	FileMode                uint32 //0x000081B4 or 33204 (as int) this makes it to -rw-rw-r--
	// Filter                  uint32 //Not using filter at all, they would be written here, but they aren't written
	TimeLow        uint32 //This is the UTC stamp of file creation (time.Now().Unix())
	TimeHigh       uint32 // This is 0 on big endian systems. It looks like it will give a different result on LittleEndian I think?
	FileNameLength uint8  //The length of the file name
	FileName       string //The file name you are packing

	//CheckSum Stuff
	//Because flags isn't using CRC, we default to adler32 checksum.
	//You need to write all values in this struct minus LZOPMagic and the checksum itself to a buffer.
	//This is done by calling adler32.Checksum() those []bytes and then write
	//Adler32CheckSum result to the buffer/file you were working on as uint32
	Adler32CheckSum uint32

	//After the adler32 checksum there is another checksum + data addition that occurs **EVERY** blocksize rotation

	/*As defined in LZOP
	  #if defined(ACC_OS_DOS16) && !defined(ACC_ARCH_I086PM)
	  #  define BLOCK_SIZE        (128*1024l)
	  #else
	  #  define BLOCK_SIZE        (256*1024l)
	  #endif
	  #define MAX_BLOCK_SIZE      (64*1024l*1024l)        
    */

	// You write the three values below (both sizes + checksum) every rotation

	//This is the uncompressed data length
	UncompressedDataLength uint32

	//Then the compressed data length
	CompressedDataLength uint32

	//This is a checksum where you have to write the uncompressed data to the adler32.Checksum() and it will yield this value
	Adler32UncompressedDataCheckSum uint32
}