Strawberry Cheesecake

Update, January 2021: I've created data compression algorithm which which offers an improvement over traditional Huffman coding in most cases and achieves the same performance as is achieved in this project. Check it out here: https://github.com/jeremy-rifkin/Markov-Huffman-Coding

Strawberry cheesecake is a data compression utility specifically designed for compressing numeric data.

This project stemmed out of a need for a better solution for compressing digits of pi. I had about 400 billion digits of pi, which took gzip 12 hours to compress (at a fairly lame compression ratio).

This utility is very special case and for its special case, strawberry cheesecake is both faster and more effective than any other file compression utility (at least of the ones I benchmarked against).

Algorithm	Compression Ratio	Time
`bzip2 -1`	0.431	72m 58s
`bzip2 -9`	0.431	81m 0s
`gzip -1`	0.497	23m 21s
`gzip -9`	0.469	74m 59s
`xz -0`	0.486	82m 55s
`xz -9`	0.441	1475m 58s
`plzip -0`	0.470	15m 45s
`plzip -9`	0.441	256m 55s
`strawberrycheesecake`	0.425	13m 26s

Benchmark used 50 billion digits of pi on an i7 and an hdd.

Strawberry cheesecake uses huffman coding with a fixed huffman tree. The fixed huffman tree treats each digit as equiprobable, as the digits of pi are equiprobable. Using a fixed huffman tree allows the compression to be done in one go instead of reading imput once to generate the tree then reading a second time to actually encode the data. Lookups tables are used to allow the coding and decoding processes to be done very efficiently. Strawberry cheesecake archives include a CRC64 in their file headers to help verify data integrity.

Usage

# Compile:
gcc main.c -O3 -funroll-loops -o strawberrycheesecake
# Compress:
./strawberrycheesecake input.txt output.sc
# Extract:
./strawberrycheesecake -x output.sc data.txt

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
.gitignore		.gitignore
README.md		README.md
build.sh		build.sh
main.c		main.c

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Strawberry Cheesecake

Usage

About

Releases

Packages

Languages

jeremy-rifkin/strawberry-cheesecake

Folders and files

Latest commit

History

Repository files navigation

Strawberry Cheesecake

Usage

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages