Skip to content

Strawberry cheesecake is a data compression utility specifically designed for compressing numeric data.

Notifications You must be signed in to change notification settings

jeremy-rifkin/strawberry-cheesecake

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

16 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Strawberry Cheesecake

Update, January 2021: I've created data compression algorithm which which offers an improvement over traditional Huffman coding in most cases and achieves the same performance as is achieved in this project. Check it out here: https://github.com/jeremy-rifkin/Markov-Huffman-Coding


Strawberry cheesecake is a data compression utility specifically designed for compressing numeric data.

This project stemmed out of a need for a better solution for compressing digits of pi. I had about 400 billion digits of pi, which took gzip 12 hours to compress (at a fairly lame compression ratio).

This utility is very special case and for its special case, strawberry cheesecake is both faster and more effective than any other file compression utility (at least of the ones I benchmarked against).

Algorithm Compression Ratio Time
bzip2 -1 0.431 72m 58s
bzip2 -9 0.431 81m 0s
gzip -1 0.497 23m 21s
gzip -9 0.469 74m 59s
xz -0 0.486 82m 55s
xz -9 0.441 1475m 58s
plzip -0 0.470 15m 45s
plzip -9 0.441 256m 55s
strawberrycheesecake 0.425 13m 26s

Benchmark used 50 billion digits of pi on an i7 and an hdd.

Strawberry cheesecake uses huffman coding with a fixed huffman tree. The fixed huffman tree treats each digit as equiprobable, as the digits of pi are equiprobable. Using a fixed huffman tree allows the compression to be done in one go instead of reading imput once to generate the tree then reading a second time to actually encode the data. Lookups tables are used to allow the coding and decoding processes to be done very efficiently. Strawberry cheesecake archives include a CRC64 in their file headers to help verify data integrity.

Usage

# Compile:
gcc main.c -O3 -funroll-loops -o strawberrycheesecake
# Compress:
./strawberrycheesecake input.txt output.sc
# Extract:
./strawberrycheesecake -x output.sc data.txt

About

Strawberry cheesecake is a data compression utility specifically designed for compressing numeric data.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published