Column Compressor

Column Compressor is a library that supports column-based compression with row-based semantics. It reads and writes data in blocks of rows, and within each block the values of each column are stored sequentially. Each column can use its own encoding option. For example, a column of large integer or long values that vary little from row to row can use variable-length integer or long encoding.
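The block layout described above can be illustrated with a minimal sketch. It is not the library's API: the class `ColumnBlockSketch` and its methods `toColumns` and `readRow` are hypothetical names, and string values are assumed for simplicity. It only shows how a block of rows can be stored column by column while still being read back row by row.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; these names are not part of Column Compressor.
final class ColumnBlockSketch {

    /** Transposes a block of rows so that each column's values sit together. */
    static String[][] toColumns(List<String[]> rows, int columnCount) {
        String[][] columns = new String[columnCount][rows.size()];
        for (int r = 0; r < rows.size(); r++) {
            String[] row = rows.get(r);
            for (int c = 0; c < columnCount; c++) {
                columns[c][r] = row[c];
            }
        }
        return columns;
    }

    /** Rebuilds a single row from the column-ordered block (row-based semantics). */
    static String[] readRow(String[][] columns, int rowIndex) {
        String[] row = new String[columns.length];
        for (int c = 0; c < columns.length; c++) {
            row[c] = columns[c][rowIndex];
        }
        return row;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"GET", "200"},
                new String[] {"GET", "404"},
                new String[] {"POST", "200"});
        String[][] columns = toColumns(rows, 2);
        // Each column is now a contiguous run that a per-column encoder could compress.
        System.out.println(Arrays.toString(columns[0]));
        System.out.println(Arrays.toString(readRow(columns, 1)));
    }
}
```

Keeping a column's values adjacent is what lets a per-column encoding exploit repetition or small differences between neighboring rows.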

Supported Column Types

bytes: Accepts byte arrays as input and output. Encoding and decoding are left to the calling class.
Text255: Creates a dictionary of the top 255 words. Useful for columns with commonly repeated string values. All values outside the top 255 are encoded directly as byte arrays.
RunLength: Encodes column values by counting the number of consecutive occurrences of each value. For example, the number 10 repeated 100 times would be encoded as 10,100.
DeltaInt: Uses variable-length integer encoding to record the delta between the current value and the previous value. Useful for columns with large integer values that do not have large deltas from one value to the next.
DeltaLong: Uses variable-length long encoding to record the delta between the current value and the previous value. Useful for columns with large long values that do not have large deltas from one value to the next.
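The ideas behind RunLength and DeltaLong can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation or wire format: `ColumnEncodingSketch` and its methods are hypothetical names, the first delta is taken against an assumed starting value of 0, and zigzag mapping is an assumed way to keep negative deltas short.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not the Column Compressor API or byte format.
final class ColumnEncodingSketch {

    /** Run-length encodes values as consecutive (value, count) pairs. */
    static long[] runLengthEncode(long[] values) {
        long[] pairs = new long[values.length * 2];
        int n = 0;
        int i = 0;
        while (i < values.length) {
            long value = values[i];
            int run = 1;
            while (i + run < values.length && values[i + run] == value) {
                run++;
            }
            pairs[n++] = value;
            pairs[n++] = run;
            i += run;
        }
        return Arrays.copyOf(pairs, n);
    }

    /** Writes each value as a variable-length delta from the previous value. */
    static byte[] deltaVarintEncode(long[] values) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        long previous = 0; // assumption: the first delta is relative to 0
        for (long value : values) {
            long delta = value - previous;
            // Zigzag (assumed here) maps small negative deltas to small unsigned values.
            long zigzag = (delta << 1) ^ (delta >> 63);
            // Emit 7 bits per byte; the high bit marks "more bytes follow".
            while ((zigzag & ~0x7FL) != 0) {
                out.write((int) ((zigzag & 0x7F) | 0x80));
                zigzag >>>= 7;
            }
            out.write((int) zigzag);
            previous = value;
        }
        return out.toByteArray();
    }

    /** Reverses deltaVarintEncode for a known number of values. */
    static long[] deltaVarintDecode(byte[] bytes, int count) {
        long[] values = new long[count];
        int pos = 0;
        long previous = 0;
        for (int i = 0; i < count; i++) {
            long zigzag = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                zigzag |= (long) (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            long delta = (zigzag >>> 1) ^ -(zigzag & 1);
            previous += delta;
            values[i] = previous;
        }
        return values;
    }

    public static void main(String[] args) {
        long[] repeated = new long[100];
        Arrays.fill(repeated, 10L);
        System.out.println(Arrays.toString(runLengthEncode(repeated)));

        long[] timestamps = {1000000000000L, 1000000000005L, 1000000000003L};
        byte[] encoded = deltaVarintEncode(timestamps);
        System.out.println(encoded.length + " bytes instead of " + timestamps.length * 8);
        System.out.println(Arrays.toString(deltaVarintDecode(encoded, timestamps.length)));
    }
}
```

In this sketch only the first value pays for its full magnitude. Each later value costs a single byte as long as its delta stays small, which is why delta encoding suits columns such as timestamps or sequential IDs.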

Performance

No formal benchmarks are available at this time. However, on data sets that resemble web logs, we are seeing a 10-15% reduction in output size compared to a traditional row-based compression scheme.

Production Use?

AddThis is currently using this utility in a small number of production processes.
