Recent test on the compressor comparison #92

Closed
xuchuanyin opened this issue Sep 14, 2018 · 5 comments

xuchuanyin commented Sep 14, 2018

Hi all, yesterday I ran a compressor comparison using the benchmark tool provided in our test code. The table below shows part of the results; a more detailed log can be found at the end of this comment.

| comparison | compress (speed ratio) | decompress (speed ratio) |
| --- | --- | --- |
| airlift-snappy / xerial-snappy | 2.294 | 0.706 |
| airlift-lz4 / jpountz-lz4 | 0.86 | 1.07 |
| airlift-lzo / hadoop-lzo | 1.92 | 2.6 |
| airlift-zstd / luben-zstd | 0.998 | 1.08 |

(A value above 1 means the airlift implementation was faster than its JNI counterpart.)

Any comments are welcome.

The full table is here: compress_log.xlsx

The full runlog of the benchmark is here: compressor.log
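
For reference, the airlift side of each pair is exercised through aircompressor's block codec API, something like this (a minimal sketch using the `io.airlift.compress` Compressor/Decompressor interfaces as shown in the project README; the sample input and buffer handling here are illustrative, not the actual benchmark harness):

```java
import io.airlift.compress.Compressor;
import io.airlift.compress.Decompressor;
import io.airlift.compress.snappy.SnappyCompressor;
import io.airlift.compress.snappy.SnappyDecompressor;

public class SnappyRoundTrip {
    public static void main(String[] args) {
        byte[] input = "some repetitive sample data, sample data, sample data".getBytes();

        // Compress: size the output buffer with maxCompressedLength, then compress.
        Compressor compressor = new SnappyCompressor();
        byte[] compressed = new byte[compressor.maxCompressedLength(input.length)];
        int compressedSize = compressor.compress(input, 0, input.length, compressed, 0, compressed.length);

        // Decompress back into a buffer of the original (known) size.
        Decompressor decompressor = new SnappyDecompressor();
        byte[] restored = new byte[input.length];
        int restoredSize = decompressor.decompress(compressed, 0, compressedSize, restored, 0, restored.length);

        System.out.printf("in=%d compressed=%d out=%d%n", input.length, compressedSize, restoredSize);
    }
}
```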

dain (Member) commented Sep 14, 2018

Is there something specific you are looking for comments on?

xuchuanyin (Author) commented:

I'm wondering whether the test result is expected, since this project claims that

> it is faster than other implementations by 10%~40%

But for snappy decompression, it is about 30% slower.

dain (Member) commented Sep 15, 2018

After a quick look, I'd guess that either there is a regression/bug in aircompressor, the native snappy decompressor got a lot better, or the native snappy decompressor is running without bounds checks (very unsafe). BTW, we switched all of our uses from Snappy to LZ4, because LZ4 was better in all of our use cases.

xuchuanyin (Author) commented Sep 15, 2018

Hmm... the test also showed that the airlift version of LZ4 is not obviously better than the JNI version.

dain (Member) commented Sep 18, 2018

Ah, I think I understand now. I believe you are asking the question "why would I use this project when I can use the JNI wrappers?" We created this project to avoid JNI because of usability and portability issues and the effect it has on the GC.

In HotSpot-based JVMs, you have two choices for JNI operating on heap data: you can copy it to native memory, or you can lock the heap and operate on it without copying. The first mode has a computational cost, but in my mind the bigger problem is the complexity cost in the buffer/resource management. The second mode has a nasty problem: the locks prevent the GC from running regularly, which can result in early OOMs if you are running highly concurrent software like Presto with a highly concurrent GC like G1.

Another benefit of Java for compression algorithms is that the JVM can inline the compression code directly into its uses, which can result in pretty big speedups as the compression code is adapted to the actual inlined use case. You can't see this in thin, isolated benchmarks; instead you have to test your actual uses. Finally, there are other benefits like debugging and profiling just working like normal Java code.
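
To make the two JNI modes concrete, here is a minimal Java-side sketch of the first (copy) mode. `nativeCompress` is a hypothetical stand-in for a JNI binding that only accepts direct buffers; the second (locking) mode lives on the native side as `GetPrimitiveArrayCritical`, so there is nothing to show for it in Java:

```java
import java.nio.ByteBuffer;

// Sketch of the JNI "copy" mode: heap data is copied into direct (off-heap)
// buffers before the native call, so the GC can keep moving heap objects.
public final class JniCopyModeSketch {
    // Hypothetical stand-in for a JNI binding that only accepts direct buffers.
    // A real wrapper would declare this `native`; the alternative "locking"
    // mode would instead use GetPrimitiveArrayCritical on the native side,
    // pinning the heap for the duration of the call.
    static int nativeCompress(ByteBuffer src, ByteBuffer dst) {
        dst.put(src); // placeholder: a real binding would run the codec here
        return dst.position();
    }

    static int compressViaCopy(byte[] heapData, ByteBuffer nativeSrc, ByteBuffer nativeDst) {
        nativeSrc.clear();
        nativeSrc.put(heapData); // the extra copy: Java heap -> native memory
        nativeSrc.flip();
        nativeDst.clear();
        // The caller also owns sizing and reuse of both direct buffers --
        // the "complexity cost in the buffer/resource management".
        return nativeCompress(nativeSrc, nativeDst);
    }

    public static void main(String[] args) {
        ByteBuffer src = ByteBuffer.allocateDirect(1 << 16);
        ByteBuffer dst = ByteBuffer.allocateDirect(1 << 16);
        System.out.println(compressViaCopy("example payload".getBytes(), src, dst));
    }
}
```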

The original goal of this project was to create compatible compression algorithms that were on par with the performance of the JNI wrappers. What we found in the initial versions was that they were actually more performant, typically in the range listed in the readme. The performance changes over time as the JNI implementations change and as the JVM changes. For example, you ran the benchmark on Java 8, but we run on Java 10. We also likely run on different CPUs, and most importantly, the performance of compression algorithms is totally dependent on the data you feed it (take a look at the high variance across the different benchmark corpora).

If none of this appeals to you, or you don't have the same concurrency issues or portability/ergonomics concerns, use the JNI implementations. They are generally excellent.
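
For comparison, the one-shot array API of the xerial wrapper looks roughly like this (a minimal sketch; `Snappy.compress`/`Snappy.uncompress` are from the `org.xerial.snappy` snappy-java library, and the input is made up; the copy into native memory happens inside the wrapper):

```java
import java.io.IOException;
import org.xerial.snappy.Snappy;

public class XerialSnappyRoundTrip {
    public static void main(String[] args) throws IOException {
        byte[] input = "some repetitive sample data, sample data".getBytes();

        // One-shot calls operating on whole arrays.
        byte[] compressed = Snappy.compress(input);
        byte[] restored = Snappy.uncompress(compressed);

        System.out.printf("in=%d compressed=%d out=%d%n", input.length, compressed.length, restored.length);
    }
}
```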
