
rocksdb: add compression type zstd and lz4 #217

Merged: 12 commits, merged on Dec 13, 2018


@neverchanje (Contributor) commented Nov 19, 2018

In brief: if you have plenty of free CPU, ZSTD is a good choice; otherwise, LZ4 is recommended.

| Compressor name | Ratio | Compression (MB/s) | Decompression (MB/s) |
|---|---|---|---|
| zstd 1.3.4 -1 | 2.877 | 470 | 1380 |
| zlib 1.2.11 -1 | 2.743 | 110 | 400 |
| brotli 1.0.2 -0 | 2.701 | 410 | 430 |
| quicklz 1.5.0 -1 | 2.238 | 550 | 710 |
| lzo1x 2.09 -1 | 2.108 | 650 | 830 |
| lz4 1.8.1 | 2.101 | 750 | 3700 |
| snappy 1.1.4 | 2.091 | 530 | 1800 |
| lzf 3.6 -1 | 2.077 | 400 | 860 |
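
To make the space-versus-CPU trade-off in this table concrete, here is a small worked calculation in C++ (a sketch, not code from this PR). It turns a ratio and a throughput into on-disk size and single-core CPU time for a given amount of input. It assumes that both speeds are measured against uncompressed size, as lzbench reports them, and it ignores that real throughput depends on the data and the machine. `CodecStats` and the helper functions are hypothetical names.

```cpp
#include <cassert>
#include <cstdio>

// One row of the benchmark table: compression ratio and single-core
// throughput in MB/s (assumed: both speeds measured against uncompressed size).
struct CodecStats {
    const char* name;
    double ratio;
    double compress_mb_s;
    double decompress_mb_s;
};

const CodecStats kZstd1 = {"zstd 1.3.4 -1", 2.877, 470, 1380};
const CodecStats kLz4 = {"lz4 1.8.1", 2.101, 750, 3700};

// Megabytes left on disk after compressing raw_mb megabytes.
double compressed_mb(const CodecStats& c, double raw_mb) {
    assert(c.ratio > 0.0);
    return raw_mb / c.ratio;
}

// Seconds of one core spent compressing raw_mb megabytes.
double compress_seconds(const CodecStats& c, double raw_mb) {
    return raw_mb / c.compress_mb_s;
}

// Seconds of one core spent decompressing back to raw_mb megabytes.
double decompress_seconds(const CodecStats& c, double raw_mb) {
    return raw_mb / c.decompress_mb_s;
}

// Prints the space/CPU trade-off of zstd -1 versus lz4 for raw_mb of input.
void print_tradeoff(double raw_mb) {
    const CodecStats rows[] = {kZstd1, kLz4};
    for (const CodecStats& c : rows) {
        std::printf("%-14s %7.1f MB on disk, %5.2f s to compress, %5.2f s to decompress\n",
                    c.name, compressed_mb(c, raw_mb), compress_seconds(c, raw_mb),
                    decompress_seconds(c, raw_mb));
    }
}
```

For 1024 MB of input, zstd -1 leaves about 356 MB on disk against about 487 MB for lz4, but it needs about 2.18 s versus 1.37 s of one core to compress, and 0.74 s versus 0.28 s to decompress. That is the "spend CPU to save space" choice in numbers.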


The RocksDB wiki (Compression page) gives some suggestions:

> Use `options.compression` to specify the compression to use. By default it is Snappy. We believe LZ4 is almost always better than Snappy. We leave Snappy as default to avoid unexpected compatibility problems to previous users. LZ4/Snappy is lightweight compression so it usually strikes a good balance between space and CPU usage.

> If you want to further reduce the in-memory and have some free CPU to use, you can try to set a heavy-weight compression in the latter by setting `options.bottommost_compression`. The bottommost level will be compressed using this compression style. Usually the bottommost level contains majority of the data, so users get almost optimal space setting, without paying CPU for compress all the data ever flowing to any level. We recommend ZSTD. If it is not available, Zlib is the second choice.

> If you want have a lot of free CPU and want to reduce not just space but write amplification too, try to set `options.compression` to heavy weight compression type. We recommend ZSTD. Use Zlib if it is not available.
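
The three cases in the RocksDB advice can be read as a small decision table. The self-contained C++ sketch below encodes it. The `Codec` and `CpuHeadroom` enums, `CompressionPlan`, `plan_for`, and `codec_for_level` are hypothetical names, not RocksDB API, and `codec_for_level` assumes that the last level is the bottommost one holding data, which may not hold in a real RocksDB instance.

```cpp
#include <cassert>

// Hypothetical local mirror of a few RocksDB compression types, kept local so
// this sketch compiles without RocksDB.
enum class Codec { kLZ4, kZSTD, kZlib };

// How much spare CPU the deployment has (an illustrative three-way split).
enum class CpuHeadroom { kLittle, kSome, kPlenty };

// The two knobs from the advice: one codec for all levels and one for the
// bottommost level.
struct CompressionPlan {
    Codec compression;
    Codec bottommost_compression;
};

CompressionPlan plan_for(CpuHeadroom cpu, bool zstd_available) {
    // Heavy-weight codec: ZSTD, with Zlib as the fallback.
    const Codec heavy = zstd_available ? Codec::kZSTD : Codec::kZlib;
    // Light-weight codec: LZ4 ("almost always better than Snappy").
    const Codec light = Codec::kLZ4;
    switch (cpu) {
        case CpuHeadroom::kLittle:
            return {light, light};  // lightweight everywhere
        case CpuHeadroom::kSome:
            return {light, heavy};  // pay CPU only where most data lives
        case CpuHeadroom::kPlenty:
            return {heavy, heavy};  // also reduces write amplification
    }
    return {light, light};  // unreachable; keeps -Wreturn-type quiet
}

// Codec used for `level` in an LSM tree with `num_levels` levels, assuming the
// last level is the bottommost one.
Codec codec_for_level(const CompressionPlan& plan, int level, int num_levels) {
    assert(level >= 0 && level < num_levels);
    return level == num_levels - 1 ? plan.bottommost_compression : plan.compression;
}
```

In real RocksDB code the two fields are `options.compression` and `options.bottommost_compression`, and the corresponding enum values are `rocksdb::kLZ4Compression`, `rocksdb::kZSTD`, and `rocksdb::kZlibCompression`.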

As we can see from the table above, lz4 is a compression algorithm with a more moderate ratio and speed than zstd, so we also take it into consideration.

In this PR we disable RocksDB's linking with bzip2 (we have never used it for compression) and add linking with zstd and lz4. These two libraries will be packaged by `./run.sh pack_server` and `./run.sh pack_tools`.

@qinzuoyan (Contributor)

lz4 is a compression algorithm with more moderate ratio?

@acelyc111 (Member)

https://github.com/facebook/rocksdb/wiki/Compression
"LZ4/Snappy is lightweight compression so it usually strikes a good balance between space and CPU usage."
ZSTD has the highest compression ratio, but it also costs more CPU.

@qinzuoyan (Contributor)

It looks like lz4 beats snappy in both compression ratio and speed, so why does snappy seem to be more popular?

@neverchanje (Contributor, Author) commented Nov 19, 2018

> Use `options.compression` to specify the compression to use. By default it is Snappy. We believe LZ4 is almost always better than Snappy. We leave Snappy as default to avoid unexpected compatibility problems to previous users. LZ4/Snappy is lightweight compression so it usually strikes a good balance between space and CPU usage.

RocksDB uses snappy by default for data compatibility. Compared with zlib, snappy is indeed better, but lz4 and zstd are younger than snappy and, naturally, better.
Official lz4 benchmark results (lzbench):

| Compressor | Ratio | Compression (MB/s) | Decompression (MB/s) |
|---|---|---|---|
| memcpy | 1.000 | 13100 | 13100 |
| LZ4 default (v1.8.2) | 2.101 | 730 | 3900 |
| LZO 2.09 | 2.108 | 630 | 800 |
| QuickLZ 1.5.0 | 2.238 | 530 | 720 |
| Snappy 1.1.4 | 2.091 | 525 | 1750 |
| Zstandard 1.3.4 -1 | 2.877 | 470 | 1380 |
| LZF v3.6 | 2.073 | 380 | 840 |
| zlib deflate 1.2.11 -1 | 2.730 | 100 | 380 |
| LZ4 HC -9 (v1.8.2) | 2.721 | 40 | 3920 |
| zlib deflate 1.2.11 -6 | 3.099 | 34 | 410 |
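
One way to check the "lz4 beats snappy" observation against these rows is to test whether one row dominates another (at least as good on ratio, compression speed, and decompression speed), and to pick the fastest compressor that still meets a target ratio. The sketch below hard-codes the lzbench table with memcpy omitted; `Row`, `kLzbench`, `dominates`, and `fastest_with_ratio` are hypothetical helpers for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// One lzbench row from the table (speeds in MB/s).
struct Row {
    std::string name;
    double ratio;
    double compress_mb_s;
    double decompress_mb_s;
};

// The table above, without the memcpy baseline.
const std::vector<Row> kLzbench = {
    {"LZ4 default (v1.8.2)", 2.101, 730, 3900},
    {"LZO 2.09", 2.108, 630, 800},
    {"QuickLZ 1.5.0", 2.238, 530, 720},
    {"Snappy 1.1.4", 2.091, 525, 1750},
    {"Zstandard 1.3.4 -1", 2.877, 470, 1380},
    {"LZF v3.6", 2.073, 380, 840},
    {"zlib deflate 1.2.11 -1", 2.730, 100, 380},
    {"LZ4 HC -9 (v1.8.2)", 2.721, 40, 3920},
    {"zlib deflate 1.2.11 -6", 3.099, 34, 410},
};

// True if `a` is at least as good as `b` on every column.
bool dominates(const Row& a, const Row& b) {
    return a.ratio >= b.ratio && a.compress_mb_s >= b.compress_mb_s &&
           a.decompress_mb_s >= b.decompress_mb_s;
}

// Name of the fastest compressor whose ratio is at least `min_ratio`,
// or an empty string if no row qualifies.
std::string fastest_with_ratio(double min_ratio) {
    const Row* best = nullptr;
    for (const Row& r : kLzbench) {
        if (r.ratio >= min_ratio &&
            (best == nullptr || r.compress_mb_s > best->compress_mb_s)) {
            best = &r;
        }
    }
    return best ? best->name : std::string();
}
```

On these numbers LZ4 dominates Snappy on all three columns, while for a ratio of at least 2.7 the fastest choice is Zstandard -1, and at 3.0 only zlib -6 qualifies.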

These agree with zstd's own results; zstd's benchmark also uses lzbench.

@qinzuoyan (Contributor)

It seems lz4 would be a better default for our production clusters from now on.

```cpp
if (compression_str == "none") {
    _db_opts.compression = rocksdb::kNoCompression;
} else if (compression_str == "snappy") {
    _db_opts.compression = rocksdb::kSnappyCompression;
} else if (compression_str == "zstd") {
```
Contributor (review comment):

Just adding these few lines here isn't enough, is it? We also need to add these two compression libraries to the dependency linking.
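
The review point is that mapping a config string to a compression type only names a codec; RocksDB can use it only if the corresponding library is linked into the binary. A minimal self-contained sketch of that distinction follows. It assumes a hypothetical local `Codec` enum in place of `rocksdb::CompressionType`, and a hypothetical `linked` set standing in for what the build actually links; `parse_compression` and `is_usable` are illustrative names, not Pegasus or RocksDB functions.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <string>

// Hypothetical local mirror of the compression types the config string can name.
enum class Codec { kNone, kSnappy, kLZ4, kZSTD };

// Maps a config string to a codec; std::nullopt for unknown names.
std::optional<Codec> parse_compression(const std::string& s) {
    if (s == "none") return Codec::kNone;
    if (s == "snappy") return Codec::kSnappy;
    if (s == "lz4") return Codec::kLZ4;
    if (s == "zstd") return Codec::kZSTD;
    return std::nullopt;
}

// Recognising a name is not the same as supporting it: apart from kNone, the
// codec's library must also be linked in. `linked` is a hypothetical stand-in
// for that build-time fact.
bool is_usable(const std::string& s, const std::set<Codec>& linked) {
    const std::optional<Codec> c = parse_compression(s);
    if (!c) return false;  // unknown name
    return *c == Codec::kNone || linked.count(*c) > 0;
}
```

With a build that links only snappy, `is_usable("lz4", {Codec::kSnappy})` is false even though the name parses, which is the gap the linking changes in this PR close.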

@acelyc111 (Member) commented Nov 21, 2018

> It seems lz4 would be a better default for our production clusters from now on.

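I checked our production machines earlier, and many of them don't have the lz4/zstd libraries installed. If we want to roll this out, they all need to be installed beforehand.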

@qinzuoyan (Contributor)

Alternatively, bundle the lz4/zstd libraries when running pack_server.

```yaml
- sudo apt-get -y install libsnappy-dev
- sudo apt-get -y install libgflags-dev
- wget https://github.com/facebook/zstd/archive/v1.3.7.zip; unzip v1.3.7; cd zstd-1.3.7;
- mkdir cmake-build; cd cmake-build; cmake -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=lib -DZSTD_BUILD_PROGRAMS=OFF ../build/cmake; sudo make install -j8;
```
Contributor (review comment):

Why are multiple shell statements put on one line? Wouldn't it be more reasonable to put them on separate lines?

@qinzuoyan merged commit d0b6530 into apache:master on Dec 13, 2018
neverchanje pushed a commit to neverchanje/pegasus that referenced this pull request Jul 13, 2019
Former-commit-id: fd79b03bbec5acb93a7195b955c86176c9ba8236 [formerly d0b6530]
Former-commit-id: c0e1608ee7777a999d97eb9985c1750eb17b873a
acelyc111 pushed a commit that referenced this pull request Jun 23, 2022

4 participants