Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added zstd as a compression algorithm #506

Merged
merged 2 commits into from
Sep 16, 2022
Merged

Conversation

V-R-Dighe
Copy link
Contributor

Related #504

@schollz
Copy link
Owner

schollz commented Sep 13, 2022

Thanks @V-R-Dighe! This is great work building off @stefins comment.

I tested this patch against the current flate algorithm. This doesn't look like a clear improvement?

Here are some tests run from src/compress.

Current version:

> git checkout 1517767129b7f32e6e1c8d1caceace4fb8fda0be # current
> go test -bench=.
Level 9: 55% percent space savings
Level -2: 43% percent space savings
Level -2: 43% percent space savings
random, Level -2: -0% percent space savings
random, Level 9: -0% percent space savings
goos: linux
goarch: arm64
pkg: github.com/schollz/croc/v9/src/compress
BenchmarkCompressLevelMinusTwo-4                    1822            658837 ns/op
BenchmarkCompressLevelNine-4                        1024           1180830 ns/op
BenchmarkCompressLevelMinusTwoBinary-4                66          16841274 ns/op
BenchmarkCompressLevelNineBinary-4                    12          99928833 ns/op

zstd version:

> git checkout V-R-Dighe-develop
> go test -bench=.
Level 9: 53% percent space savings
Level -2: 23% percent space savings
Level -2: 23% percent space savings
random, Level -2: -0% percent space savings
random, Level 9: -0% percent space savings
goos: linux
goarch: arm64
pkg: github.com/schollz/croc/v9/src/compress
BenchmarkCompressLevelMinusTwo-4                   25466             47648 ns/op
BenchmarkCompressLevelNine-4                         214           4976949 ns/op
BenchmarkCompressLevelMinusTwoBinary-4                75          18479281 ns/op
BenchmarkCompressLevelNineBinary-4                     9         119355260 ns/op

For context, by default the level is -2 (HuffmanOnly).

It seems zstd for the Huffman-only encoding is faster, but provides much less compression. The higher level (i.e. 9) compression provides similar amount of compression to compress/flate but very comparable speed.

Are there further optimizations that can improve zstd? If so, please include them in the PR in the tests so I it can be compared against compress/flate.

@stefins
Copy link
Contributor

stefins commented Sep 15, 2022

Hi,
I tried the benchmark with https://github.com/klauspost/compress/tree/master/zstd
Results
With github.com/DataDog/zstd

Level 9: 53% percent space savings
Level -2: 23% percent space savings
Level -2: 23% percent space savings
random, Level -2: -0% percent space savings
random, Level 9: -0% percent space savings
goos: linux
goarch: amd64
pkg: github.com/schollz/croc/v9/src/compress
BenchmarkCompressLevelMinusTwo-2         	   45229	     27757 ns/op
BenchmarkCompressLevelNine-2             	     697	   1552866 ns/op
BenchmarkCompressLevelMinusTwoBinary-2   	     408	   2936133 ns/op
BenchmarkCompressLevelNineBinary-2       	      24	  54015984 ns/op
PASS
ok  	github.com/schollz/croc/v9/src/compress	6.336s

With github.com/klauspost/compress/zstd

Level 9: 52% percent space savings
Level -2: 51% percent space savings
Level -2: 51% percent space savings
random, Level -2: -0% percent space savings
random, Level 9: -0% percent space savings
goos: linux
goarch: amd64
pkg: github.com/schollz/croc/v9/src/compress
BenchmarkCompressLevelMinusTwo-2         	    1976	    682288 ns/op
BenchmarkCompressLevelNine-2             	     151	   8247417 ns/op
BenchmarkCompressLevelMinusTwoBinary-2   	     486	   2719676 ns/op
BenchmarkCompressLevelNineBinary-2       	     154	   8582093 ns/op
PASS
ok  	github.com/schollz/croc/v9/src/compress	8.907s

With current flate

Level 9: 55% percent space savings
Level -2: 43% percent space savings
Level -2: 43% percent space savings
random, Level -2: -0% percent space savings
random, Level 9: -0% percent space savings
goos: linux
goarch: amd64
pkg: github.com/schollz/croc/v9/src/compress
BenchmarkCompressLevelMinusTwo-2         	    3913	    300315 ns/op
BenchmarkCompressLevelNine-2             	    2180	    523972 ns/op
BenchmarkCompressLevelMinusTwoBinary-2   	     237	   4889606 ns/op
BenchmarkCompressLevelNineBinary-2       	      24	  43074478 ns/op
PASS
ok  	github.com/schollz/croc/v9/src/compress	5.103s

Seems like the bottleneck is with CGO calls in https://github.com/DataDog/zstd although I'm not sure

My patch

diff --git a/go.mod b/go.mod
index f68a0df..70acb14 100644
--- a/go.mod
+++ b/go.mod
@@ -18,8 +18,9 @@ require (
 	golang.org/x/time v0.0.0-20220609170525-579cf78fd858
 )

+require github.com/klauspost/compress v1.15.9
+
 require (
-	github.com/DataDog/zstd v1.5.2
 	github.com/OneOfOne/xxhash v1.2.5 // indirect
 	github.com/cpuguy83/go-md2man/v2 v2.0.2 // indirect
 	github.com/davecgh/go-spew v1.1.1 // indirect
diff --git a/go.sum b/go.sum
index c04c635..ce9d791 100644
--- a/go.sum
+++ b/go.sum
@@ -1,6 +1,4 @@
 github.com/BurntSushi/toml v0.3.1/go.mod h1:xHWCNGjB5oqiDr8zfno3MHue2Ht5sIBksp03qcyfWMU=
-github.com/DataDog/zstd v1.5.2 h1:vUG4lAyuPCXO0TLbXvPv7EB7cNK1QV/luu55UHLrrn8=
-github.com/DataDog/zstd v1.5.2/go.mod h1:g4AWEaM3yOg3HYfnJ3YIawPnVdXJh9QME85blwSAmyw=
 github.com/OneOfOne/xxhash v1.2.2/go.mod h1:HSdplMjZKSmBqAxg5vPj2TmRDmfkzw+cTzAElWljhcU=
 github.com/OneOfOne/xxhash v1.2.5 h1:zl/OfRA6nftbBK9qTohYBJ5xvw6C/oNKizR7cZGl3cI=
 github.com/OneOfOne/xxhash v1.2.5/go.mod h1:eZbhyaAYD41SGSSsnmcpxVoRiQ/MPUTjUdIIOT9Um7Q=
@@ -17,6 +15,8 @@ github.com/denisbrodbeck/machineid v1.0.1/go.mod h1:dJUwb7PTidGDeYyUBmXZ2GphQBbj
 github.com/k0kubun/go-ansi v0.0.0-20180517002512-3bf9e2903213/go.mod h1:vNUNkEQ1e29fT/6vq2aBdFsgNPmy8qMdSay1npru+Sw=
 github.com/kalafut/imohash v1.0.2 h1:j/cUPa15YvXv7abJlM+kdJIycbBMpmO7WqhPl4YB76I=
 github.com/kalafut/imohash v1.0.2/go.mod h1:PjHBF0vpo1q7zMqiTn0qwSTQU2wDn5QIe8S8sFQuZS8=
+github.com/klauspost/compress v1.15.9 h1:wKRjX6JRtDdrE9qwa4b/Cip7ACOshUI4smpCQanqjSY=
+github.com/klauspost/compress v1.15.9/go.mod h1:PhcZ0MbTNciWF3rruxRgKxI5NkcHHrHUDtV4Yw2GlzU=
 github.com/kr/pretty v0.1.0 h1:L/CwN0zerZDmRFUapSPitk6f+Q3+0za1rQkzVuMiMFI=
 github.com/kr/pretty v0.1.0/go.mod h1:dAy3ld7l9f0ibDNOQOHHMYYIIbhfbHSm3C4ZsoJORNo=
 github.com/kr/pty v1.1.1/go.mod h1:pFQYn66WHrOpPYNljwOMqo10TkYh1fy3cYio2l3bCsQ=
diff --git a/src/compress/compress.go b/src/compress/compress.go
index 6964c78..68a8e53 100644
--- a/src/compress/compress.go
+++ b/src/compress/compress.go
@@ -4,7 +4,7 @@ import (
 	"bytes"
 	"io"

-	"github.com/DataDog/zstd"
+	"github.com/klauspost/compress/zstd"

 	log "github.com/schollz/logger"
 )
@@ -33,7 +33,10 @@ func Decompress(src []byte) []byte {

 // compress uses zstd to compress a byte slice to a corresponding level
 func compress(src []byte, dest io.Writer, level int) {
-	compressor := zstd.NewWriterLevel(dest, level)
+	compressor, err := zstd.NewWriter(dest, zstd.WithEncoderLevel(zstd.EncoderLevelFromZstd(level)))
+	if err != nil {
+		panic(err)
+	}
 	if _, err := compressor.Write(src); err != nil {
 		log.Debugf("error writing data: %v", err)
 	}
@@ -42,7 +45,10 @@ func compress(src []byte, dest io.Writer, level int) {

 // decompress uses zstd to decompress an io.Reader
 func decompress(src io.Reader, dest io.Writer) {
-	decompressor := zstd.NewReader(src)
+	decompressor, err := zstd.NewReader(src)
+	if err != nil {
+		panic(err)
+	}
 	if _, err := io.Copy(dest, decompressor); err != nil {
 		log.Debugf("error copying data: %v", err)
 	}

:)

go.mod Show resolved Hide resolved
src/compress/compress.go Outdated Show resolved Hide resolved
src/compress/compress.go Outdated Show resolved Hide resolved
src/compress/compress.go Outdated Show resolved Hide resolved
@V-R-Dighe
Copy link
Contributor Author

Hi @stefins,
Thanks for the suggestions.
I have reflected the changes. Please have a look.

cc @schollz.

@schollz
Copy link
Owner

schollz commented Sep 16, 2022

Really nice work @V-R-Dighe !! And thanks for clearing up that barrier @stefins !! This is a great improvement

@schollz schollz merged commit 8de4508 into schollz:main Sep 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants