
Non-deterministic output observed #65

Closed
pstibrany opened this issue Jan 10, 2020 · 7 comments · Fixed by #72

pstibrany commented Jan 10, 2020

Hi,

We are observing a strange issue: different servers running the same version of the LZ4 package (github.com/pierrec/lz4 v2.3.0+incompatible) produce different compressed output from the same input, and I'm trying to understand what is going on.

In our codebase, we reuse *lz4.Writer instances and call (*lz4.Writer).Reset when starting a new output. Each *lz4.Writer receives a single Write call with the entire input at once. Inputs are ~2 MB; we use 64 KB buffers in LZ4, mostly to reduce memory usage during decompression.

When reading the two files with lz4debug enabled, I see these outputs:

LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:103 header block max size id=4 size=65536
LZ4: reader.go:132 header read: lz4.Header{BlockMaxSize: 65536 }
LZ4: reader.go:152 header read OK compressed buffer 65536 / 131072 uncompressed buffer 65536 : 65536 index=65536
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 15501
LZ4: reader.go:238 compressed block size 15501
LZ4: reader.go:274 current frame checksum 30916118
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 15400
LZ4: reader.go:238 compressed block size 15400
LZ4: reader.go:274 current frame checksum fd477d82
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 15396
LZ4: reader.go:238 compressed block size 15396
LZ4: reader.go:274 current frame checksum 564efa4
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 15388
LZ4: reader.go:238 compressed block size 15388
LZ4: reader.go:274 current frame checksum 20424901
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 13886
LZ4: reader.go:238 compressed block size 13886
LZ4: reader.go:274 current frame checksum bba29999
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 25551 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:185 frame checksum got=bba29999 / want=bba29999

Second:

LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:103 header block max size id=4 size=65536
LZ4: reader.go:132 header read: lz4.Header{BlockMaxSize: 65536 }
LZ4: reader.go:152 header read OK compressed buffer 65536 / 131072 uncompressed buffer 65536 : 65536 index=65536
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 15501
LZ4: reader.go:238 compressed block size 15501
LZ4: reader.go:274 current frame checksum 30916118
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 15400
LZ4: reader.go:238 compressed block size 15400
LZ4: reader.go:274 current frame checksum fd477d82
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 15399
LZ4: reader.go:238 compressed block size 15399
LZ4: reader.go:274 current frame checksum 564efa4
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 15386
LZ4: reader.go:238 compressed block size 15386
LZ4: reader.go:274 current frame checksum 20424901
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:203 raw block size 13887
LZ4: reader.go:238 compressed block size 13887
LZ4: reader.go:274 current frame checksum bba29999
LZ4: reader.go:295 copied 32768 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:295 copied 25551 bytes to input
LZ4: reader.go:145 Read buf len=32768
LZ4: reader.go:164 reading block from writer
LZ4: reader.go:185 frame checksum got=bba29999 / want=bba29999

The only difference is that the raw sizes of the last three blocks differ (15396, 15388, 13886 vs 15399, 15386, 13887); otherwise both files decompress back to the same data.

Is there anything that can make LZ4 writers generate slightly different output? The Reset call seems to reset everything except the hashtable; could that be the issue?

Thank you.

PS: So far I've been unable to reproduce the issue on my machine. :-(


pstibrany commented Jan 10, 2020

I'm attaching two LZ4 streams generated as described above (but the earlier debug output is not related to these files).

files.zip

Here is the program I used to get the lz4debug output (built with -tags lz4debug):

// +build lz4debug

package main

import (
        "flag"
        "io"
        "log"
        "os"

        "github.com/pierrec/lz4"
)

func main() {
        flag.Parse()

        for _, f := range flag.Args() {
                openLz4(f)
        }
}

func openLz4(filename string) {
        f, err := os.Open(filename)
        if err != nil {
                log.Println("failed to open file", filename, "due to error:", err)
                return
        }
        defer f.Close()

        r := lz4.NewReader(f)

        _, err = io.Copy(os.Stdout, r)
        if err != nil {
                log.Println("error while reading file", filename, ":", err)
        }
}


pierrec commented Jan 10, 2020

Thanks. I will look into it.
Also, I guess you would have mentioned it, but I'll ask anyway: have you had a chance to disable the reuse of writers, and if so, do you still observe the issue?


pstibrany commented Jan 13, 2020

> Thanks. I will look into it.
> Also, I guess you would have mentioned it, but I'll ask anyway: have you had a chance to disable the reuse of writers, and if so, do you still observe the issue?

Thanks. No, we only tried switching to Snappy, which helped. We haven't tried not reusing lz4 writers, but I'll see if we can try that.


pierrec commented Feb 20, 2020

Any news on this?

pstibrany commented

> Any news on this?

Unfortunately not. I dropped the ball here and haven't yet tried not reusing writers. The problem is that I cannot reproduce it locally, and trying it on our busy cluster, where we have observed this problem, may blow up memory allocations too much. I still would like to try it at some point, though.

@pierrec
Copy link
Owner

pierrec commented Feb 20, 2020

No problem, thanks for getting back to me. I am too busy at the moment to look at this in detail, but any news is welcome!

cyriltovena commented

Hey @pierrec, I'm working with Peter on Loki, and today I tried to dig deeper into the problem. It was indeed the hashtable not being reset that caused the non-deterministic output.

I have a repro, and I was able to fix it by adding this to the Reset function:

	// reset hashtable to ensure deterministic output.
	for i := range z.hashtable {
		z.hashtable[i] = 0
	}

I'm sending you a PR, let me know what you think!

pierrec added a commit that referenced this issue Apr 29, 2020
let Compress* and UncompressBlock functions handle hash/chain tables pooling