Non-deterministic output observed #65
Comments
I'm attaching two LZ4 streams generated as described above (but the earlier debug output is not related to these files). Here is the program I used to get the lz4debug output:

	// +build lz4debug

	package main

	import (
		"flag"
		"io"
		"log"
		"os"

		"github.com/pierrec/lz4"
	)

	func main() {
		flag.Parse()
		// Decompress every file named on the command line.
		for _, f := range flag.Args() {
			openLz4(f)
		}
	}

	// openLz4 decompresses one LZ4 file to stdout.
	func openLz4(filename string) {
		f, err := os.Open(filename)
		if err != nil {
			log.Println("failed to open file", filename, "due to error:", err)
			return
		}
		defer f.Close()

		r := lz4.NewReader(f)
		_, err = io.Copy(os.Stdout, r)
		if err != nil {
			log.Println("error while reading file", filename, ":", err)
		}
	}
Thanks. I will look into it.
Thanks. No, we only tried switching to Snappy, which helped. We haven't tried not reusing lz4 writers, but I'll see if we can try that.
Any news on this?
Unfortunately not. I dropped the ball here and haven't yet tried not reusing writers. The problem is that I cannot reproduce it locally, and trying it on our busy cluster, where we have observed this problem, may blow up memory allocations too much. I would still like to try it at some point though.
No problem, thx for getting back to me. I am too busy atm to look at this in detail, but any news is welcome!
Hey @pierrec, I'm working with Peter on Loki and today I tried to dig deeper into the problem. It was indeed the hashtable not being reset that caused this non-deterministic output. I have a repro and I was able to fix it by adding this in the reset function
I'm sending you a PR, let me know what you think!
let Compress* and UncompressBlock functions handle hash/chain tables pooling
Hi,
we are observing a weird issue where different servers running the same version of the LZ4 package (github.com/pierrec/lz4 v2.3.0+incompatible) produce different compressed output from the same input, and I'm trying to understand what is going on.
In our codebase, we reuse *lz4.Writer instances and call (*lz4.Writer).Reset when starting a new output. Each (*lz4.Writer) receives a single Write call with the entire input at once. Inputs are ~2 MB and we use 64 KB buffers in LZ4, mostly to reduce memory usage during decompression.
When reading two files with lz4 debug enabled, I see these outputs:
Second:
The only difference is that the raw block sizes for the last two blocks are 15388, 13886 and 15386, 13887 respectively. Otherwise, both files decompress back to the same data.
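For comparing block sizes without a debug build, the block headers can also be read straight from the file. This is a minimal sketch based on the published LZ4 frame format, not code from the lz4 package; `listBlocks` and `blockInfo` are names made up here, and it assumes each file holds a single regular frame (no skippable or legacy frames).

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
)

const frameMagic = 0x184D2204 // LZ4 frame magic number, stored little-endian

// blockInfo describes one data block of an LZ4 frame.
type blockInfo struct {
	Size         uint32 // stored size of the block in bytes
	Uncompressed bool   // true if the block was stored without compression
}

// listBlocks walks the frame header and the block headers of one LZ4 frame
// and returns the stored size of every block. It never decompresses data.
func listBlocks(frame []byte) ([]blockInfo, error) {
	if len(frame) < 7 || binary.LittleEndian.Uint32(frame) != frameMagic {
		return nil, errors.New("not an LZ4 frame")
	}
	flg := frame[4]
	pos := 6 // magic (4 bytes) + FLG (1) + BD (1)
	if flg&0x08 != 0 {
		pos += 8 // optional content size
	}
	if flg&0x01 != 0 {
		pos += 4 // optional dictionary ID
	}
	pos++ // header checksum byte

	var blocks []blockInfo
	for {
		if pos+4 > len(frame) {
			return nil, errors.New("truncated frame")
		}
		v := binary.LittleEndian.Uint32(frame[pos:])
		pos += 4
		if v == 0 { // EndMark: no more blocks
			return blocks, nil
		}
		size := v &^ (1 << 31) // the highest bit flags an uncompressed block
		blocks = append(blocks, blockInfo{Size: size, Uncompressed: v>>31 == 1})
		pos += int(size)
		if flg&0x10 != 0 {
			pos += 4 // optional per-block checksum
		}
	}
}

func main() {
	for _, name := range os.Args[1:] {
		data, err := os.ReadFile(name)
		if err != nil {
			fmt.Fprintln(os.Stderr, "failed to read", name, ":", err)
			continue
		}
		blocks, err := listBlocks(data)
		if err != nil {
			fmt.Fprintln(os.Stderr, "failed to parse", name, ":", err)
			continue
		}
		for i, b := range blocks {
			fmt.Printf("%s block %d: %d bytes (uncompressed: %v)\n", name, i, b.Size, b.Uncompressed)
		}
	}
}
```

Running it on both files and diffing the output should show exactly which blocks differ in size.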
Is there anything that can make LZ4 writers generate slightly different output? The Reset call seems to reset everything except the hashtable. Could that be an issue?
Thank you.
ps: So far I've been unable to reproduce the issue on my machine. :-(