rhdiff is a simple implementation of input comparison tool based on rolling hash algorithm
The library uses the rsync algorithm to compare chunks of two files in rolling manner.
Given the delta between two input src
and dst
needs to be calculated, differ
will do the following:
- Split
src
into non-overlapping chunks of sizeS
- Calculate weak checksum (adler32) and strong checksum (sha256) for each chunk
-
CalculateDelta
function iterates over each overlapping chunk indst
using window of sizeS
-
Create a map from
src
with a weak checksum of each chunk as a key -
For each overlapping chunk consecutively check checksums and add moving blocks into the change list
- Calculate weak checksum of the current window
- If weak checksum is in the map, calculate strong checksum of the current window
- If strong and weak checksums match, the chunk in
dst
is the same as insrc
- Remove the key-value pair of fully matched from the map
- If offset of src chunk and offset of dst chunk match, the chunk wasn't moved
- If offset of src chunk and offset of dst chunk don't match, the chunk was moved
-
Write bytes between matching chunks into a buffer and dump the changes from the buffer as soon as a next matching block is found
-
Finally, iterate over remaining keys in the
src
map and collect deleted chunks
Basic comparison of two strings split into one byte chunks
package main
import (
"bytes"
"fmt"
"rhdiff/pkg/differ"
)
const chunkSizeBytes = 3
func main() {
src := bytes.NewReader([]byte("abcxyzfoo"))
dst := bytes.NewReader([]byte("abc12xyzfo"))
srcChunks := differ.Split(src, chunkSizeBytes)
changes := differ.CalculateDelta(srcChunks, dst, chunkSizeBytes)
for _, change := range changes {
fmt.Printf(
"operation: %s, srcOffset %d; dstOffset %d; data: %s\n",
change.Operation,
change.SrcOffset,
change.DstOffset,
string(change.Data),
)
}
}
Output
$ go run examples/basic.go
operation: EQUAL, srcOffset 0; dstOffset 0; data: abc
operation: MOVE, srcOffset 3; dstOffset 5; data: xyz
operation: ADD, srcOffset 0; dstOffset 3; data: 12
operation: ADD, srcOffset 0; dstOffset 8; data: fo
operation: DELETE, srcOffset 6; dstOffset 0; data: foo
$ go test ./... -cover