Go library of string comparison and edit distance algorithms, featuring Levenshtein, LCS, Hamming, Damerau-Levenshtein (OSA and adjacent transpositions variants), Jaro-Winkler, Cosine, and more.
- Requirements
- Introduction
- Features
- Installation
- Benchmarks
- Documentation
- Examples
- Author
- Contributing
- License
- Go (v1.13+)
An open-source Go library that includes most (and soon all) edit distance and string comparison algorithms, plus some extras!
Designed to be fully compatible with Unicode characters!
This library is 100% test covered 😁
- Levenshtein
- LCS (Longest common subsequence) with edit distance, backtrack and diff functions
- Hamming
- Damerau-Levenshtein, with following variants:
- OSA (Optimal string alignment)
- Adjacent transpositions
- Jaro & Jaro-Winkler similarity algorithms
- Cosine Similarity
- Jaccard Index
- QGram
- Sorensen-Dice
- Similarity percentage functions based on every edit distance algorithm available in this library
- Fuzzy search functions based on edit distance, returning either a single string or multiple strings
- Unicode compatibility 🥳
An interactive Google chart showing a few benchmark cases for every similarity algorithm in this library, run through the StringsSimilarity function, is available here.
For more detail, you can also view the raw benchmark output here, which includes memory allocations and test case output (similarity results and errors).
If you are on Linux and want to run the benchmarks on your own setup, run the ./tests/benchmark.sh script.
Open a shell in your project folder and run:
go get github.com/hbollon/go-edlib
And import it into your project:
import (
"github.com/hbollon/go-edlib"
)
If you are on Linux and want to run all unit tests, run the ./tests/tests.sh script.
On Windows, you can run:
go test ./... # Add desired parameters to this command if you want
The full documentation is available here: Documentation
You can use the StringsSimilarity(str1, str2, algorithm) function to compute the similarity between two strings.
The algorithm parameter must be one of the following constants:
// Algorithm identifiers
const (
Levenshtein Algorithm = iota
DamerauLevenshtein
OSADamerauLevenshtein
Lcs
Hamming
Jaro
JaroWinkler
Cosine
)
Example with Levenshtein:
res, err := edlib.StringsSimilarity("string1", "string2", edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Similarity: %f", res)
}
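To illustrate how an edit distance can be turned into a similarity score, here is a minimal standalone sketch. It is not the library's implementation: the helper names `levenshtein`, `min3` and `similarity` are hypothetical, and normalizing by the longer string's rune count is an assumption made for this example.

```go
package main

import "fmt"

// levenshtein returns the minimum number of single-rune insertions,
// deletions and substitutions needed to turn a into b.
// It works on runes, so multi-byte characters count as one unit.
func levenshtein(a, b string) int {
	ra, rb := []rune(a), []rune(b)
	prev := make([]int, len(rb)+1)
	curr := make([]int, len(rb)+1)
	for j := range prev {
		prev[j] = j // distance from the empty prefix of a
	}
	for i := 1; i <= len(ra); i++ {
		curr[0] = i
		for j := 1; j <= len(rb); j++ {
			cost := 1
			if ra[i-1] == rb[j-1] {
				cost = 0
			}
			curr[j] = min3(prev[j]+1, curr[j-1]+1, prev[j-1]+cost)
		}
		prev, curr = curr, prev // the row just filled becomes "prev"
	}
	return prev[len(rb)]
}

// min3 returns the smallest of three ints.
func min3(a, b, c int) int {
	if b < a {
		a = b
	}
	if c < a {
		a = c
	}
	return a
}

// similarity (hypothetical helper) maps the distance to a score where
// 1 means identical. Assumption: normalize by the longer rune count.
func similarity(a, b string) float64 {
	longest := len([]rune(a))
	if n := len([]rune(b)); n > longest {
		longest = n
	}
	if longest == 0 {
		return 1 // two empty strings are identical
	}
	return 1 - float64(levenshtein(a, b))/float64(longest)
}

func main() {
	// 3 edits over 7 runes: 1 - 3/7
	fmt.Printf("Similarity: %.4f\n", similarity("kitten", "sitting"))
}
```

Only two rows of the dynamic programming table are kept, so memory use grows with the length of the second string rather than with the product of both lengths.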
You can use the FuzzySearch(str, strList, algorithm) function to get the string in strList that is most similar to str.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearch("testnig", strList, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result: %s", res)
}
Result: testing
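You can use the FuzzySearchThreshold(str, strList, minSimilarity, algorithm) function to get the best match only if it meets the minSimilarity threshold; otherwise the result is empty.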
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchThreshold("testnig", strList, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig': %s", res)
}
res, err = edlib.FuzzySearchThreshold("hello", strList, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'hello': %s", res)
}
Result for 'testnig': testing
Result for 'hello':
You can use the FuzzySearchSet(str, strList, resultQuantity, algorithm) function to get the resultQuantity strings in strList that are most similar to str.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSet("testnig", strList, 3, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Results: %s", strings.Join(res, ", "))
}
Results: testing, test, tester
You can use the FuzzySearchSetThreshold(str, strList, resultQuantity, minSimilarity, algorithm) function to get up to resultQuantity matches that meet the minSimilarity threshold.
strList := []string{"test", "tester", "tests", "testers", "testing", "tsting", "sting"}
res, err := edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.5, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig' with '0.5' threshold: %s", strings.Join(res, " "))
}
res, err = edlib.FuzzySearchSetThreshold("testnig", strList, 3, 0.7, edlib.Levenshtein)
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("Result for 'testnig' with '0.7' threshold: %s", strings.Join(res, " "))
}
Result for 'testnig' with '0.5' threshold: testing test tester
Result for 'testnig' with '0.7' threshold: testing
You can use one of the following functions to get the edit distance between two strings:
- LevenshteinDistance(str1, str2)
- DamerauLevenshteinDistance(str1, str2)
- OSADamerauLevenshteinDistance(str1, str2)
- LCSEditDistance(str1, str2)
- HammingDistance(str1, str2)
Example with Levenshtein distance:
res := edlib.LevenshteinDistance("kitten", "sitting")
fmt.Printf("Result: %d", res)
Result: 3
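Among the distances listed above, Hamming is the simplest to reason about, and it shows why working on runes rather than bytes matters for Unicode input. The following is a minimal sketch under stated assumptions, not the library's code: the function name `hamming` and its error message are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// hamming counts the positions at which two strings differ.
// It compares runes, so "é" is one position even though it is two bytes.
// The distance is only defined for strings with the same number of runes.
func hamming(a, b string) (int, error) {
	ra, rb := []rune(a), []rune(b)
	if len(ra) != len(rb) {
		return 0, errors.New("hamming: strings must have the same number of runes")
	}
	diff := 0
	for i := range ra {
		if ra[i] != rb[i] {
			diff++
		}
	}
	return diff, nil
}

func main() {
	d, err := hamming("karolin", "kathrin")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("Result: %d\n", d) // r/t, o/h and l/r differ: 3

	// Same rune count (5) but different byte lengths (6 vs 5).
	d, err = hamming("héllo", "hallo")
	fmt.Println(d, err)
}
```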
You can use the LCS(str1, str2) function to get the length of the longest common subsequence of two strings.
lcs := edlib.LCS("ABCD", "ACBAD")
fmt.Printf("Length of their LCS: %d", lcs)
Length of their LCS: 3
You can use the LCSBacktrack(str1, str2) function to get one longest common subsequence of two strings.
res, err := edlib.LCSBacktrack("ABCD", "ACBAD")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS: %s", res)
}
LCS: ABD
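For readers curious about how a longest common subsequence can be recovered, here is a minimal sketch of the classic table-and-backtrack approach. It is an illustration, not the library's implementation: `lcsTable` and `lcsBacktrack` are hypothetical names, and the tie-breaking rule (prefer moving up in the table) is a choice made for this example, so other valid LCS strings of the same length may exist.

```go
package main

import "fmt"

// lcsTable fills t so that t[i][j] is the LCS length of ra[:i] and rb[:j].
func lcsTable(ra, rb []rune) [][]int {
	t := make([][]int, len(ra)+1)
	for i := range t {
		t[i] = make([]int, len(rb)+1)
	}
	for i := 1; i <= len(ra); i++ {
		for j := 1; j <= len(rb); j++ {
			switch {
			case ra[i-1] == rb[j-1]:
				t[i][j] = t[i-1][j-1] + 1
			case t[i-1][j] >= t[i][j-1]:
				t[i][j] = t[i-1][j]
			default:
				t[i][j] = t[i][j-1]
			}
		}
	}
	return t
}

// lcsBacktrack walks the table from the bottom-right corner and
// rebuilds one longest common subsequence, filling it back to front.
func lcsBacktrack(a, b string) string {
	ra, rb := []rune(a), []rune(b)
	t := lcsTable(ra, rb)
	out := make([]rune, t[len(ra)][len(rb)])
	i, j, k := len(ra), len(rb), len(out)
	for i > 0 && j > 0 {
		switch {
		case ra[i-1] == rb[j-1]:
			k--
			out[k] = ra[i-1] // this rune is part of the LCS
			i--
			j--
		case t[i-1][j] >= t[i][j-1]:
			i-- // tie-break: prefer moving up
		default:
			j--
		}
	}
	return string(out)
}

func main() {
	res := lcsBacktrack("ABCD", "ACBAD")
	fmt.Printf("LCS: %s (length %d)\n", res, len([]rune(res)))
}
```

With this tie-breaking rule the sketch yields "ABD" for the inputs above; "ACD" is an equally valid answer of the same length.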
You can use the LCSBacktrackAll(str1, str2) function to get every longest common subsequence of two strings.
res, err := edlib.LCSBacktrackAll("ABCD", "ACBAD")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS: %s", strings.Join(res, ", "))
}
LCS: ABD, ACD
You can use the LCSDiff(str1, str2) function to get a diff between two strings based on their LCS.
res, err := edlib.LCSDiff("computer", "houseboat")
if err != nil {
fmt.Println(err)
} else {
fmt.Printf("LCS Diff:\n%s\n%s", res[0], res[1])
}
LCS Diff:
h c o m p u s e b o a t e r
+ -   - -   + + + + +   - -
👤 Hugo Bollon
- GitHub: @hbollon
- LinkedIn: @Hugo Bollon
- Portfolio: hugobollon.me
Contributions, issues and feature requests are welcome!
Feel free to check the issues page.
Give a ⭐️ if this project helped you!
Copyright © 2020 Hugo Bollon.
This project is licensed under the MIT License.