Wrong `Did you mean` suggestion: for `alignmentScorr` clap suggests `alignmentStart` rather than `alignmentScore`
#4660
Comments
I looked into this as part of #4664. The problem is that `jaro_winkler` returns the same score (1.0) for both strings, so the last declared value is preferred.
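The tie-breaking effect described in the comment above can be sketched as follows. This is a minimal illustration, not clap's actual code: the function name `best_suggestion`, the 0.8 threshold, the example scores, and the declaration order of the two candidates are all assumptions made for the example.

```rust
/// Hypothetical helper: keep candidates scoring above `threshold`,
/// sort them by ascending score, and return the last (highest) one.
/// The sort is stable, so candidates with equal scores keep their
/// declaration order and the last declared one wins a tie.
fn best_suggestion<'a>(scored: &[(f64, &'a str)], threshold: f64) -> Option<&'a str> {
    let mut kept: Vec<(f64, &'a str)> = scored
        .iter()
        .copied()
        .filter(|(score, _)| *score > threshold)
        .collect();
    kept.sort_by(|a, b| a.0.total_cmp(&b.0));
    kept.last().map(|(_, name)| *name)
}
```

With both candidates tied at 1.0, `best_suggestion(&[(1.0, "alignmentScore"), (1.0, "alignmentStart")], 0.8)` returns `Some("alignmentStart")` purely because of ordering. As soon as the scores differ, the closer candidate is returned regardless of where it was declared.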
Interesting, thanks! I'll check whether that's a bug in the
It looks like https://github.com/dguo/strsim-rs is no longer maintained, and potentially has other bugs. I'll verify that this is indeed a bug. So one either needs to fork and fix, or switch to a different crate.
For other bugs in Jaro-Winkler: rapidfuzz/strsim-rs#49
It turns out that the implementation of Jaro-Winkler in strsim-rs is very unconventional, if not wrong; see rapidfuzz/strsim-rs#53.
I dug in a bit and indeed, if the strings have a common prefix of length >= 10, the similarity is 1, which isn't really what we want, I think. One immediate workaround would be to use just the normal Jaro instead, from the same crate. The Winkler modification is useful for things like making sure
In either case, the current use of the false/buggy
The implementation of Jaro-Winkler similarity in the dguo/strsim-rs crate is wrong, causing strings with a common prefix of length >= 10 to all be considered perfect matches.
Using Jaro instead, from the same crate, fixes this issue.
The benefit of favoring long prefixes exists for matching common names, but not for typo detection. Hence the use of Jaro instead of Jaro-Winkler is acceptable.
The confidence threshold is adjusted so that `bar` is still suggested for `baz`. Since Jaro is never greater than Jaro-Winkler, such an adjustment is expected and acceptable.
While exact suggestions may change, the net change will be positive. Suggestions are purely decorative, so this should not be a breaking change.
Fixes clap-rs#4660
Also see rapidfuzz/strsim-rs#53
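To make the prefix problem concrete, here is a standalone sketch that does not use the strsim crate. It assumes that the reported behaviour comes from applying the Winkler prefix bonus (scaling factor 0.1) without the conventional cap of 4 on the prefix length; that assumption reproduces the "common prefix >= 10 gives similarity 1" symptom described in this thread, but the code is an illustration, not a copy of the crate's source. The function names and the `max_prefix` parameter are hypothetical.

```rust
/// Plain Jaro similarity over Unicode scalar values, in [0.0, 1.0].
fn jaro(a: &str, b: &str) -> f64 {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    if a.is_empty() && b.is_empty() {
        return 1.0;
    }
    if a.is_empty() || b.is_empty() {
        return 0.0;
    }
    // Characters only match if they are at most `window` positions apart.
    let window = (a.len().max(b.len()) / 2).saturating_sub(1);
    let mut a_matched = vec![false; a.len()];
    let mut b_matched = vec![false; b.len()];
    let mut matches = 0usize;
    for i in 0..a.len() {
        let lo = i.saturating_sub(window);
        let hi = (i + window + 1).min(b.len());
        for j in lo..hi {
            if !b_matched[j] && a[i] == b[j] {
                a_matched[i] = true;
                b_matched[j] = true;
                matches += 1;
                break;
            }
        }
    }
    if matches == 0 {
        return 0.0;
    }
    // Count matched characters that appear in a different order.
    let mut k = 0;
    let mut half_transpositions = 0usize;
    for i in 0..a.len() {
        if !a_matched[i] {
            continue;
        }
        while !b_matched[k] {
            k += 1;
        }
        if a[i] != b[k] {
            half_transpositions += 1;
        }
        k += 1;
    }
    let m = matches as f64;
    let t = (half_transpositions / 2) as f64;
    (m / a.len() as f64 + m / b.len() as f64 + (m - t) / m) / 3.0
}

/// Jaro-Winkler with scaling factor 0.1.
/// `max_prefix = Some(4)` is the conventional cap on the prefix bonus;
/// `None` leaves the prefix uncapped (the assumed buggy behaviour).
fn jaro_winkler(a: &str, b: &str, max_prefix: Option<usize>) -> f64 {
    let j = jaro(a, b);
    let mut prefix = a.chars().zip(b.chars()).take_while(|(x, y)| x == y).count();
    if let Some(cap) = max_prefix {
        prefix = prefix.min(cap);
    }
    (j + 0.1 * prefix as f64 * (1.0 - j)).min(1.0)
}
```

For the input from this issue, `alignmentScorr` shares a 13-character prefix with `alignmentScore` and a 10-character prefix with `alignmentStart`, so the uncapped variant saturates at 1.0 for both. Plain Jaro and the capped variant both rank `alignmentScore` higher. For `bar` versus `baz`, plain Jaro is 7/9 (about 0.78) while capped Jaro-Winkler is about 0.82, which is why a threshold tuned for Jaro-Winkler needs lowering when switching to Jaro.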
Please complete the following tasks
Rust Version
rustc 1.66.0 (69f9c33d7 2022-12-12)
Clap Version
clap 3.1.18
Minimal reproducible code
Will try to extract a minimal reproducible example later
Bug discovered here: nextstrain/nextclade@8facbb4
Steps to reproduce the bug with the above code
cargo run --bin nextclade run -d sars-cov-2 test.fa -C alignmentScorr -t out.tsv
Actual Behaviour
Expected Behaviour
Same as above except for:
`alignmentScore` is much, much closer by any string metric to `alignmentScorr` than `alignmentStart` -> clear bug
Additional Context
I looked at how the `did you mean` suggestion is implemented and noticed that the tests don't seem to run due to a typo: `features` instead of `feature`:
clap/src/suggestions.rs
Line 86 in e8518cf
Debug Output