-
Notifications
You must be signed in to change notification settings - Fork 13.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix TwoWaySearcher to work when used with periodic needles. #16612
Conversation
@@ -11,4 +11,24 @@ | |||
#[test] | |||
fn strslice_issue_16589() { | |||
assert!("bananas".contains("nana")); | |||
|
|||
let x = "012345678901234567890123456789bcdabcdabcd"; | |||
// prior to the fix for 16589, x.contains("abcdabcd") returned false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe a hash in front of that issue number for clarity?
Looks good, can't seem to break it anymore. |
85c08bc
to
d5eab2e
Compare
|
||
#[test] | ||
fn test_strslice_contains() { | ||
check_contains_all_substrings("There are moments, Jeeves, when one asks oneself, 'Do trousers matter?'"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is longer than 100 characters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops. Fixed.
There is a check in TwoWaySearcher::new to determine whether the needle is periodic. This is needed because during searching when a match fails, we cannot advance the position by the entire length of the needle when it is periodic, but can only advance by the length of the period. The reason "bananas".contains("nana") (and similar searches) were returning false was because the periodicity check was wrong. Closes rust-lang#16589
d5eab2e
to
9419e92
Compare
// Check if the needle is periodic. If so, during searching when we | ||
// find a mismatch, we must only advance the position by the length | ||
// of the period, not the length of the entire needle | ||
if needle.slice_to(critPos) == needle.slice(period, period + critPos) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm, I'm not 100% sure this handles not-exactly-period strings correctly, e.g. naXYZna
, naxyzna
, playpen
(I don't know this algorithm, so it might be doing the right thing, but it certainly seems weird that those two differ.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your link is mangled for me, so I'm not sure what you're referring to. Are you worried about maximal_suffix
returning different factorizations for "naXYZna" and "naxyzna"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I'm worried about that. maximal_suffix
is giving different periods for "naXYZna" and "naxyzna", which seems to contradict how my port of glibc's critical_factorization
function works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@huonw Nevermind, my port had a bug. I think what I have in the PR is correct. The maximal_suffix
returns (i, p)
where i
is the starting position of the suffix found and p
is the period of the suffix. So the factorization for "naxyzna" is ("naxy", "zna") and the factorization for "naXYZna" is ("na", "XYZna") (because the "maximal" in the function name is based on lexicographic ordering, not length). And actually, my comment in the code was wrong. It's actually checking whether the suffix is periodic. So that's why one is periodic and not the other. I will update the comments to make this clear.
364f3b6
to
9a43492
Compare
@alexcrichton r? I'll admit that originally I copied this logic from glibc's implementation without fully understanding it, but I've now taken a closer look at the Two-Way paper and understand what it does. It's essentially implementing the |
Does glibc have the same bug? |
No, he fixed the problem by copying the correct logic from the glibc equivalent, because he didn't fully understand the algorithm yet. |
Yeah, I just meant that originally I noticed glibc does the equivalent of
instead of
and that it seemed to work when I made that change to the Rust code. So when I originally submitted the PR it was more a case of "this seems to work but I don't know why". Now I see that it actually directly matches what's in the paper, so I'm more confident that it's correct. |
There is a check in TwoWaySearcher::new to determine whether the needle is periodic. This is needed because during searching when a match fails, we cannot advance the position by the entire length of the needle when it is periodic, but can only advance by the length of the period. The reason "bananas".contains("nana") (and similar searches) were returning false was because the periodicity check was wrong. Closes #16589 Also, thanks to @gankro, who came up with many buggy examples.
internal: Fetch toolchain and datalayout for DetachedFiles My changes broke the rustc test runs since we use detached files for that, this should fix that. Also addresses some new nightly warnings wrt to unnecessary imports
There is a check in TwoWaySearcher::new to determine whether the needle is periodic. This is needed because during searching when a match fails, we cannot advance the position by the entire length of the needle when it is periodic, but can only advance by the length of the period.
The reason "bananas".contains("nana") (and similar searches) were returning false was because the periodicity check was wrong.
Closes #16589
Also, thanks to @gankro, who came up with many buggy examples.