Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TwoWaySearcher to work when used with periodic needles. #16612

Merged
merged 2 commits into from
Aug 23, 2014

Conversation

nham
Copy link
Contributor

@nham nham commented Aug 19, 2014

There is a check in TwoWaySearcher::new to determine whether the needle is periodic. This is needed because during searching when a match fails, we cannot advance the position by the entire length of the needle when it is periodic, but can only advance by the length of the period.

The reason "bananas".contains("nana") (and similar searches) were returning false was because the periodicity check was wrong.

Closes #16589

Also, thanks to @gankro, who came up with many buggy examples.

@@ -11,4 +11,24 @@
#[test]
fn strslice_issue_16589() {
assert!("bananas".contains("nana"));

let x = "012345678901234567890123456789bcdabcdabcd";
// prior to the fix for 16589, x.contains("abcdabcd") returned false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe a hash in front of that issue number for clarity?

@Gankra
Copy link
Contributor

Gankra commented Aug 19, 2014

Looks good, can't seem to break it anymore.

@nham nham force-pushed the twoway_searcher_fix branch from 85c08bc to d5eab2e Compare August 19, 2014 22:35

#[test]
fn test_strslice_contains() {
check_contains_all_substrings("There are moments, Jeeves, when one asks oneself, 'Do trousers matter?'");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is longer than 100 characters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops. Fixed.

There is a check in TwoWaySearcher::new to determine whether the needle
is periodic. This is needed because during searching when a match fails,
we cannot advance the position by the entire length of the needle when
it is periodic, but can only advance by the length of the period.

The reason "bananas".contains("nana") (and similar searches) were
returning false was because the periodicity check was wrong.

Closes rust-lang#16589
@nham nham force-pushed the twoway_searcher_fix branch from d5eab2e to 9419e92 Compare August 20, 2014 06:56
// Check if the needle is periodic. If so, during searching when we
// find a mismatch, we must only advance the position by the length
// of the period, not the length of the entire needle
if needle.slice_to(critPos) == needle.slice(period, period + critPos) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, I'm not 100% sure this handles not-exactly-period strings correctly, e.g. naXYZna, naxyzna, playpen
(I don't know this algorithm, so it might be doing the right thing, but it certainly seems weird that those two differ.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your link is mangled for me, so I'm not sure what you're referring to. Are you worried about maximal_suffix returning different factorizations for "naXYZna" and "naxyzna"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm worried about that. maximal_suffix is giving different periods for "naXYZna" and "naxyzna", which seems to contradict how my port of glibc's critical_factorization function works.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@huonw Nevermind, my port had a bug. I think what I have in the PR is correct. The maximal_suffix returns (i, p) where i is the starting position of the suffix found and p is the period of the suffix. So the factorization for "naxyzna" is ("naxy", "zna") and the factorization for "naXYZna" is ("na", "XYZna") (because the "maximal" in the function name is based on lexicographic ordering, not length). And actually, my comment in the code was wrong. It's actually checking whether the suffix is periodic. So that's why one is periodic and not the other. I will update the comments to make this clear.

@nham nham force-pushed the twoway_searcher_fix branch from 364f3b6 to 9a43492 Compare August 23, 2014 02:40
@nham
Copy link
Contributor Author

nham commented Aug 23, 2014

@alexcrichton r? I'll admit that originally I copied this logic from glibc's implementation without fully understanding it, but I've now taken a closer look at the Two-Way paper and understand what it does. It's essentially implementing the small-period function that can be found on p. 670 of the paper. I've updated the comments to explain what's going on here. I'm at least somewhat confident that this is correct.

@jruderman
Copy link
Contributor

I copied this logic from glibc's implementation

Does glibc have the same bug?

@Overv
Copy link

Overv commented Aug 23, 2014

Does glibc have the same bug?

No, he fixed the problem by copying the correct logic from the glibc equivalent, because he didn't fully understand the algorithm yet.

@nham
Copy link
Contributor Author

nham commented Aug 23, 2014

Yeah, I just meant that originally I noticed glibc does the equivalent of

needle.slice_to(critPos) == needle.slice(period, period + critPos)

instead of

needle.slice_to(critPos) == needle.slice_from(needle.len() - critPos)

and that it seemed to work when I made that change to the Rust code. So when I originally submitted the PR it was more a case of "this seems to work but I don't know why". Now I see that it actually directly matches what's in the paper, so I'm more confident that it's correct.

bors added a commit that referenced this pull request Aug 23, 2014
There is a check in TwoWaySearcher::new to determine whether the needle is periodic. This is needed because during searching when a match fails, we cannot advance the position by the entire length of the needle when it is periodic, but can only advance by the length of the period.

The reason "bananas".contains("nana") (and similar searches) were returning false was because the periodicity check was wrong.

Closes #16589

Also, thanks to @gankro, who came up with many buggy examples.
@bors bors closed this Aug 23, 2014
@bors bors merged commit 9a43492 into rust-lang:master Aug 23, 2014
bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 25, 2024
internal: Fetch toolchain and datalayout for DetachedFiles

My changes broke the rustc test runs since we use detached files for that, this should fix that. Also addresses some new nightly warnings wrt to unnecessary imports
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"bananas".contains("nana") returns false
7 participants