Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add matchLength-returning APIs to CompareInfo #40752

Merged
merged 3 commits into from
Aug 14, 2020

Conversation

GrabYourPitchforks
Copy link
Member

Fixes #27935.

The ICU logic is a little weird since we need to call ucol_next and ucol_previous to discover the attributes of the current collation element, but those methods have the side effect of mutating the offset. So I end up calling ucol_getOffset defensively before the iterator is advanced, which allows us to make a local copy. This is guarded behind a "is the caller actually asking for this?" check so that we don't take this hit in the common case where the caller doesn't care. (/cc @adamsitnik and @safern since I think you were spelunking in this code recently.)

I also modified the Utf8String.TryFind and string.Replace methods to call the new public API instead of the internal API. This has the side effect of allowing us to remove the unsafe annotation from string.Replace, since we're no longer working with raw pointers.

There's a decent amount of churn in the unit test files because I annotated all of the test cases with "what should matchLength be when the method returns?" If your diff tool shows per-character differences rather than highlighting the entire line this will make a lot more sense.

@ghost
Copy link

ghost commented Aug 13, 2020

Tagging subscribers to this area: @tarekgh, @safern, @krwq
See info in area-owners.md if you want to be subscribed.

@Dotnet-GitSync-Bot
Copy link
Collaborator

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@@ -492,7 +492,7 @@ int32_t GlobalizationNative_IndexOf(
result = GlobalizationNative_CompareString(pSortHandle, lpTarget, cwTargetLength, lpSource, cwSourceLength, options);
if (result == UCOL_EQUAL && pMatchedLength != NULL)
{
*pMatchedLength = cwTargetLength;
*pMatchedLength = cwSourceLength;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was incorrectly returning the target length instead of the source length and was causing some newly introduced unit tests to fail.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great catch @GrabYourPitchforks ! it's not a common case code path:

// It's possible somebody passed us (source = <empty>, target = <non-empty>).

but should we consider backporting this fix to older 3.1/2.1?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't know if it requires backporting, to be honest. Here's a simplified repro:

var newString = "".Replace("\u200d", "x", StringComparison.InvariantCulture);

Before this fix, this throws an ArgumentOutOfRangeException (due to attempting to slice a span in an invalid fashion). After the fix, this correctly returns "".

To trigger the bug, the "this" string must be empty, and the "old" string must be non-empty and consist only of zero-weight chars, and the caller must be passing a linguistic comparison such as InvariantCulture. I think the likelihood of all three happening at the same time are low enough where we couldn't justify the backport.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree we don't have to service this till we get a case need it.

Copy link
Member

@adamsitnik adamsitnik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks great to me, however, it would be great if you could add more edge case test cases and check if we can replace the call to ucol_getOffset with math on our side.

@@ -492,7 +492,7 @@ int32_t GlobalizationNative_IndexOf(
result = GlobalizationNative_CompareString(pSortHandle, lpTarget, cwTargetLength, lpSource, cwSourceLength, options);
if (result == UCOL_EQUAL && pMatchedLength != NULL)
{
*pMatchedLength = cwTargetLength;
*pMatchedLength = cwSourceLength;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

great catch @GrabYourPitchforks ! it's not a common case code path:

// It's possible somebody passed us (source = <empty>, target = <non-empty>).

but should we consider backporting this fix to older 3.1/2.1?

@GrabYourPitchforks
Copy link
Member Author

CI failure is known issue #40416.

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Aug 13, 2020

PR feedback was:

  • Reduce parameter count taken in SimpleAffix native methods
  • Rename parameters in the p/invoke declaration to match existing conventions
  • Simplify newly-added IsPrefix / IsSuffix methods by delegating to existing overloads where possible
  • Additional unit test edge case coverage
  • Use named parameters instead of /* comments */

@GrabYourPitchforks
Copy link
Member Author

Thank you both for the excellent feedback! :D

@GrabYourPitchforks GrabYourPitchforks merged commit baaf591 into dotnet:master Aug 14, 2020
@GrabYourPitchforks GrabYourPitchforks deleted the str_matchlen branch August 14, 2020 00:35
@karelz karelz added this to the 5.0.0 milestone Aug 18, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 7, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose the match length when using String.IndexOf in culture specific mode
5 participants