-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add matchLength-returning APIs to CompareInfo #40752
Add matchLength-returning APIs to CompareInfo #40752
Conversation
Note regarding the This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change. |
@@ -492,7 +492,7 @@ int32_t GlobalizationNative_IndexOf( | |||
result = GlobalizationNative_CompareString(pSortHandle, lpTarget, cwTargetLength, lpSource, cwSourceLength, options); | |||
if (result == UCOL_EQUAL && pMatchedLength != NULL) | |||
{ | |||
*pMatchedLength = cwTargetLength; | |||
*pMatchedLength = cwSourceLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was incorrectly returning the target length instead of the source length and was causing some newly introduced unit tests to fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch @GrabYourPitchforks ! it's not a common case code path:
// It's possible somebody passed us (source = <empty>, target = <non-empty>). |
but should we consider backporting this fix to older 3.1/2.1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't know if it requires backporting, to be honest. Here's a simplified repro:
var newString = "".Replace("\u200d", "x", StringComparison.InvariantCulture);
Before this fix, this throws an ArgumentOutOfRangeException
(due to attempting to slice a span in an invalid fashion). After the fix, this correctly returns ""
.
To trigger the bug, the "this" string must be empty, and the "old" string must be non-empty and consist only of zero-weight chars, and the caller must be passing a linguistic comparison such as InvariantCulture
. I think the likelihood of all three happening at the same time are low enough where we couldn't justify the backport.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree we don't have to service this till we get a case need it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks great to me, however, it would be great if you could add more edge case test cases and check if we can replace the call to ucol_getOffset
with math on our side.
@@ -492,7 +492,7 @@ int32_t GlobalizationNative_IndexOf( | |||
result = GlobalizationNative_CompareString(pSortHandle, lpTarget, cwTargetLength, lpSource, cwSourceLength, options); | |||
if (result == UCOL_EQUAL && pMatchedLength != NULL) | |||
{ | |||
*pMatchedLength = cwTargetLength; | |||
*pMatchedLength = cwSourceLength; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great catch @GrabYourPitchforks ! it's not a common case code path:
// It's possible somebody passed us (source = <empty>, target = <non-empty>). |
but should we consider backporting this fix to older 3.1/2.1?
src/libraries/Native/Unix/System.Globalization.Native/pal_collation.c
Outdated
Show resolved
Hide resolved
src/libraries/System.Globalization/tests/CompareInfo/CompareInfoTests.IsPrefix.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Show resolved
Hide resolved
src/libraries/Native/Unix/System.Globalization.Native/pal_collation.c
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Outdated
Show resolved
Hide resolved
CI failure is known issue #40416. |
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.Private.CoreLib/src/System/Globalization/CompareInfo.cs
Outdated
Show resolved
Hide resolved
PR feedback was:
|
Thank you both for the excellent feedback! :D |
Fixes #27935.
The ICU logic is a little weird since we need to call
ucol_next
anducol_previous
to discover the attributes of the current collation element, but those methods have the side effect of mutating the offset. So I end up callingucol_getOffset
defensively before the iterator is advanced, which allows us to make a local copy. This is guarded behind a "is the caller actually asking for this?" check so that we don't take this hit in the common case where the caller doesn't care. (/cc @adamsitnik and @safern since I think you were spelunking in this code recently.)I also modified the
Utf8String.TryFind
andstring.Replace
methods to call the new public API instead of the internal API. This has the side effect of allowing us to remove the unsafe annotation fromstring.Replace
, since we're no longer working with raw pointers.There's a decent amount of churn in the unit test files because I annotated all of the test cases with "what should matchLength be when the method returns?" If your diff tool shows per-character differences rather than highlighting the entire line this will make a lot more sense.