-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix the string search behavior when using ICU #57078
Conversation
Tagging subscribers to this area: @tarekgh, @safern Issue DetailsIn .NET 5.0, we have switched to use the ICU library for Globalization support. We got some issues with the string search operations like
|
@safern could you please have a look? |
Haven't looked at the code yet, but this seems like an extremely risky change to make right before an RC snap. Are we sure we want this for 6.0? |
Why you think it is that risky? I am not seeing the same. Even I think this can meet the serving bar if we need to service 5.0. Note that this is affecting the breaking between the characters and not really the collation. That is why I am seeing the risk is not high here. |
0daa326
to
cd7fc5c
Compare
src/libraries/Native/Unix/System.Globalization.Native/pal_collation.c
Outdated
Show resolved
Hide resolved
src/libraries/Native/Unix/System.Globalization.Native/pal_collation.c
Outdated
Show resolved
Hide resolved
@GrabYourPitchforks just checking if you have any more comments? @safern could you please have a look when you have a chance? |
I don't have any other feedback. I still maintain that this change has the potential to introduce subtle breaks since we're introducing NLS-like behaviors across all platforms, even non-Windows platforms which always used ICU under the covers. And that normally we'd want such a change to bake for several previews before it goes in for real. I don't feel confident that we understand the potential fallout well enough to introduce this in RC, let alone in-place servicing for downlevel platforms. All of that said, if we still believe it's the right change for the product, I'll accept it. The code looks solid enough. :) |
@stephentoub @marklio what are your thoughts on the breakingness here. |
I'd like to get more perspectives before merging, if thats OK? |
@danmoseley Check your inbox. Tarek sent an excellent email that laid out the problem and why this solution was chosen. My response to that email boiled down to "let's make sure we understand these three things first, and if we're good with them then hit GO!" |
Marking while we cross i's and dot t's. |
@aolszowka you have reacted to a couple of comments here. do you have any feedback we need to consider with this change? or you are just watching the progress? |
src/libraries/Native/Unix/System.Globalization.Native/pal_collation.c
Outdated
Show resolved
Hide resolved
src/libraries/Native/Unix/System.Globalization.Native/pal_collation.c
Outdated
Show resolved
Hide resolved
I am just watching this progress. I reported one of the original repros on the ICU behaviors here: #43802. I no longer work for that particular company with that code base that would have been greatly affected. I can say that these types of low level changes often result in extremely high profile breaks that doesn't generate a lot of good will at the C-Level, making the case for staying current on .NET difficult. From the community response in the various threads I don't think we're alone in that situation. I would echo @GrabYourPitchforks comments, and would be interested to see what persuaded a change to say 'ship it':
The feedback I have given to at least alert developers (at compile time) of these types of gotchas seems to have been back burner-ed #47775. In my opinion, in a perfect world these types of changes would roll out first as a compiler warning for a few months to encourage devs to study potential problem areas for remediation (or at least submit knowledge bases for areas to look out for) and then perform a runtime roll out. However historically these Analyzers have been an after thought, only after some hard-won in the wild knowledge has cause incidents. |
…runtime into FixStringSearchBehavior
src/libraries/Native/Unix/System.Globalization.Native/pal_timeZoneInfo.c
Outdated
Show resolved
Hide resolved
The failure in the CI leg |
This is pure ICU magic! |
Fixes #43736
In .NET 5.0, we have switched to use the ICU library for Globalization support. We got some issues with the string search operations like
IndexOf, LastIndexOf, IsSuffix, and IsPrefix
APIs. The linked issues containing some details about such problems. One example, when searching for "\n" in the string "\r\n" get string not found as a result.The underlying cause of this behavior is ICU internally is using a break iterator which disallows breaking between some characters.
The provided solution here is we are customizing the break iterator to get the desired behavior.