Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[browser][non-icu] HybridGlobalization indexing #85254

Merged
merged 21 commits into from
May 18, 2023

Conversation

ilonatommy
Copy link
Member

@ilonatommy ilonatommy commented Apr 24, 2023

Implements a chunk of web-api based globalization. Is a part of HybridGlobalization feature and contributes to #79989.

Old, icu-based private API: GlobalizationNative_IndexOf, GlobalizationNative_LastIndexOf

New, non-icu private API: Interop.JsGlobalization.IndexOf

Affected public API (see: tests in CompareInfoTests.IndexOf.cs, CompareInfoTests.LastInfexOf.cs):

  • CompareInfo.IndexOf
  • CompareInfo.LastIndexOf
  • String.IndexOf
  • String.LastIndexOf
Test name time ICU4C [ms] time HG [ms] increase by [times]
String, String IndexOf 8.8016 2210.5625 250
String, String LastIndexOf 13.7985 2272.7500 165

All changes in behavior are listed in docs\design\features\hybrid-globalization.md.

@ghost
Copy link

ghost commented Apr 24, 2023

Tagging subscribers to this area: @dotnet/area-system-globalization
See info in area-owners.md if you want to be subscribed.

Issue Details

null

Author: ilonatommy
Assignees: ilonatommy
Labels:

area-System.Globalization

Milestone: -

@tarekgh tarekgh added the arch-wasm WebAssembly architecture label Apr 24, 2023
@ghost
Copy link

ghost commented Apr 24, 2023

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

1kB:
| String, String IndexOf | 1.6395ms |

4kB:
| String, String IndexOf | 20.8663ms |

16 kB:
| String, String IndexOf | 274.6250ms |

64kB:
| String, String IndexOf | 3185.0000ms |

Author: ilonatommy
Assignees: ilonatommy
Labels:

arch-wasm, area-System.Globalization

Milestone: -

@ilonatommy ilonatommy changed the title A bit faster version of indexing. WIP [browser][non-icu] HybridGlobalization indexing Apr 26, 2023
@runfoapp runfoapp bot mentioned this pull request May 3, 2023
@ilonatommy ilonatommy requested a review from mkhamoyan May 12, 2023 08:24
@pavelsavara
Copy link
Member

I guess one more merge from main would fix WBT


// The match may be affected by special character. Verify that the following character is regular ASCII.
if (sourceIndex < source.Length - 1 && *(a + sourceIndex + 1) >= 0x80)
goto InteropCall;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we wasting CPU cycles on average by doing the work again in JS ? Could we pass the current position and continue the search ?

Copy link
Member Author

@ilonatommy ilonatommy May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we wasting CPU cycles on average by doing the work again in JS ?

Yes, when the locale is en-* and the compariosn option is not IgnoreSymbols.

Could we pass the current position and continue the search ?

That makes sense. Because everyting up to the current position is equal, it pays off to start comparison in JS from that index, not from the beginning. But continuation of search is not beneficial - we will just compare all the remaining string in the InteropCall because we established that it's not a regular all-ASCII.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot do it due to surogates. Example:
source: "o\u0308" = ö
needle: "o"
expected result: -1
If we do: check char by char till you get to some non-ASCII, save from what idx you had the match (from idx=0) and then send the rest of the strings (source: "\u0308", needle: "") to continue in JS - we get result = 0.
We need to keep the helper, it's a good optimisation in most cases. Only cases with "lotsOfAsciiAndThenAUnicodeAtTheEndThatWasNotCaughtByTheRegex" will suffer here, so not many. We could get rid of it totally and just send all strings to JS but then the "allAscii" cases would suffer.

src/mono/wasm/runtime/hybrid-globalization.ts Outdated Show resolved Hide resolved
src/mono/wasm/runtime/hybrid-globalization.ts Show resolved Hide resolved
src/mono/wasm/runtime/hybrid-globalization.ts Show resolved Hide resolved
src/mono/wasm/runtime/hybrid-globalization.ts Show resolved Hide resolved
@ilonatommy
Copy link
Member Author

Failure is not connected.

@ilonatommy ilonatommy merged commit 6022b3e into dotnet:main May 18, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Jun 17, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants