remove allocations from jaro #6050
awesome. here is another version that you might try:
@forki your suggestion starts with allocating 2 arrays
@forki I tried your code but the loop
I tried to port https://stackoverflow.com/a/19165108/145701 but probably messed it up ;-) But yours looks better anyway
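For context on the "2 arrays" remark: a textbook Jaro implementation keeps one matched-flag array per input string and allocates both on every call. The sketch below is in Python for readability (the PR itself changes F# compiler code). It is the standard algorithm, not the code from this PR or from @forki's suggestion, and the name `jaro` is only illustrative.

```python
def jaro(a: str, b: str) -> float:
    """Textbook Jaro similarity using one matched-flag list per input string."""
    la, lb = len(a), len(b)
    if la == 0 and lb == 0:
        return 1.0
    if la == 0 or lb == 0:
        return 0.0

    # Two characters can only match if they are within this distance of each other.
    window = max(max(la, lb) // 2 - 1, 0)
    a_matched = [False] * la  # these two lists are allocated on every call
    b_matched = [False] * lb

    matches = 0
    for i, ch in enumerate(a):
        lo = max(0, i - window)
        hi = min(lb, i + window + 1)
        for j in range(lo, hi):
            if not b_matched[j] and b[j] == ch:
                a_matched[i] = b_matched[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: walk the matched characters of both strings in order
    # and count positions where they differ; half of that count is t.
    out_of_order = 0
    k = 0
    for i in range(la):
        if not a_matched[i]:
            continue
        while not b_matched[k]:
            k += 1
        if a[i] != b[k]:
            out_of_order += 1
        k += 1
    t = out_of_order / 2

    m = float(matches)
    return (m / la + m / lb + (m - t) / m) / 3
```

For example, `jaro("MARTHA", "MARHTA")` gives 17/18 ≈ 0.944. When a function shaped like this runs once per candidate name, both flag arrays are created once per comparison, which is presumably the kind of allocation this PR is about.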
Why did the "Test request for parse and check doesn't check whole project" test fail?
Ugh, it's one of those flaky FCS tests.
@TIHan and I will benchmark this before pulling. At least on the surface it seems fine, but we'll have to measure it to be sure.
(Edited to reflect a correct benchmark with proper baselining and setup.) I'll want to benchmark this with actual data from VisualFSharp, but here's a quick benchmark comparing my name with 200 other male first names, with that list appended to itself 42 times, so each run compares against a list of 8400 strings: https://github.com/cartermp/JaroWinklerBenchmarks
BenchmarkDotNet=v0.11.3, OS=macOS Mojave 10.14.2 (18C54) [Darwin 18.2.0]
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.2.100
[Host] : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT DEBUG
DefaultJob : .NET Core 2.2.0 (CoreCLR 4.6.27110.04, CoreFX 4.6.27110.04), 64bit RyuJIT
Takeaways:
Keep in mind that these represent measurements of a single run of the algorithm over 8400 strings. In an IDE session, it would be run many more times as the user generates errors that trigger this code path; in my own testing with #6044, that happens all the time.
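Each of those 8400 comparisons is a Jaro computation followed, for Jaro-Winkler, by a small bonus for a shared prefix. Below is a minimal Python sketch of that second step, assuming the commonly cited defaults of a 0.1 scaling factor and a 4-character prefix cap; this thread doesn't say which values the compiler uses, and `winkler_boost` is a hypothetical name.

```python
def winkler_boost(jaro_sim: float, a: str, b: str,
                  p: float = 0.1, max_prefix: int = 4) -> float:
    """Turn a Jaro score into a Jaro-Winkler score.

    p = 0.1 and max_prefix = 4 are assumed (the commonly cited defaults).
    """
    # Length of the common prefix, capped at max_prefix characters.
    prefix = 0
    for x, y in zip(a[:max_prefix], b[:max_prefix]):
        if x != y:
            break
        prefix += 1
    # Move the score toward 1.0 in proportion to the shared prefix.
    return jaro_sim + prefix * p * (1.0 - jaro_sim)
```

Given the Jaro score for MARTHA/MARHTA (17/18) and their 3-character shared prefix, this returns about 0.961. The boost is a handful of character comparisons, so most of the per-comparison cost sits in the Jaro step.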
I wonder if there is a way to remove the need for normalizing the strings with
@cartermp: A micro-optimization would be to replace it with
Perhaps instead of normalization for an invariant culture, it would be "good enough" to use normalization similar to
I can see numerous other potential improvements in the code, like loop unrolling and creating hash tables of the set of identifiers, which will largely remain unchanged (I've seen some examples online where it was suggested that hashing is much faster than
It may also be interesting to see if the inner loop in jaro can be replaced with
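One way to read the hash-table idea: since the identifier set largely stays the same, normalize it once, key it by the normalized form, and normalize only the query on each lookup. The sketch below is Python. `NormalizedCandidates` is hypothetical, `str.upper` stands in for whatever normalization the compiler actually performs, and the similarity function is passed in rather than assumed.

```python
from typing import Callable, Dict, List, Tuple


class NormalizedCandidates:
    """Hypothetical cache for a mostly-unchanging identifier set.

    str.upper is an assumed stand-in for the compiler's real normalization.
    """

    def __init__(self, identifiers: List[str]) -> None:
        # Normalize every identifier once and group original spellings
        # under their normalized form (a dict is a hash table).
        self._by_norm: Dict[str, List[str]] = {}
        for ident in identifiers:
            self._by_norm.setdefault(ident.upper(), []).append(ident)

    def suggest(self, query: str, score: Callable[[str, str], float],
                threshold: float) -> List[Tuple[float, str]]:
        q = query.upper()  # normalize the query once, not once per candidate
        hits: List[Tuple[float, str]] = []
        for norm, originals in self._by_norm.items():
            s = score(q, norm)  # each distinct normalized form is scored once
            if s >= threshold:
                hits.extend((s, name) for name in originals)
        # Best score first; ties broken alphabetically for stable output.
        hits.sort(key=lambda pair: (-pair[0], pair[1]))
        return hits
```

Keying on the normalized form also collapses spellings such as `Length` and `length` into one scored entry, so each distinct normalized identifier is scored once per query.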
This is a good change, thanks @AviAvni
Nice work. Based on @cartermp's testing, this seems to improve compute time and allocations overall; therefore, the change is very much worth it.
I removed the allocation of the ResizeArray
Also remove the string concat from FilterPredictions