Optimize String.toFloatOrNull() #5364

Open
wants to merge 6 commits into
base: master

@romainguy romainguy commented Oct 1, 2024

The existing implementation uses a regular expression, which allocates memory on every invocation. Allocations are expensive on mobile devices, and a hand-written parser can also outperform a regular expression.

The new implementation is compatible with the original regular expression and is ~22x faster on OpenJDK 22 on a 2021 MacBook Pro (M1 Pro):

Benchmark                      Mode  Cnt    Score   Error   Units
KotlinBenchmark.customParser  thrpt       482.020          ops/ms
KotlinBenchmark.regex         thrpt        21.471          ops/ms

On a Pixel 6 running Android 14, the new implementation is ~225x faster:

    8,595,686   ns       10428 allocs    Trace    StringBenchmark.isFloatRegex
       37,755   ns           0 allocs    Trace    StringBenchmark.isFloat

It also has the benefit of never allocating anything (vs ~940 allocations per invocation for the existing implementation).
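
To illustrate the general idea of replacing a regular expression with a single-pass scanner, here is a minimal sketch. It is not the code from this PR: `looksLikeSimpleDecimal` is a hypothetical name, and it accepts only a simplified decimal grammar (optional sign, digits, optional fraction, optional exponent). It deliberately ignores hexadecimal literals, `NaN`, `Infinity`, type suffixes and surrounding whitespace, all of which the real validation has to handle to stay compatible with the original regular expression.

```kotlin
// Hypothetical helper for illustration only; NOT the implementation in this PR.
// Accepts: [sign] digits [ '.' digits ] [ (e|E) [sign] digits ]
// where at least one digit must appear before or after the '.'.
fun looksLikeSimpleDecimal(s: String): Boolean {
    val n = s.length
    var i = 0

    // Optional leading sign.
    if (i < n && (s[i] == '+' || s[i] == '-')) i++

    // Integer part.
    val intStart = i
    while (i < n && s[i] >= '0' && s[i] <= '9') i++
    val intDigits = i - intStart

    // Optional fractional part.
    var fracDigits = 0
    if (i < n && s[i] == '.') {
        i++
        val fracStart = i
        while (i < n && s[i] >= '0' && s[i] <= '9') i++
        fracDigits = i - fracStart
    }

    // "1.", ".5" and "1.5" are fine; "." and "" are not.
    if (intDigits == 0 && fracDigits == 0) return false

    // Optional exponent; it needs at least one digit.
    if (i < n && (s[i] == 'e' || s[i] == 'E')) {
        i++
        if (i < n && (s[i] == '+' || s[i] == '-')) i++
        val expStart = i
        while (i < n && s[i] >= '0' && s[i] <= '9') i++
        if (i == expStart) return false
    }

    // Valid only if the whole string was consumed.
    return i == n
}
```

The scanner only reads characters by index and keeps a few local integers, so it creates no matcher or match-result objects along the way. A caller would presumably run a check like this first and hand the string to the platform's float parser only when it passes.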

romainguy and others added 2 commits October 1, 2024 19:16
Co-authored-by: Jake Wharton <github@jakewharton.com>
@romainguy

I just realized I could parse 4 digits at a time in the non-hexadecimal case. I'll give it a try to see if it helps performance.


fzhinkin commented Oct 2, 2024

@romainguy could you please also provide a link to benchmark sources?

@fzhinkin fzhinkin self-assigned this Oct 2, 2024

romainguy commented Oct 2, 2024

Here is the JVM benchmark: https://gist.github.com/romainguy/7acca58a1401ba9361bd93deb778e11e
The data is a mix of valid and invalid strings I used to test the implementation (on top of the existing tests in the Kotlin repository). I used the same implementation and data on Android.

And with the recent changes the custom parser is slightly faster as well:

Benchmark                      Mode  Cnt    Score    Error   Units
KotlinBenchmark.customParser  thrpt    4  500.113 ± 31.251  ops/ms
KotlinBenchmark.regex         thrpt    4   21.836 ±  5.416  ops/ms


romainguy commented Oct 2, 2024

Parsing four digits at a time (applied only to the fractional part, where it is most likely to be useful) only helps when the data set contains many strings with more than 4 digits after the decimal point. It would help when parsing SVG files, for instance, but it is not a clear win in the general case, so it is probably not worth the added implementation complexity. It also hurts when the fractional part has fewer than 4 digits: on the data set linked above, the speedup on the JVM drops from 23x to 20x.

For reference, here's a function that returns 4 when it can successfully parse 4 digits from the input string at a given offset, otherwise it returns 0 (so the call site can do start += parseFourDigits(…)):

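private inline fun parseFourDigits(str: String, offset: Int): Int {
    // Pack four UTF-16 code units into one Long, one per 16-bit lane.
    // The caller must guarantee that offset + 3 is a valid index.
    val v = (str[offset + 0].code.toLong() or
            (str[offset + 1].code.toLong() shl 16) or
            (str[offset + 2].code.toLong() shl 32) or
            (str[offset + 3].code.toLong() shl 48))

    // Subtracting '0' (0x30) borrows into the high bits of a lane below '0'.
    val base = v - 0x0030003000300030L
    // Adding 0x46 pushes any lane above '9' (0x39) to 0x80 or more.
    val predicate = v + 0x0046004600460046L or base
    // All four are ASCII digits only if no lane has any of bits 7..15 set.
    return if (predicate and 0xff80_ff80_ff80_ff80UL.toLong() == 0L) 4 else 0
}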
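
As a rough sketch of how a call site might use this, the loop below advances four digits at a time while at least four characters remain, then falls back to one character at a time. `skipDigits` is a hypothetical name and is not part of this PR; `parseFourDigits` is copied from the comment above (without `inline`) so the snippet is self-contained.

```kotlin
// Copied from the comment above so this sketch compiles on its own.
private fun parseFourDigits(str: String, offset: Int): Int {
    val v = (str[offset + 0].code.toLong() or
            (str[offset + 1].code.toLong() shl 16) or
            (str[offset + 2].code.toLong() shl 32) or
            (str[offset + 3].code.toLong() shl 48))

    val base = v - 0x0030003000300030L
    val predicate = v + 0x0046004600460046L or base
    return if (predicate and 0xff80_ff80_ff80_ff80UL.toLong() == 0L) 4 else 0
}

// Hypothetical call site: returns the index of the first character at or
// after `from` that is not an ASCII digit (or str.length if there is none).
fun skipDigits(str: String, from: Int): Int {
    var start = from
    val end = str.length

    // Fast path: only valid while four characters are still available.
    while (end - start >= 4) {
        val advanced = parseFourDigits(str, start)
        if (advanced == 0) break // at least one of the four is not a digit
        start += advanced
    }

    // Slow path: handle the remaining 0..3 digits, or the digits that
    // precede a non-digit inside a rejected group of four.
    while (start < end && str[start] >= '0' && str[start] <= '9') start++
    return start
}
```

For example, `skipDigits("3.14159265", 2)` consumes the eight fractional digits in two fast-path steps, whereas `skipDigits("12ab", 0)` fails the fast path immediately and pays for that check before the slow path handles the two digits. That extra work is consistent with the slowdown reported above for short fractional parts.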
