perf(lexer): use SIMD char search when parsing numbers #3250

DonIsaac · 2024-05-12T20:22:48Z

What This PR Does

Use memchr_iter to find and remove underscores when parsing number literals.

I'm seeing some performance improvements when running benchmarks on my machine, but I want confirmation from the CI benchmark job.

graphite-app · 2024-05-12T20:22:52Z

Your org has enabled the Graphite merge queue for merging into main

Add the label “merge” to the PR and Graphite will automatically add it to the merge queue when it’s ready to merge. Or use the label “hotfix” to add to the merge queue as a hot fix.

You must have a Graphite account and log in to Graphite in order to use the merge queue. Sign up using this link.

codspeed-hq · 2024-05-12T20:27:58Z

CodSpeed Performance Report

Merging #3250 will not alter performance

_{Comparing don/perf/lexer-simd-numbers (487a8c7) with main (eefb66f)}

Summary

✅ 27 untouched benchmarks

DonIsaac · 2024-05-12T20:28:29Z

Looks like it doesn't help, so I'm closing this PR

Boshen · 2024-05-13T03:15:27Z

The benchmark probably doesn't have any relevant data, assigning @overlookmotel for a quick look.

overlookmotel · 2024-05-13T15:03:26Z

I can see 3 reasons why this appears to be slightly hurting performance rather than improving it:

Probably a majority of numbers in typical JS source code are short integers e.g. in for (let i = 0; i < 5; i++). Using SIMD will be more costly than just iterating through characters in these common cases.
memchr may be overkill here, because (apart from BigInts), JS numbers can only be a few chars long. memchr is optimized for searching for uncommon needles in large haystacks, but here most commonly you'd have a _ every 3 chars e.g. oneMillion = 1_000_000.
There's likely a cost to initializing a Memchr. It could possibly be speeded up by creating a single finder and reusing it each time, the way the memchr finder for end of multiline comments is here:

oxc/crates/oxc_parser/src/lexer/mod.rs

Lines 98 to 99 in eefb66f

    
               /// `memchr` Finder for end of multi-line comments. Created lazily when first used. 
        
               multi_line_comment_end_finder: Option<memchr::memmem::Finder<'static>>,

But... if we want to optimize this, can I suggest a different approach?

The lexer already iterates through the characters of the source code comprising the number, and deals with _s. At present we do the same check again later, which is unnecessary. We could avoid checking again for _ when converting the number to f64 with something like this:

Add more Kinds e.g. Kind::DecimalWithUnderscores, Kind::BinaryWithUnderscores.
If lexer finds an underscore while lexing the number, it returns Kind::DecimalWithUnderscores instead of Kind::Decimal (in these functions).
Then during conversion to f64, in the majority of cases it'll be a plain Kind::Decimal, and the check for _s can be bypassed entirely, because we already know there isn't one.

@DonIsaac What do you think of that approach?

Also, can I ask out of interest: What was your motivation for getting into this? Were you using Oxc on some input with lots of _s in integers, and found it was slow? Or pursuing optimization for optimization's sake? (either way, thank you!)

DonIsaac · 2024-05-13T15:41:16Z

@overlookmotel Your suggestions look sound. I like the idea of underscore checking during lexing (as per suggestion 2). Since we already need to iterate over numbers there, we could skip the use of memchr entirely.

As for suggestion 1, I like the idea of storing underscore data within the number's Kind, but adding an WithUnderscores variant for each number kind feels verbose to me. What do you think about doing something like using bitflags! for numeric kinds, or some other similar approach?

My motivation for getting into this is that I've spent time optimizing other parts of Oxc before, but never the most important part: the lexer and parser. It's so heavily optimized it's hard to find anything that needs improving. I'm not experiencing any problems with underscores, it's just optimization for optimization's sake 😄

overlookmotel · 2024-05-13T18:54:45Z

@DonIsaac Thanks for coming back.

just optimization for optimization's sake

A man after my own heart!

adding an WithUnderscores variant for each number kind feels verbose to me

Yes, you're right, it is verbose and unwieldy. But there is a perf advantage to keeping Kind simple (simple to the compiler, not us!). At present it's just a single byte, and that's as tight as you can get performance-wise.

What we can do is add explicit discriminants to Kind enum, and make sure there are patterns in the discriminant values so it's actually behaving like a bit flags set under the hood. e.g. make it so that converting any Kind::*WithUnderscores to the corresponding Kind::* is just "subtract 8 from the discriminant" (1 CPU operation).

Personally, my view is that as the lexer is the most primitive level of the stack, and everything else depends on it, it's acceptable to be quite nasty code, as long as it's as fast as possible. But that's just my opinion, feel free to disagree.

It's so heavily optimized it's hard to find anything that needs improving.

I actually see quite a lot of opportunity for further optimization in the lexer. For a start, everywhere we use the Chars iterator, we should iterate over bytes instead. Chars iterator is really slow. #2352 got some heavy perf improvements from doing this (along with other tricks). Secondly, many of the simpler matchers (e.g. when you find a =, is the token =, == or ===?) could be made branchless to avoid the penalty of incorrect branch prediction. I had intended to finish off that work on the lexer, and I expect there's another 5% to 10% gain on parser benchmarks there for the taking, but I've got sidetracked to other things.

If by any chance you'd be interested in taking up that baton, I'd be very happy to talk it through with you.

Boshen · 2024-05-14T02:02:26Z

crates/oxc_parser/src/lexer/number.rs

@@ -96,3 +96,42 @@ pub fn parse_big_int(s: &str, kind: Kind) -> Result<BigInt, &'static str> {
    };
    BigInt::parse_bytes(s.as_bytes(), radix).ok_or("invalid bigint")
 }
+
+#[inline]


This is a deoptimization I think, it will bust the cpu cache for loading so many CPU instructions, #[cold] should be the better option here.

Remember, data oriented design, is thinking about the statistics.

Sorry to contradict you Boshen, but I don't think #[cold] is the right option here either.

#[cold] is for telling the branch predictor "this function is rarely called", and so to assume another arm in a match is more likely to be the one which executes (or same in an if/else branch). But in this case, without_underscores is always called, so #[cold] won't do anything.

If want to make sure the function doesn't get inlined, use #[inline(never)]. Or just don't use any attribute, and let the compiler decide whether to inline or not. The latter is usually the best option.

So I agree with you that #[inline] should be removed here, but probably not replaced with anything else.

FYI, I think I've over-used #[cold] in the lexer, and could probably remove some of them. I know more now than I did in Jan!

Boshen · 2024-05-14T02:05:19Z

numbers are pretty short, and underscores are rare, so this routine may need to be done differently. Feel free to close the PR again 😅

perf(lexer): use SIMD char search when parsing numbers

2cfe56c

DonIsaac added A-parser Area - Parser C-performance Category - Solution not expected to change functional behavior, only performance labels May 12, 2024

DonIsaac closed this May 12, 2024

Boshen requested a review from overlookmotel May 13, 2024 03:14

Boshen reopened this May 13, 2024

Merge branch 'main' into don/perf/lexer-simd-numbers

487a8c7

Boshen reviewed May 14, 2024

View reviewed changes

DonIsaac closed this May 14, 2024

overlookmotel mentioned this pull request May 15, 2024

Convert lexer to iterate over bytes instead of chars #3291

Closed

5 tasks

Boshen deleted the don/perf/lexer-simd-numbers branch September 11, 2024 12:18

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(lexer): use SIMD char search when parsing numbers #3250

perf(lexer): use SIMD char search when parsing numbers #3250

DonIsaac commented May 12, 2024

graphite-app bot commented May 12, 2024

codspeed-hq bot commented May 12, 2024 •

edited

Loading

DonIsaac commented May 12, 2024

Boshen commented May 13, 2024

overlookmotel commented May 13, 2024 •

edited

Loading

DonIsaac commented May 13, 2024

overlookmotel commented May 13, 2024 •

edited

Loading

Boshen May 14, 2024

overlookmotel May 14, 2024 •

edited

Loading

Boshen commented May 14, 2024

perf(lexer): use SIMD char search when parsing numbers #3250

perf(lexer): use SIMD char search when parsing numbers #3250

Conversation

DonIsaac commented May 12, 2024

What This PR Does

graphite-app bot commented May 12, 2024

Your org has enabled the Graphite merge queue for merging into main

codspeed-hq bot commented May 12, 2024 • edited Loading

CodSpeed Performance Report

Merging #3250 will not alter performance

Summary

DonIsaac commented May 12, 2024

Boshen commented May 13, 2024

overlookmotel commented May 13, 2024 • edited Loading

DonIsaac commented May 13, 2024

overlookmotel commented May 13, 2024 • edited Loading

Boshen May 14, 2024

Choose a reason for hiding this comment

overlookmotel May 14, 2024 • edited Loading

Choose a reason for hiding this comment

Boshen commented May 14, 2024

codspeed-hq bot commented May 12, 2024 •

edited

Loading

overlookmotel commented May 13, 2024 •

edited

Loading

overlookmotel commented May 13, 2024 •

edited

Loading

overlookmotel May 14, 2024 •

edited

Loading