perf: unsafe stringify now uses concatenation #676
Conversation
This is more in line with the sort of change I'd be willing to consider (relatively modest impact on codebase). Running the benchmarks on my own system, for both

(You can run the benchmarks in the browser via
Do you have a link or article explaining why the

@ctavan @Rush @LinusU: Thoughts? I'm on the fence here. This is a ~8% perf gain, with minimal code change. But as I said to @Cadienvan in his other PR, I'm increasingly leery of making changes simply for the sake of performance.
I'm sorry, I can't find any specific article about it, yet I'm sure it works like that, as I struggled with some performance issues on a Node application I was working on and this solved the issue. I had a look (we are talking about more than a year ago) at some source code, but can't find it right now.

Here's a proper explanation of the phenomenon: in JavaScript, the + operator is used for concatenation when talking about strings. Concatenation involves creating a new string by combining the two operands ( On the other hand, the

I'm pretty sure using something like node-clinic can provide some information about it, but I've never used it, so I can't tell for sure.
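A quick sketch of that claim (my own illustration, not code from this PR): building a string chunk-by-chunk with `+` produces a fresh string value on every step, whereas a single `Array.prototype.join` pass combines all the chunks at once (`join` is shown purely for contrast here, it is not what the PR changes). Modern engines blur this distinction with internal rope representations, so only a benchmark can settle it.

```javascript
// Illustration only: each `+=` conceptually allocates a new string;
// engines may optimize this internally.
const chunks = ['aa', 'bb', 'cc', 'dd'];

let viaPlus = '';
for (const c of chunks) {
  viaPlus = viaPlus + c; // a brand-new string value on every iteration
}

const viaJoin = chunks.join(''); // single pass over all chunks

console.log(viaPlus); // 'aabbccdd'
console.log(viaJoin); // 'aabbccdd'
```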
I'd be curious to know how this change impacts perf on non-Node (V8) runtimes.
I have to be honest, I tried some popular browsers with small benchmarks out of curiosity and couldn't find anything relevant: whichever engine is involved, the worst that can happen is that both operations create a new string, as performance always got better or stayed equal, never got worse.
@broofa I will share my thoughts, since I am highlighted and I did make performance contributions in the past :) Software is not made in a vacuum, and it does not run on abstract machines but on real hardware; in the case of JS, it runs on larger software stacks (V8, Mozilla). Just like the Linux kernel optimizes its code to run more efficiently on real, tangible hardware, it is our responsibility as JS developers to write code that runs efficiently on the JS engines that people use in real life. So yes, an 8% improvement, if confirmed on multiple hardware/JS-engine setups, sounds meaningful enough to invest in.
It seems like this is ~27% slower on Safari 16.2 on an M1 MacBook Air. On Chrome

I've created a test case that anyone can run on JSBench:
It isn't the case on my M1 Max (Chrome 108.0.5359.124, same as yours), where the new version is faster; of course, I'll accept whatever you decide.
Thanks @LinusU. I ran your test case on a variety of browsers using BrowserStack. Results below.

Note: Edge, Chrome, Yandex, and Opera all use Chromium (and therefore V8, I assume?). Firefox uses SpiderMonkey. Safari uses Nitro.

This benchmark is consistently slower on all platforms. Which is ... interesting, since

I gotta get to work here, but I'd like to understand why we seem to be getting different results between JSBench and the built-in test before moving forward with this.

- Safari - 26.8% slower. Tested on my laptop because BrowserStack's Safari refused to load the JSBench site for some reason.
- Yandex - Didn't load on BrowserStack, and I'm not about to install it on my laptop. Uses Chromium though, so 🤷
- Opera - 2% slower; not providing a screenshot as I don't consider it a supported platform. Uses Chromium, too, so 🤷
I guess Node acts differently than the V8 engine itself; it's the only thing I can think of. Now the question is: how impactful is browser usage for such a library?
btw. I don't think that this is correct:
I'm personally surprised that the new code is faster (if it is), since I'd assume that it'd be easier for the compiler/runtime to optimize the older code. Not that it shouldn't be able to optimize both versions into the same machine instructions... That it isn't a question of modifying in place or not can be seen in this simple example:
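The example snippet itself didn't survive the page capture; judging from the corrected output quoted later in the thread, it was presumably something along these lines:

```javascript
// Strings are immutable in JavaScript: `b += …` builds a new string
// and rebinds b, while a keeps its original value.
let a = 'hello';
let b = a;
b += ', world!';
console.log(a); // 'hello'
console.log(b); // 'hello, world!'
```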
Strings are treated as "primitives" in JavaScript, which means that the runtime has to use a "copy on write" scheme and keep track of which strings can be modified in place, and which must first be copied to a new location in memory.

That being said, I believe that the only way to know is to measure, so it doesn't really matter 😄

When seeing the initial post I was already a bit skeptical about the results/our benchmarks, since some numbers that shouldn't have changed did change (e.g.

So I brought out my favourite benchmark tool,
```js
// Code from "Setup JavaScript"
for (let i = 0; i < 4_000_000; i++) {
  unsafeStringifyV1(arr0)
  unsafeStringifyV1(arr1)
  unsafeStringifyV1(arr2)
  unsafeStringifyV1(arr3)
  unsafeStringifyV1(arr4)
  unsafeStringifyV1(arr5)
  unsafeStringifyV1(arr6)
  unsafeStringifyV1(arr7)
  unsafeStringifyV1(arr8)
  unsafeStringifyV1(arr9)
}
```
```js
// Code from "Setup JavaScript"
for (let i = 0; i < 4_000_000; i++) {
  unsafeStringifyV2(arr0)
  unsafeStringifyV2(arr1)
  unsafeStringifyV2(arr2)
  unsafeStringifyV2(arr3)
  unsafeStringifyV2(arr4)
  unsafeStringifyV2(arr5)
  unsafeStringifyV2(arr6)
  unsafeStringifyV2(arr7)
  unsafeStringifyV2(arr8)
  unsafeStringifyV2(arr9)
}
```

Then I ran this with Node.js v18.12.1 on my arm64 M1 MacBook Air:

```
$ hyperfine --warmup 1 'node v1.js' 'node v2.js'
Benchmark 1: node v1.js
  Time (mean ± σ):      5.424 s ±  0.202 s    [User: 5.264 s, System: 0.055 s]
  Range (min … max):    5.313 s …  5.993 s    10 runs

  Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet PC without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

Benchmark 2: node v2.js
  Time (mean ± σ):      5.964 s ±  0.007 s    [User: 5.794 s, System: 0.058 s]
  Range (min … max):    5.953 s …  5.974 s    10 runs

Summary
  'node v1.js' ran
    1.10 ± 0.04 times faster than 'node v2.js'
```

Note that there were statistical outliers even with a warmup. I believe that this is because I'm running this on my laptop on battery 😅 I did run this multiple times (changing
It would be great to have some numbers, but I think that there are many, many people using this in browsers. In fact, since all LTS versions of Node.js now have

I also ran the
Stringify is marginally slower, but then "uuid.v4() fill existing array" is a lot slower, so that's a bit strange 🤔 And then
This is starting to become a scientific paper, ahah, I like it! It seems like even specific CPUs or combinations of specs influence this benchmark, as running those again on my PC still seems to prove the changed version is faster.
Haha, yeah, it's amazing! Would you be able to run the

I also found another possible code change whilst looking at this, and some very interesting behaviour regarding
Thanks @LinusU. As always, your insight is very helpful.

Note: I think you mixed up the output in your "simple example", above:

```
> a
'hello, world!' // <-- should be 'hello'
> b
'hello' // <-- should be 'hello, world!'
```
Argh, yeah, I did, thanks for pointing it out. I originally wrote it wrong in my terminal, and then figured I'd just fix it in the terminal, but I should have re-run it! 😅 I've updated the comment!

@LinusU I'd love to, are you able to provide the exact gist you tested so we can do some tests?

@mcollina this would be a great stream topic!
@Cadienvan, my apologies for letting this stagnate. Having just had to revert a prior change to this code due to the impact on memory allocation, I'm going to say that the code as it stands is "good enough" and try to avoid further bit-swizzling on it.
The only change made was moving away from the + operator, using explicit string concatenation instead.
An invisible, yet valuable change, IMO.
The explanation is pretty simple: the + operator creates a new string on each operation, which can be inefficient when concatenating a large number of strings or, as in this case, when concatenating different "chunks" multiple times.
@broofa I just saw your comment on my other pull request, but I was already preparing this PR, so please let me know whether this can be valid, or whether performance isn't taken into consideration at all even in these cases. I'll stop pursuing it if that's the case, ofc :)
Before:
After:
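The Before/After snippets were lost when this page was captured. As a rough, hypothetical sketch of the two styles under discussion, chained `+` versus `String.prototype.concat`, with made-up helper names and a shortened three-byte output rather than the actual PR diff:

```javascript
// Hypothetical illustration (not the actual PR diff): a minimal
// byte-to-hex lookup table, as used by uuid's unsafe stringify.
const byteToHex = [];
for (let i = 0; i < 256; ++i) {
  byteToHex.push((i + 0x100).toString(16).slice(1));
}

// Style 1: chained `+` operators.
function stringifyPlus(arr, offset = 0) {
  return (
    byteToHex[arr[offset + 0]] +
    byteToHex[arr[offset + 1]] +
    '-' +
    byteToHex[arr[offset + 2]]
  );
}

// Style 2: a single String.prototype.concat call.
function stringifyConcat(arr, offset = 0) {
  return byteToHex[arr[offset + 0]].concat(
    byteToHex[arr[offset + 1]],
    '-',
    byteToHex[arr[offset + 2]]
  );
}

const bytes = [0xde, 0xad, 0xbe];
console.log(stringifyPlus(bytes));   // 'dead-be'
console.log(stringifyConcat(bytes)); // 'dead-be'
```

Both return identical strings; the debate in this thread is only about how engines allocate the intermediate results along the way.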