metron select row is a huge outlier #1317
It was a lucky run; you can see in the Chrome 116 results that it's no longer as much of an outlier. I suspect that if the number of runs for select row were increased, many of these faster-than-vanilla implementations would normalise to at best be on par. There should be no bug giving an advantage, but please let me know if you can find one.
It does appear to have happened again in the latest bench. When I run locally it never exceeds Vanilla perf. On the public results I also notice a significant difference between VanillaJS-1 and VanillaJS; the difference there should not be that large. Also, looking back on previous results, select row seems to be consistently inconsistent. I do really like the idea of recomputing baselines when a subset of frameworks is selected. Another thing that may reduce the inconsistencies with the select row bench is to change it to benchmark selecting x (maybe even all the way to 1k) rows per run. I can experiment with that, but as I said I'm not able to repro the inconsistency locally. @leeoniya have you noticed it on local runs or just in the public results?
i haven't done a local run in quite a long time.
that sounds a lot better to me than using an artificial 16x slowdown. maybe select 20 rows in sequence and take the average or median of that.
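The suggestion above could be sketched roughly as follows. This is not the actual bench runner code; the `selectRow` callback and its signature are assumptions, standing in for whatever the harness uses to drive one interaction and return its duration.

```typescript
// Hypothetical sketch: run the select-row interaction N times at normal
// CPU speed and report the median duration instead of a single
// heavily-slowed-down run.
function median(values: number[]): number {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

// selectRow is an assumed callback that performs one row selection and
// resolves with its measured duration in milliseconds.
async function benchSelectRows(
  selectRow: (index: number) => Promise<number>,
  iterations = 20
): Promise<number> {
  const durations: number[] = [];
  for (let i = 0; i < iterations; i++) {
    durations.push(await selectRow(i));
  }
  return median(durations);
}
```

The median is likely preferable to the mean here, since a single GC pause or layout hiccup would skew an average of only 20 samples.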
I think it would be hard to get the average or median of the internal sequences without some significant changes to the bench runner code. You might be spot on about the 16x slowdown causing the issue. "Partial Update" seems to exhibit these same inconsistencies on the public results. It's less noticeable at a glance because of the higher base ms, but the actual amount of variance seems to be about the same. We could also try reducing the slowdown to 4x and operating on 10k items instead of 1k.
i always found the 10k metric to be pretty absurd (it's like an 80k DOM node page that would realistically be lazy-rendered / paginated). most of the timings there will be even more dominated by browser layout/repaint and don't really exhibit framework differences any better than what the 1k already shows.
I'm taking a look at it and I'll post (and update) my findings here... metron values from the last run: vanillajs-1 results:
parseTrace.ts prints the following: analyzing trace traces/vanillajs-1-keyed_04_select1k_0.json
I changed the duration computation to compute the duration from the click event to the end of the last paint or commit event. I'm looking forward to getting feedback! |
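The duration logic described above can be sketched like this. The event names, the `TraceEvent` shape, and the function name are assumptions for illustration, not the actual `parseTrace.ts` code; real Chrome traces use many more event types and fields.

```typescript
// Sketch: duration = from the click's dispatch to the end of the last
// Paint or Commit event that follows it. Timestamps in Chrome traces
// are in microseconds.
interface TraceEvent {
  name: string; // e.g. "EventDispatch", "Paint", "Commit" (assumed subset)
  ts: number;   // start timestamp in microseconds
  dur?: number; // duration in microseconds, if present
}

function computeDuration(events: TraceEvent[]): number {
  const click = events.find((e) => e.name === "EventDispatch");
  if (!click) throw new Error("no click event in trace");
  const after = events.filter(
    (e) => (e.name === "Paint" || e.name === "Commit") && e.ts >= click.ts
  );
  if (after.length === 0) throw new Error("no paint/commit after click");
  const lastEnd = Math.max(...after.map((e) => e.ts + (e.dur ?? 0)));
  return (lastEnd - click.ts) / 1000; // microseconds -> milliseconds
}
```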
Do you have a branch I could check out? I'm not sure why commit would take any longer. The only difference is I use setAttribute instead of className (planning to switch). Also not able to repro the difference locally. The table does look more stable at first glance though. |
Sure here it is: https://github.com/krausest/js-framework-benchmark/tree/duration-logic-1317 |
I'm seeing long commits on pretty much all frameworks I've tested, seems totally random. Here's the trace log I get for vanillajs:
I've taken a deep look at it and changed the duration computation. It's now explained on https://github.com/krausest/js-framework-benchmark/wiki/How-the-duration-is-measured and the code has been updated (on master; the branch served as a prototype). I'm looking forward to getting feedback on both the table and the wiki page, and I plan to update the results for Chrome 118 to the new scheme. From a first look, the new scheme seems to make no big difference to the ranking (which would be good). The minor changes in the ranking seem to be mostly influenced by the select row benchmark.
This looks like it will keep "lucky runs" from disproportionately affecting results. There's still bound to be variance for the top-performing frameworks between runs because of the limited iterations and Chrome's own timing inconsistencies. The top 10-20 frameworks are mostly triggering identical DOM interactions. I know it goes against the goal of showing the full client-side duration, but it would be a nice feature to show the JS duration somewhere in the table (under a dropdown, like the boxplot). It would benefit framework authors to help identify bottlenecks in comparison to the other top-performing frameworks. I agree with the focus being on real-world render timings rather than JS execution. Chrome's inconsistencies just make the data less valuable (to me, when focusing on improving performance) when the race is so close.
I'm closing this issue. Results for chrome 118 will compute duration in a slightly different way: #1411 https://github.com/krausest/js-framework-benchmark/wiki/How-the-duration-is-measured |
@krausest the preliminary results for 118 still have this issue. 1.09 is the second fastest, then 1.15 and 1.16. i think measuring the duration of a single action with a huge slowdown is gonna be fraught with these inconsistencies. maybe consider no slowdown and just measuring the duration of 10 row selections summed.
I think the order of benchmarks should be changed from:
To:
That way, if there are sustained CPU fluctuations (due to thermal throttling or other tasks), they will have less of an impact on the overall comparison per benchmark row.
I don't think I can measure the time for 10 interactions reliably; select row is really too fast. Still, it holds important information: is the framework clever enough to repaint just the affected rows or not? And there are enough frameworks that are really slow for that job, where the difference even in this benchmark is significant. I really think a cap would serve that purpose best. I'd say let's pick a value (maybe 20 msecs) and cap all values below 20 msecs. Update: I tried to reproduce the vanjs run but failed to get such fast results. There was no such 9 msec result and the mean was about 14 msecs. @robbiespeed I would have agreed for my i7 Razer Blade, but the MacBook Pro is IMHO not prone to throttling in this benchmark. I hadn't even seen throttling on the MacBook Air #885 (comment)
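The proposed cap could look like the following. This is a minimal sketch under the assumption that the table compares per-benchmark factors against a baseline; the constant and function name are illustrative, not the benchmark's actual code.

```typescript
// Minimal sketch of the proposed cap: any duration below the cap is
// treated as equal to the cap, so sub-threshold noise between very
// fast frameworks stops affecting the factor against the baseline,
// while genuinely slow implementations still stand out.
const CAP_MS = 20; // value proposed above; subject to tuning

function cappedFactor(durationMs: number, baselineMs: number): number {
  return Math.max(durationMs, CAP_MS) / Math.max(baselineMs, CAP_MS);
}
```

With a 20 ms cap, an 8.4 ms and a 9.5 ms result would both report a factor of 1.00, while a 40 ms result against a sub-cap baseline would still report 2.00.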
results that fluctuate by 10%, like |
we can say baseline is fastest result with < 5% fluctuation. so something like or 3%, or whatever... |
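The baseline rule suggested here could be sketched as follows. The `Result` shape and the use of stddev/mean as the fluctuation measure are assumptions; the thread doesn't specify how fluctuation would be computed.

```typescript
// Hedged sketch: pick as baseline the fastest mean among frameworks
// whose relative fluctuation (assumed here to be stddev / mean) stays
// under a threshold, e.g. 5% or 3% as discussed above.
interface Result {
  framework: string;
  mean: number;   // mean duration in ms
  stddev: number; // standard deviation in ms
}

function pickBaseline(results: Result[], maxFluctuation = 0.05): Result {
  const stable = results.filter((r) => r.stddev / r.mean < maxFluctuation);
  if (stable.length === 0) throw new Error("no stable result found");
  return stable.reduce((a, b) => (a.mean <= b.mean ? a : b));
}
```

Under this rule, a framework like the 8.4 ms outlier would be skipped as baseline if its run-to-run spread exceeded the threshold, and the next-fastest stable result would anchor the factors instead.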
I tried 4x slowdown for select rows: |
maybe consider doing the same for partial update? |
Here's my suggestion:
The reason for 3 is: the slowest implementation is 3.28 times slower than vanillajs for create rows, but 48 times slower for select rows. Maybe we should adjust for that and make sure that the influence of each benchmark is more comparable.
and then we use 1/factor as the weight for the weighted geometric mean. Taking the 90th percentile feels a bit more stable, considering @birkskyum wants to eliminate choo :) I've attached an Excel file where you can see the impact and play with the weights. Here's an example of what impact a 10% increase in duration has on the results: P.S.: We could easily go up with the number of iterations in #2 if someone had a spare Mac mini 😄
Regarding the suggestion to archive choo, forgo, bdc: it's both a budgeting exercise, because there are other interesting frameworks that I'd love to add to the benchmark, such as preact and likely svelte 5 runes in a month, and letting go of some old things seemed like a natural evolution to keep costs down and maintain iterations per framework. I also tried to update forgo, but I got some weird results. Removing some very slow frameworks could make it easier to compare the rest and help make the js-framework-benchmark relevant to present-day frameworks again. I only see now that choo got a fair bit of attention when it was new, which I didn't know.
Here's what I suggest for the Chrome 118 results (slightly adapted from above):
https://krausest.github.io/js-framework-benchmark/2023/table_chrome_118_preliminary3.html And here's the Excel file if you want to compare and see the weights:
The weight for a benchmark = 1 / (factor for the 90th percentile of the benchmark)
So this time it's derived from the actual 90th percentile, but I'd fix those weights for the next few runs.
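The weighting scheme above could be sketched as follows. The helper names and the percentile convention (nearest-rank) are assumptions; the actual table code may use a different percentile interpolation.

```typescript
// Sketch: weight per benchmark = 1 / (90th percentile of that
// benchmark's factors across all frameworks), then a weighted
// geometric mean of one framework's factors gives its overall score.
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil(p * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// factorsPerBenchmark[b] = factors of every framework for benchmark b
function benchmarkWeights(factorsPerBenchmark: number[][]): number[] {
  return factorsPerBenchmark.map((f) => 1 / percentile(f, 0.9));
}

// frameworkFactors[b] = this framework's factor for benchmark b
function weightedGeometricMean(
  frameworkFactors: number[],
  weights: number[]
): number {
  const totalWeight = weights.reduce((a, b) => a + b, 0);
  const logSum = frameworkFactors.reduce(
    (acc, f, i) => acc + weights[i] * Math.log(f),
    0
  );
  return Math.exp(logSum / totalWeight);
}
```

A benchmark whose 90th-percentile factor is 48 (like select rows) thus gets roughly 1/15 the weight of one whose 90th-percentile factor is near 3.28 (like create rows), equalizing their influence on the overall mean.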
i'm on board with whatever reduces the big ± variance of the baseline (and smooths out the factor progression of the fast runs for each metric). 👍 |
Implemented since chrome 118. |
every other metric when sorted by fastest to slowest follows a predictable pattern of 1.00 followed by a few 1.01 or 1.02 values.
but metron select row is somehow 13% faster than the next fastest (8.4 vs 9.5). it's the only metric with this stat and penalizes every other framework in this bench.
i wonder if something's not being measured correctly here.
it would be useful to manually select or set the baseline for any metric in the ui. also a mode to re-compute the baseline when a framework is disabled.