
metron select row is a huge outlier #1317

Closed · leeoniya opened this issue Jul 22, 2023 · 30 comments

@leeoniya (Contributor) commented Jul 22, 2023

every other metric, when sorted fastest to slowest, follows a predictable pattern: 1.00 followed by a few 1.01 or 1.02 values.

but metron select row is somehow 13% faster than the next fastest (8.4 vs 9.5). it's the only metric with a gap like this, and it penalizes every other framework in this bench.

i wonder if something's not being measured correctly here.

it would be useful to manually select or set the baseline for any metric in the ui. also a mode to re-compute the baseline when a framework is disabled.
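
something like this could work for recomputing the baseline from only the enabled frameworks (just a rough sketch with made-up types, not the actual results-table code):

```ts
// rough sketch with hypothetical types — not the actual results-table code.
// recompute the per-benchmark baseline from only the enabled frameworks.
interface Result {
  framework: string;
  benchmark: string;
  median: number;
}

function computeBaselines(results: Result[], enabled: Set<string>): Map<string, number> {
  const baselines = new Map<string, number>();
  for (const r of results) {
    if (!enabled.has(r.framework)) continue;
    const best = baselines.get(r.benchmark);
    if (best === undefined || r.median < best) baselines.set(r.benchmark, r.median);
  }
  return baselines;
}

// the factor shown in the table would then be median / baseline for that benchmark
const factorFor = (r: Result, baselines: Map<string, number>) =>
  r.median / (baselines.get(r.benchmark) ?? r.median);
```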

@robbiespeed (Contributor)

It was a lucky run; you can see in the chrome 116 results that it's no longer as much of an outlier. I suspect that if the number of runs for select row were increased, many of these faster-than-vanilla implementations would normalise to be at best on par.

There should be no bug giving an advantage, but please let me know if you can find one.

@robbiespeed (Contributor)

It does appear to have happened again in the latest bench. When I run locally it never exceeds Vanilla perf.

Screenshot from 2023-09-18 14-27-57

On the public results I also notice a significant difference between VanillaJS-1 and VanillaJS; the difference there should not be that large. Also, looking back on previous results, Select Row seems to be consistently inconsistent.

I do really like the idea of recomputing baselines when a subset of frameworks are selected.

Another thing that may reduce the inconsistencies with the select row bench is changing it to benchmark selecting x rows (maybe even all the way to 1k) per run. I can experiment with that, but as I said I'm not able to repro the inconsistency locally.

@leeoniya have you noticed it on local runs or just the public results?

@leeoniya (Contributor, Author) commented Sep 18, 2023

> @leeoniya have you noticed it on local runs or just the public results?

i haven't done a local run in quite a long time.

> Another thing that may reduce the inconsistencies with the select row bench is changing it to benchmark selecting x rows (maybe even all the way to 1k) per run. I can experiment with that, but as I said I'm not able to repro the inconsistency locally.

that sounds a lot better to me than using an artificial 16x slowdown. maybe select 20 rows in sequence and take the average or median of that.
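
rough sketch of the idea (illustrative only — the selector and performance.now() timing are stand-ins, since the real harness measures through chrome traces):

```ts
// illustrative sketch: select 20 rows in sequence and report the median,
// instead of a single click under a big CPU slowdown.
async function benchSelectRows(count = 20): Promise<number> {
  const timings: number[] = [];
  // stand-in selector for however the row labels are reached in the bench page
  const links = Array.from(document.querySelectorAll<HTMLElement>("tbody tr a"));
  for (let i = 0; i < count; i++) {
    const start = performance.now();
    links[i].click();
    // wait for the next frame callback so the DOM update has been processed
    // (rough approximation; paint itself isn't captured here)
    await new Promise(requestAnimationFrame);
    timings.push(performance.now() - start);
  }
  timings.sort((a, b) => a - b);
  const mid = Math.floor(timings.length / 2);
  return timings.length % 2 ? timings[mid] : (timings[mid - 1] + timings[mid]) / 2;
}
```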

@robbiespeed (Contributor)

I think it would be hard to get the average or median of the internal sequences without some significant changes to the bench runner code.

You might be spot on about the 16x slowdown causing the issue. "Partial Update" seems to exhibit these same inconsistencies on the public results. It's less noticeable at a glance because of the higher base ms, but the actual amount of variance seems to be about the same.

Could also try reducing slowdown to 4x and operating on 10k items instead of 1k.

@leeoniya (Contributor, Author) commented Sep 18, 2023

i always found the 10k metric to be pretty absurd (it's like an 80k DOM node page that would realistically be lazy-rendered / paginated). most of the timings there will be even more dominated by browser layout/repaint and don't really exhibit framework differences any better than what the 1k metrics already show.

@krausest (Owner)

I'm taking a look at it and I'll post (and update) my findings here...

metron values from last run:
"min":7.441,"max":12.477,"mean":9.6704,"median":9.161999999999999,"stddev":1.55860886690664
[7.441,7.932,9.075,9.085,9.159,9.165,9.993,11.006,11.371,12.477]

vanillajs-1 results:
"min":8.022,"max":12.074,"mean":9.883099999999999,"median":9.593,"stddev":1.4121499330217502
[8.022,8.413,8.689,9.258,9.508,9.678,10.249,11.05,11.89,12.074]

@krausest (Owner)

parseTrace.ts prints the following:
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_0.json
traces/metron-v0.0.2-keyed_04_select1k_0.json { tsStart: 205422475737, tsEnd: 205422489518, duration: 13.781 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_1.json
traces/metron-v0.0.2-keyed_04_select1k_1.json { tsStart: 205425223287, tsEnd: 205425234293, duration: 11.006 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_2.json
traces/metron-v0.0.2-keyed_04_select1k_2.json { tsStart: 205428755850, tsEnd: 205428765009, duration: 9.159 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_3.json
traces/metron-v0.0.2-keyed_04_select1k_3.json { tsStart: 205432328203, tsEnd: 205432340680, duration: 12.477 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_4.json
traces/metron-v0.0.2-keyed_04_select1k_4.json { tsStart: 205435914118, tsEnd: 205435922050, duration: 7.932 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_5.json
traces/metron-v0.0.2-keyed_04_select1k_5.json { tsStart: 205439488914, tsEnd: 205439498907, duration: 9.993 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_6.json
traces/metron-v0.0.2-keyed_04_select1k_6.json { tsStart: 205442923168, tsEnd: 205442930609, duration: 7.441 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_7.json
traces/metron-v0.0.2-keyed_04_select1k_7.json { tsStart: 205446179113, tsEnd: 205446188198, duration: 9.085 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_8.json
traces/metron-v0.0.2-keyed_04_select1k_8.json { tsStart: 205449838399, tsEnd: 205449849770, duration: 11.371 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_9.json
traces/metron-v0.0.2-keyed_04_select1k_9.json { tsStart: 205453105222, tsEnd: 205453114387, duration: 9.165 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_10.json
traces/metron-v0.0.2-keyed_04_select1k_10.json { tsStart: 205456234013, tsEnd: 205456243088, duration: 9.075 }
analyzing trace traces/metron-v0.0.2-keyed_04_select1k_11.json
traces/metron-v0.0.2-keyed_04_select1k_11.json { tsStart: 205459859891, tsEnd: 205459873057, duration: 13.166 }

analyzing trace traces/vanillajs-1-keyed_04_select1k_0.json
traces/vanillajs-1-keyed_04_select1k_0.json { tsStart: 229062905113, tsEnd: 229062917379, duration: 12.266 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_1.json
traces/vanillajs-1-keyed_04_select1k_1.json { tsStart: 229066074531, tsEnd: 229066083789, duration: 9.258 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_2.json
traces/vanillajs-1-keyed_04_select1k_2.json { tsStart: 229069312435, tsEnd: 229069321124, duration: 8.689 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_3.json
traces/vanillajs-1-keyed_04_select1k_3.json { tsStart: 229071640361, tsEnd: 229071649869, duration: 9.508 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_4.json
traces/vanillajs-1-keyed_04_select1k_4.json { tsStart: 229074002485, tsEnd: 229074012163, duration: 9.678 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_5.json
traces/vanillajs-1-keyed_04_select1k_5.json { tsStart: 229077352063, tsEnd: 229077362312, duration: 10.249 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_6.json
traces/vanillajs-1-keyed_04_select1k_6.json { tsStart: 229080669461, tsEnd: 229080681880, duration: 12.419 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_7.json
traces/vanillajs-1-keyed_04_select1k_7.json { tsStart: 229083929232, tsEnd: 229083937645, duration: 8.413 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_8.json
traces/vanillajs-1-keyed_04_select1k_8.json { tsStart: 229087190047, tsEnd: 229087201937, duration: 11.89 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_9.json
traces/vanillajs-1-keyed_04_select1k_9.json { tsStart: 229090437375, tsEnd: 229090445397, duration: 8.022 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_10.json
traces/vanillajs-1-keyed_04_select1k_10.json { tsStart: 229093300948, tsEnd: 229093311998, duration: 11.05 }
analyzing trace traces/vanillajs-1-keyed_04_select1k_11.json
traces/vanillajs-1-keyed_04_select1k_11.json { tsStart: 229096206027, tsEnd: 229096218101, duration: 12.074 }

@krausest (Owner)

Here's the graph for the fastest metron:
Screenshot 2023-09-20 at 9 48 37 PM
and for vanillajs-1:
Screenshot 2023-09-20 at 9 51 01 PM

Currently the benchmark only measures the time up to the paint event, but obviously in this case the commit phase and layout phase differ by a large margin. Please note that both have a layout and a commit phase, but their lengths differ a lot.

@krausest (Owner)

I changed the duration computation so that the duration is measured from the click event to the end of the last paint or commit event.
Here's the prototype:
https://krausest.github.io/js-framework-benchmark/2023/table_use_commit.html
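
In essence, the duration logic becomes something like this (a simplified sketch; the real parseTrace.ts additionally filters by process/thread and handles more event types):

```ts
// simplified sketch of the new duration logic: from the start of the click
// EventDispatch to the end of the last Paint or Commit trace event.
interface TraceEvent {
  name: string;
  ts: number;   // microseconds
  dur?: number; // microseconds
  args?: any;
}

function computeDuration(events: TraceEvent[]): number {
  const click = events.find(
    (e) => e.name === "EventDispatch" && e.args?.data?.type === "click"
  );
  if (!click) throw new Error("no click event found in trace");

  let end = click.ts + (click.dur ?? 0);
  for (const e of events) {
    if ((e.name === "Paint" || e.name === "Commit") && e.ts >= click.ts) {
      end = Math.max(end, e.ts + (e.dur ?? 0));
    }
  }
  return (end - click.ts) / 1000; // milliseconds
}
```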

I'm looking forward to getting feedback!

@robbiespeed (Contributor)

Do you have a branch I could check out? I'm not sure why commit would take any longer; the only difference is that I use setAttribute instead of className (planning to switch). I'm also not able to repro the difference locally.

The table does look more stable at first glance though.

@krausest (Owner)

Sure, here it is: https://github.com/krausest/js-framework-benchmark/tree/duration-logic-1317
The way I'm analyzing those traces is in webdriver-ts via npm run compile && node dist/parseTrace.js, then playing with parseTrace. It uses the trace files captured from the last run and just extracts the duration values. readAll() goes over all results and writes new result JSON files so that you can create the result table afterwards; calling single() from main allows recomputing selected implementations without updating the results.

@robbiespeed (Contributor)

I'm seeing long commits on pretty much all frameworks I've tested; it seems totally random. Here's the trace log I get for vanillajs:

analyzing trace  traces/vanillajs-keyed_04_select1k_0.json
LOOOOOONG COMMIT  1636
LOOOOOONG COMMIT  3216
event click 0 - 3514 {"args":{"data":{"type":"click"}},"cat":"devtools.timeline","dur":3514,"name":"EventDispatch","ph":"X","pid":65114,"tdur":3505,"tid":1,"ts":47165045719,"tts":278129}
event paint 5532 - 12461 {"args":{"data":{"clip":[-8388608,-8388608,8388607,-8388608,8388607,8388607,-8388608,8388607],"frame":"3A8A5B121563FE6E83D3F5A5DCB860BB","layerId":0,"nodeId":1}},"cat":"devtools.timeline,rail","dur":6929,"name":"Paint","ph":"X","pid":65114,"tdur":6820,"tid":1,"ts":47165051251,"tts":283640}
event commit 16277 - 19493 {"args":{"frameSeqId":141,"layerTreeId":1},"cat":"disabled-by-default-devtools.timeline","dur":3216,"name":"Commit","ph":"X","pid":65114,"tdur":3089,"tid":1,"ts":47165061996,"tts":294270}
DEBUG: searching for commit event after {
  type: 'click',
  ts: 47165045719,
  dur: 3514,
  end: 47165049233,
  pid: 65114,
  evt: '{"args":{"data":{"type":"click"}},"cat":"devtools.timeline","dur":3514,"name":"EventDispatch","ph":"X","pid":65114,"tdur":3505,"tid":1,"ts":47165045719,"tts":278129}'
} for traces/vanillajs-keyed_04_select1k_0.json
traces/vanillajs-keyed_04_select1k_0.json {
  tsStart: 47165045719,
  tsEnd: 47165065212,
  duration: 19.493,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
done

@krausest (Owner) commented Sep 26, 2023

I've taken a deep look at it and changed the duration computation. It's now explained at https://github.com/krausest/js-framework-benchmark/wiki/How-the-duration-is-measured and the code has been updated (on master; the branch served as a prototype).
I've put the results table here: https://krausest.github.io/js-framework-benchmark/2023/table_use_commit2.html

I'm looking forward to getting feedback for both the table and the wiki page and plan to update results for chrome 118 to the new scheme.

From a first look the result table seems to make no big difference to the ranking (which would be good). The minor changes to the ranking seem to be mostly influenced by the select row benchmark.

@robbiespeed (Contributor) commented Sep 27, 2023

This looks like it will reduce the impact of "lucky runs" disproportionately affecting results. There's still bound to be variance for the top performing frameworks between runs because of the limited iterations and Chrome's own timing inconsistencies. The top 10-20 frameworks are mostly triggering identical DOM interactions.

I know it goes against the goal of showing the full client side duration, but it would be a nice feature to show the JS duration somewhere in the table (under a dropdown, like the boxplot). It would help framework authors identify bottlenecks in comparison to the other top performing frameworks.

I agree with the focus being on real world render timings rather than JS execution. Chrome's inconsistencies just make the data less valuable (to me, when focusing on improving performance) when the race is so close.

@krausest (Owner) commented Oct 13, 2023

I'm closing this issue. Results for chrome 118 will compute duration in a slightly different way: #1411

https://github.com/krausest/js-framework-benchmark/wiki/How-the-duration-is-measured

@leeoniya (Contributor, Author) commented Oct 13, 2023

@krausest the preliminary results for 118 still have this issue. 1.09 is the second fastest, then 1.15 and 1.16.

Screenshot_20231013-090826748_1

i think measuring the duration of a single action with a huge slowdown is gonna be fraught with these inconsistencies. maybe consider no slowdown and just measuring the duration of 10 row selections summed.

@krausest (Owner)

I'll take a look at it. The new algorithm says for vanjs:

(base) stefan@MBP-von-Stefan webdriver-ts % npm run compile && node dist/parseTrace.js

> webdriver-ts@1.0.0 compile
> tsc

analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_0.json
traces/vanjs-v1.1.0-keyed_04_select1k_0.json {
  tsStart: 721301952261,
  tsEnd: 721301969705,
  duration: 17.444,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_1.json
traces/vanjs-v1.1.0-keyed_04_select1k_1.json {
  tsStart: 721305151359,
  tsEnd: 721305160615,
  duration: 9.256,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_2.json
traces/vanjs-v1.1.0-keyed_04_select1k_2.json {
  tsStart: 721308303448,
  tsEnd: 721308314555,
  duration: 11.107,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_3.json
traces/vanjs-v1.1.0-keyed_04_select1k_3.json {
  tsStart: 721311549624,
  tsEnd: 721311559368,
  duration: 9.744,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_4.json
traces/vanjs-v1.1.0-keyed_04_select1k_4.json {
  tsStart: 721314453435,
  tsEnd: 721314466953,
  duration: 13.518,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_5.json
traces/vanjs-v1.1.0-keyed_04_select1k_5.json {
  tsStart: 721317915992,
  tsEnd: 721317929608,
  duration: 13.616,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_6.json
traces/vanjs-v1.1.0-keyed_04_select1k_6.json {
  tsStart: 721321336607,
  tsEnd: 721321347006,
  duration: 10.399,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_7.json
traces/vanjs-v1.1.0-keyed_04_select1k_7.json {
  tsStart: 721324351356,
  tsEnd: 721324363431,
  duration: 12.075,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_8.json
traces/vanjs-v1.1.0-keyed_04_select1k_8.json {
  tsStart: 721327693098,
  tsEnd: 721327704206,
  duration: 11.108,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_9.json
traces/vanjs-v1.1.0-keyed_04_select1k_9.json {
  tsStart: 721330915636,
  tsEnd: 721330929029,
  duration: 13.393,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_10.json
traces/vanjs-v1.1.0-keyed_04_select1k_10.json {
  tsStart: 721334310723,
  tsEnd: 721334320084,
  duration: 9.361,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/vanjs-v1.1.0-keyed_04_select1k_11.json
traces/vanjs-v1.1.0-keyed_04_select1k_11.json {
  tsStart: 721337771501,
  tsEnd: 721337788371,
  duration: 16.87,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}

The slowest two get dropped, such that we get [9.256, 9.361, 9.744, 10.399, 11.107, 11.108, 12.075, 13.393, 13.518, 13.616].
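
(For reference, the trimming step is essentially this small sketch:)

```ts
// sketch: keep all but the two slowest runs, sorted ascending
const keepFastest = (durations: number[], drop = 2) =>
  [...durations].sort((a, b) => a - b).slice(0, durations.length - drop);
```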

For sinuous we get:

traces/sinuous-v0.32.1-keyed_04_select1k_5.json {
  tsStart: 713393996671,
  tsEnd: 713394008380,
  duration: 11.709,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/sinuous-v0.32.1-keyed_04_select1k_6.json
traces/sinuous-v0.32.1-keyed_04_select1k_6.json {
  tsStart: 713397807816,
  tsEnd: 713397818043,
  duration: 10.227,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/sinuous-v0.32.1-keyed_04_select1k_7.json
traces/sinuous-v0.32.1-keyed_04_select1k_7.json {
  tsStart: 713401566673,
  tsEnd: 713401587266,
  duration: 20.593,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/sinuous-v0.32.1-keyed_04_select1k_8.json
traces/sinuous-v0.32.1-keyed_04_select1k_8.json {
  tsStart: 713405379663,
  tsEnd: 713405390250,
  duration: 10.587,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/sinuous-v0.32.1-keyed_04_select1k_9.json
traces/sinuous-v0.32.1-keyed_04_select1k_9.json {
  tsStart: 713409133922,
  tsEnd: 713409146386,
  duration: 12.464,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/sinuous-v0.32.1-keyed_04_select1k_10.json
traces/sinuous-v0.32.1-keyed_04_select1k_10.json {
  tsStart: 713412737933,
  tsEnd: 713412753592,
  duration: 15.659,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}
analyzing trace  traces/sinuous-v0.32.1-keyed_04_select1k_11.json
traces/sinuous-v0.32.1-keyed_04_select1k_11.json {
  tsStart: 713416457112,
  tsEnd: 713416472334,
  duration: 15.222,
  layouts: 0,
  raf_long_delay: 0,
  droppedNonMainProcessCommitEvents: false,
  droppedNonMainProcessOtherEvents: false,
  maxDeltaBetweenCommits: 0,
  numberCommits: 1
}

[9.262, 10.227, 10.587, 11.259, 11.709, 12.464, 13.304, 15.222, 15.659, 16.571]
Here's the boxplot:
Screenshot 2023-10-13 at 4 34 40 PM
Checking the fastest result for vanjs:
Screenshot 2023-10-13 at 4 40 18 PM
9.2-something looks good.

Checking the slowest run for sinuous:
Screenshot 2023-10-13 at 4 42 52 PM
Manual selection gives a value close to the 16.571 reported for it.

So I can't currently see a mistake in the measuring. It seems like Chrome pauses for almost 5 msecs for sinuous:
Screenshot 2023-10-13 at 4 46 38 PM

I agree that the results aren't pretty, but I see no issue on the duration computation logic side.

@robbiespeed (Contributor)

I think the order of benchmarks should be changed from:

Bench A Framework A, Bench B Framework A, ...
Bench A Framework B, Bench B Framework B, ...

To:

Bench A Framework A, Bench A Framework B, ...
Bench B Framework A, Bench B Framework B, ...

That way, if there are sustained CPU fluctuations (due to thermal throttling or other tasks), they will have less of an impact on the overall comparison per benchmark row.
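
Roughly (a hypothetical runner loop, not the actual webdriver-ts code):

```ts
// sketch: hypothetical runner loop to show the two orderings.
type Framework = string;
type Benchmark = string;
declare function runBench(framework: Framework, bench: Benchmark): Promise<void>;

// current order (framework-major): Bench A Framework A, Bench B Framework A, ...
async function runFrameworkMajor(frameworks: Framework[], benchmarks: Benchmark[]) {
  for (const f of frameworks) {
    for (const b of benchmarks) await runBench(f, b);
  }
}

// proposed order (benchmark-major): Bench A Framework A, Bench A Framework B, ...
// sustained CPU fluctuations then spread more evenly across one benchmark row.
async function runBenchmarkMajor(frameworks: Framework[], benchmarks: Benchmark[]) {
  for (const b of benchmarks) {
    for (const f of frameworks) await runBench(f, b);
  }
}
```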

@krausest (Owner) commented Oct 13, 2023

I don't think I can measure the time for 10 interactions reliably.
Currently all benchmarks are designed as a single interaction, such that the interaction can be found in the timeline and the duration measured. If I changed it to measure the duration of ten interactions, I'd also measure latencies between the test driver and chrome: the test driver would have to wait for the selected row to show up and then perform the next click. I assume that waiting for a condition would introduce a large latency.

select row is really too fast. Still, it holds an important piece of information: is the framework clever enough to repaint just the affected rows or not? And there are enough frameworks that are really slow at that job, where the difference even in this benchmark is significant:

Screenshot 2023-10-13 at 5 01 23 PM

I really think a cap would serve that purpose best. I'd say let's pick a value (maybe 20 msecs) and clamp all values below it up to that cap.
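
In other words, a sketch:

```ts
// sketch: clamp fast results up to the cap so sub-cap noise stops influencing the factor
const CAP_MS = 20;
const cappedDuration = (durationMs: number) => Math.max(durationMs, CAP_MS);
```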

Update: I tried to reproduce the vanjs run but failed to get such fast results. There was no such 9 msec result and the mean was about 14 msecs.

@robbiespeed I would have agreed for my i7 Razer Blade, but the MacBook Pro is IMHO not prone to throttling in this benchmark. I hadn't even seen throttling on the MacBook Air #885 (comment)

@krausest reopened this Oct 13, 2023
@leeoniya (Contributor, Author) commented Oct 13, 2023

results that fluctuate by 10%, like 11.1 ± 1.1, are not really useful. we can do some kind of smart clamping for each metric's baseline that takes this into account.

@leeoniya (Contributor, Author) commented Oct 13, 2023

we can say the baseline is the fastest result with < 5% fluctuation. so something like 12.9 ± 0.5 would be much closer.

or 3%, or whatever...
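
rough sketch of what i mean (illustrative only, with a made-up stats shape):

```ts
// illustrative sketch: pick as baseline the fastest framework whose
// stddev is below 5% of its mean, instead of the absolute fastest.
interface BenchStats {
  framework: string;
  mean: number;
  stddev: number;
}

// assumes stats is non-empty
function pickBaseline(stats: BenchStats[], maxRelStddev = 0.05): BenchStats {
  const stable = stats.filter((s) => s.stddev / s.mean < maxRelStddev);
  const pool = stable.length > 0 ? stable : stats;
  return pool.reduce((best, s) => (s.mean < best.mean ? s : best));
}
```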

@krausest (Owner)

I tried 4x slowdown for select rows:
https://krausest.github.io/js-framework-benchmark/2023/table_chrome_118_preliminary2.html
Looks better, but I guess we should perform some more runs to check if we can count on it.

@leeoniya (Contributor, Author)

> I tried 4x slowdown for select rows:

maybe consider doing the same for partial update?

@krausest (Owner)

Here's my suggestion:

  1. For select rows, run more than 10 iterations. Maybe 20 might still be bearable.
  2. We stop dropping the two slowest runs and use 10 iterations for all other benchmarks.
  3. We stop using the geometric mean and use a weighted geometric mean instead (https://en.wikipedia.org/wiki/Weighted_geometric_mean)

The reason for 3: the slowest implementation is 3.28 times slower than vanillajs for create rows, but 48 times slower for select rows. Maybe we should adjust for that and make sure that the influence of each benchmark is more comparable.
The way to do this could be: we take the factor for the 90th percentile of each benchmark, like this:

01_run1k: fastest 38.6075 90% percentile 58.93565000000003 factor 1.5265337045910776
02_replace1k: fastest 39.097 90% percentile 69.12685000000005 factor 1.7680857866332467
03_update10th1k_x16: fastest 80.58250000000001 90% percentile 157.48480000000018 factor 1.9543300344367593
04_select1k: fastest 3.227 90% percentile 17.120600000000003 factor 5.305422993492409
05_swap1k: fastest 22.61 90% percentile 170.42445 factor 7.537569659442725
06_remove-one-1k: fastest 35.0215 90% percentile 66.91905000000011 factor 1.910799080564799
07_create10k: fastest 381.80150000000003 90% percentile 686.5504500000001 factor 1.7981868850698597
08_create1k-after1k_x2: fastest 83.66149999999999 90% percentile 152.62940000000003 factor 1.8243684370947215
09_clear1k_x8: fastest 25.430500000000002 90% percentile 60.575350000000014 factor 2.3819960283911055

and then we use 1/factor as the weight for the weighted geometric mean. Taking the 90th percentile feels a bit more stable, considering @birkskyum wants to eliminate choo :)
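
In code the idea is roughly this (a sketch, not the actual webdriver-ts implementation):

```ts
// sketch: weighted geometric mean of the per-benchmark factors,
// with weight = 1 / (90th-percentile factor of that benchmark).
function weightedGeometricMean(factors: number[], weights: number[]): number {
  const weightSum = weights.reduce((s, w) => s + w, 0);
  const logSum = factors.reduce((s, f, i) => s + weights[i] * Math.log(f), 0);
  return Math.exp(logSum / weightSum);
}

// example weights from the factors above (rounded):
// 04_select1k ≈ 1 / 5.31 ≈ 0.19, 05_swap1k ≈ 1 / 7.54 ≈ 0.13, 01_run1k ≈ 1 / 1.53 ≈ 0.66
```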

I've attached an Excel file where you can see the impact and play with the weights.
The tab results-weighted uses the approach above and results-geom uses the old approach. Column T contains the new result value.

Here's an example of what impact a 10% increase in duration has on the results:
Screenshot 2023-10-14 at 6 11 54 PM
So for the old geometric mean it has the same effect on the result, but the new weighted geometric mean is less sensitive to the increase in select rows.

P.S.: We could easily increase the number of iterations in point 2 above if someone had a spare mac mini 😄
results.xlsx

@birkskyum (Contributor) commented Oct 14, 2023

Regarding the suggestion to archive choo, forgo, bdc: it's partly a budgeting exercise, because there are other interesting frameworks that I'd love to add to the benchmark, such as preact, and likely svelte 5 runes in a month, and letting go of some old things seemed like a natural way to keep costs down and maintain the iterations per framework. I also tried to update forgo, but I got some weird results. Removing some very slow frameworks could make it easier to compare the rest, and help keep the js-framework-benchmark relevant to present-day frameworks. I only see now that choo got a fair bit of attention when it was new, which I didn't know.

@krausest (Owner)

Here's what I suggest for the chrome 118 results (slightly adapted from the above):

  1. 15 runs for all CPU benchmarks except select row, which uses 25 runs. No runs are dropped.
  2. Use much lower slow down factors.
  3. Use weighted geometric mean
  4. Just a single lighthouse run (I must somehow cut down time until someone donates that mac mini 😄)

https://krausest.github.io/js-framework-benchmark/2023/table_chrome_118_preliminary3.html

And here's the excel if you want to compare and see the weights:

results.xlsx

@birkskyum (Contributor) commented Oct 15, 2023

Where do these weights at the top come from?

Screenshot 2023-10-15 at 16 26 31

@krausest (Owner) commented Oct 15, 2023

The weight for a benchmark = 1 / (factor for the 90th percentile of the benchmark)
For the run above those factors are:

01_run1k: fastest 38.723 90% percentile 60.240900000000025 factor 1.5556878341037632
02_replace1k: fastest 39.151 90% percentile 69.82300000000004 factor 1.7834282649229913
03_update10th1k_x16: fastest 19.336 90% percentile 34.26060000000001 factor 1.771855606123294
04_select1k: fastest 3.285 90% percentile 17.059300000000007 factor 5.1930898021309
05_swap1k: fastest 22.607 90% percentile 171.2572 factor 7.575405847746274
06_remove-one-1k: fastest 17.721 90% percentile 33.581000000000046 factor 1.8949833530839144
07_create10k: fastest 386.767 90% percentile 685.2165 factor 1.771651924802271
08_create1k-after1k_x2: fastest 40.882 90% percentile 74.21810000000002 factor 1.8154224353016004
09_clear1k_x8: fastest 13.144 90% percentile 31.103900000000003 factor 2.3663953134510045

So this time it's derived from the actual 90th percentile, but I'd fix those weights for the next few runs.

@leeoniya (Contributor, Author)

i'm on board with whatever reduces the big ± variance of the baseline (and smooths out the factor progression of the fast runs for each metric). 👍

@krausest (Owner) commented Nov 5, 2023

Implemented since chrome 118.

@krausest closed this as completed Nov 5, 2023