Discrepancy between Chrome's performance measurements and benchmark tests #826
@leeoniya that does speed up the times a bit: But I am still seeing Svelte as roughly twice as slow on Chrome. It hadn't occurred to me to test manually in Chromium, so I did that, and here are roughly representative samples of what I'm getting (with samples disabled): Svelte on Chromium: My framework on Chromium: As you can see there's less of a difference, but it still shows Svelte as slower, even excluding the "Loading" step, and I'm not sure what that's about. Back in Chrome, I tried it with devtools shut and a stopwatch (so scientific, I know) and mine comes in around the 3.40 range with Svelte in the 4.40 range, which despite the inaccuracy of the method is a real difference. So either me, Chrome, Svelte or the benchmark is doing something silly (let's assume me).
@krausest thanks for the response, and thank you for making such an awesome tool - I have found it really, really useful. I accept that there will be a difference between the benchmark and devtools. But I would expect that percentage to be consistent between frameworks, e.g. 15% for all, but my observation (if correct) shows this is not the case. If I add the times from my Chromium screenshots above and compare those to the benchmark, we see the percentages are different:
Diff1 is what I get if I add the Scripting + Rendering + Loading from the pie chart. I don't know what "Loading" refers to, or why it is 0ms for my framework but 137ms for Svelte, so I included the percentage without it. But even with "Loading" taken out, the percentage is very different: 2% vs 15%. Unfortunately I am struggling to figure out how to measure the exact time between two events in devtools as you described. But am I understanding correctly that the two points you are measuring between represent a subset, or a slice, of the total which I would get if I manually added up the coloured segments from the pie chart summary as I did above? If so, this is a problem, because the latter method of measuring seems to me far more indicative of the time these operations really take, i.e. as the user would experience them in the browser. And if the benchmark is going to measure just a slice of that, then the slice should be the same percentage (within a pretty narrow margin) for each framework. Otherwise the benchmark results, and the comparisons drawn from them, would be invalid. I measured that difference as 2.2% for one framework and 21.7% for another, which is clearly very big. But before I repeat that to improve the accuracy (because it really isn't very accurate) and compare it to the difference for other frameworks, I would be grateful if you could confirm whether my logic and assumptions are sound, because I'm not 100% on it.
OK, we're getting somewhere. I figured out how to measure from devtools with shift+mouse :-) I took the benchmark readings straight out of the results json files:
Hopefully the two means are self-explanatory, and the "bench says" is what I read from the Interactive Results table: I am not sure how that is calculated. If we look at the geometric mean as I calculate it (google sheet GEOMEAN formula) we see the benchmark lists RedRunner as 4.85% slower than it really* is, and Svelte as a whopping 18.4% faster! * assuming dev tools measurements match browser reality more closely than the benchmark measurement, which I think is the case, though my empirical evidence for that in previous comments is void due to the gzip issue. Here's the spreadsheet if you want. I'll submit a PR with RedRunner and you can perhaps tell me if you see the same.
I can confirm there's something strange going on. Manual testing from dev tools shows that redrunner is faster than svelte
But the benchmark driver claims that svelte is faster
It's important to understand that the benchmark driver actually uses the chrome dev tools events (so it's not two measurements, just one without a GUI and with different options like traceCategories), but obviously there's a difference somewhere. I'll try to find out where the difference comes from.
Looks like it's due to #549. When I run svelte with the categories "-*, disabled-by-default-lighthouse, loading, v8, v8.execute, blink.user_timing, blink.console, devtools.timeline, disabled-by-default-devtools.timeline, disabled-by-default-devtools.screenshot, disabled-by-default-devtools.timeline.stack", svelte is reported to be much slower and much closer to the manual testing in the devtools:
So I suspect that chrome dev tools use perfLoggingPrefs similar to the lighthouse.traceCategories.
Seems like the traceCategory disabled-by-default-devtools.timeline.stack causes the difference (and I think disabled-by-default-devtools.timeline adds a few msecs too).
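For context, this is roughly how trace categories can be passed to chromedriver through perfLoggingPrefs with selenium-webdriver, and how the resulting trace events can be read back. This is a sketch only: the benchmark driver's actual setup may differ, and the field names are taken from the chromedriver documentation rather than this repo.

```ts
// Sketch only: passing trace categories to chromedriver via perfLoggingPrefs,
// then reading the trace events back from the "performance" log.
import { Builder, logging } from "selenium-webdriver";
import * as chrome from "selenium-webdriver/chrome";

const traceCategories = [
  "-*",
  "disabled-by-default-lighthouse",
  "loading",
  "v8",
  "v8.execute",
  "blink.user_timing",
  "blink.console",
  "devtools.timeline",
  "disabled-by-default-devtools.timeline",
  // "disabled-by-default-devtools.timeline.stack", // the category suspected of adding overhead
].join(",");

async function traceRun(url: string) {
  const options = new chrome.Options()
    // Field name per the chromedriver docs; cast in case local typings differ.
    .setPerfLoggingPrefs({ traceCategories } as any);

  const logPrefs = new logging.Preferences();
  logPrefs.setLevel(logging.Type.PERFORMANCE, logging.Level.ALL);

  const driver = await new Builder()
    .forBrowser("chrome")
    .setChromeOptions(options)
    .setLoggingPrefs(logPrefs)
    .build();

  try {
    await driver.get(url);
    // ... click "create 10,000 rows" and wait for the table to render ...

    // Each performance-log entry's message is JSON wrapping one trace event.
    const entries = await driver.manage().logs().get(logging.Type.PERFORMANCE);
    const events = entries.map((e) => JSON.parse(e.message).message);
    console.log(`collected ${events.length} trace events`);
  } finally {
    await driver.quit();
  }
}
```

Toggling a category such as disabled-by-default-devtools.timeline.stack in the list above is then a one-line change, which makes it easy to compare runs with and without the suspected overhead.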
Thanks for looking into this so thoroughly. I am not in a position to challenge the statement on Svelte vs RedRunner, nor is that my goal. And unfortunately I am not in a great position to assist with or comment on Chrome or lighthouse configuration, or how to measure accurately. Of course I'm less concerned about discrepancies between the benchmark and dev tools than I am with whether the benchmarks are representative of real-world performance. I think we need to find a way to calibrate benchmark-recorded times against actual browser times (so, without devtools profiling overhead) and check that the inevitable gap is consistent across frameworks. And if it isn't, then we'd need to look into that. The only way I can think of to measure real time without devtools profiling overhead is to take a sufficiently long-running operation (e.g. 10,000 rows at 16x slowdown) and measure it with an external timer of sorts. Just as an occasional one-off to calibrate. I'll probably be called a caveman for suggesting a stopwatch, but you can at least get a feel for whether framework A is faster than B, and if the benchmark says the opposite, then you at least know there's a problem. With regards to this:
I assumed that is how we calculate the time in brackets (1.50), (1.67), but surely that doesn't apply to the times in ms? Here are the means vs what's in the table:
Surely the ms in the table should use the mean or geometric mean of the ten results, or something not far off? The geometric means of the ten results differ by only 1ms between the two frameworks, yet the table places 85ms between them... I don't see how it gets to that... Thank you for your continued patience and answers :-)
That startMeasure / stopMeasure was my caveman's tool. It allowed me to validate (somewhat) the benchmark results, but there were some voices saying that it might not always yield valid results. Here's an example for computing the geometric average:
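(The example image originally attached isn't reproduced here. As a stand-in, here is a minimal TypeScript sketch of one way the geometric average of slowdown factors could be computed; the numbers are purely illustrative and not taken from any actual run.)

```ts
// Illustrative sketch (not the benchmark's actual code): the geometric mean of
// the slowdown factors, i.e. each duration divided by the fastest framework's
// duration for the same benchmark.
function geometricMean(values: number[]): number {
  // Averaging logs avoids overflow and matches GEOMEAN in a spreadsheet.
  const logSum = values.reduce((sum, v) => sum + Math.log(v), 0);
  return Math.exp(logSum / values.length);
}

// Hypothetical durations (ms) for one framework across three benchmarks,
// and the fastest observed duration for each of those benchmarks.
const durations = [180, 95, 1450];
const fastest = [120, 95, 1210];

const factors = durations.map((d, i) => d / fastest[i]); // e.g. [1.50, 1.00, 1.20]
console.log(geometricMean(factors).toFixed(2));          // ≈ 1.22
```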
Actually I made a slight mistake, I pulled the Svelte numbers from the json file for a previous version. I've copied the v0.3.29 numbers into the spreadsheet and it shows a -23.42% difference, not -18.4%. The means of those correct results are lower than RedRunner's, so at least the numbers in the table make sense now. But based on that last reply, would it be fair to say that the table does not in fact display "Duration in milliseconds" as stated? As I understand it, only the fastest framework's times would display the accurate time in ms, and the rest are extrapolations?
There are two more secrets to the results table: The result table displays either the median or mean of the observed samples, depending on what you select in display mode. Both display modes are measured in milliseconds. In each cell, the value in parentheses is a factor indicating how much slower this value is in comparison to the fastest.
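To make those two display modes concrete, here is a small sketch of how one table cell and its bracketed factor could be derived from the recorded samples. The sample values and the fastest-framework figure are made up for illustration, not taken from the results json.

```ts
// Illustrative sketch of deriving one cell: the displayed duration is the
// median (or mean) of the samples, and the parenthesised number is that
// duration divided by the fastest framework's duration for the same benchmark.
const mean = (xs: number[]) => xs.reduce((a, b) => a + b, 0) / xs.length;

const median = (xs: number[]) => {
  const s = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(s.length / 2);
  return s.length % 2 ? s[mid] : (s[mid - 1] + s[mid]) / 2;
};

// Hypothetical samples (ms) for one framework on "create 10,000 rows".
const samples = [1490, 1512, 1478, 1530, 1501, 1495, 1488, 1522, 1507, 1499];
const fastestMedian = 1320; // hypothetical fastest framework for this benchmark

const cellValue = median(samples);                      // duration shown in the cell
const factor = (cellValue / fastestMedian).toFixed(2);  // value shown in parentheses
console.log(`${cellValue.toFixed(1)} (${factor})`);     // e.g. "1500.0 (1.14)"
```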
I understand the table a bit better now, thanks. But just to recap, because I created quite a few tangents with side questions:
Where "devtools" means "devtools as run from the browser". Is that your understanding too? |
Yes, mostly. I'd rather reduce #2-#4 to the question whether create 10,000 rows in chrome for svelte takes 1.3 secs or 1.8 secs. I personally believe the answer startMeasure and stopMeasure gives me on this question, but I tried something different:
The start/stopMeasure hack also supports the claim that the dev tools are slowing down svelte.
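For readers unfamiliar with that hack, here is a minimal sketch of what a startMeasure/stopMeasure style timer can look like. This is an assumption about its general shape, not the benchmark's actual helper code, and the framework call in the usage example is hypothetical.

```ts
// Sketch of a startMeasure/stopMeasure style timer (shape assumed, not the
// benchmark's actual helpers): log the elapsed time of an operation to the
// console so it can be compared against the driver's trace-based measurement.
let startTime = 0;
let currentLabel: string | null = null;

function startMeasure(label: string): void {
  currentLabel = label;
  startTime = performance.now();
}

function stopMeasure(): void {
  const label = currentLabel;
  currentLabel = null;
  // Defer so the logged duration includes browser work queued by the operation.
  window.setTimeout(() => {
    console.log(`${label} took ${(performance.now() - startTime).toFixed(1)} ms`);
  }, 0);
}

// Usage around the operation under test, e.g. creating 10,000 rows:
startMeasure("run10k");
// buildAndAppendRows(10000); // hypothetical framework-specific work goes here
stopMeasure();
```

Because it only uses performance.now and a console log, this kind of timer runs the same whether devtools are open or closed, which is what makes it useful as a rough cross-check.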
I am not really able to do the camera trick accurately with my phone (good idea though). But using the start/stop console log is very interesting. Here is me adding 10K lines on redrunner, alternating between devtools open and devtools closed on Chrome (if it starts with 4, devtools were open): Here is me doing the same thing on Chromium: This doesn't help establish whether benchmarks are representative of speeds, but it does show that merely opening or closing Chrome's devtools can cause a massive slowdown (that's just the console, no performance recording or anything). Chromium doesn't seem to do this. Maybe it's just my Chrome (can't think of any plugins that would affect it). Quick question on a separate note: can we/should we gitignore webdriver-ts/results.json?
webdriver-ts/results.json might be useful to extract historic results if (and only if) it's just me committing this file.
@andyhasit Thanks for reporting this issue. I'm closing it now. In retrospect I think the implementation is fine, but it's always good to challenge its validity.
I'm comparing my framework (not ready, not submitted) against Svelte and getting very different stories from Chrome's devtools (just the record button) compared to the benchmark tests.
For example, running "Add 10,000 rows" I get:
I've run all of these dozens of times now.
I am not concerned about the difference between Svelte and my framework, or between Chrome+devtools and Chromium. What concerns me is how it is possible for devtools to clock Svelte as being twice as slow as my framework, when the benchmark clearly puts it ahead.
Surely devtools is not that unreliable?
Screenshot from local results table:
Screenshot of my framework add 10,000 rows:
Screenshot of Svelte add 10,000 rows: