Add command to benchmark Web Vitals via web-vitals library
#40
Conversation
…ady documented anyway.
* In the future, additional Web Vitals like CLS, FID, and INP should be
* added, however they are slightly more complex to retrieve through an
* automated headless browser test.
* See https://github.com/GoogleChromeLabs/wpp-research/pull/41.
FID and INP are more complex as they require an interaction, but the load CLS should be similar to FCP/LCP/TTFB, shouldn't it?
Granted, that often isn't the full CLS of a page (as it can be affected by scrolling or other interactions), but it's still usually a good part of CLS. And that "load CLS" is what Lighthouse and PSI already capture, despite its limitations.
The only thing is that, kind of like LCP, CLS is not emitted by web-vitals.js until the page is hidden (including when navigating away from it). For the web-vitals.js library tests, we stub this visibility change:
https://github.com/GoogleChrome/web-vitals/blob/main/test/utils/stubVisibilityChange.js
And then call it like this:

```js
// Wait until all images are loaded and rendered, then change to hidden.
await imagesPainted();
await stubVisibilityChange('hidden');
```
Maybe that's enough to make it too complicated here?
The alternative is to use the `reportAllChanges` flag and only report at the end of the test.
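To make the behavior concrete, here is a minimal sketch (not the real web-vitals implementation) of why CLS isn't reported until the page is hidden: layout-shift values accumulate, and the callback only fires on a visibility change to `hidden`, or on every change when a `reportAllChanges`-style flag is set. The function and parameter names here are illustrative.

```javascript
// Hypothetical sketch mimicking web-vitals' CLS buffering, NOT its actual code.
function createClsObserver(onReport, { reportAllChanges = false } = {}) {
  let clsValue = 0;
  return {
    // Called for each (simulated) layout-shift entry.
    recordShift(value) {
      clsValue += value;
      if (reportAllChanges) onReport(clsValue);
    },
    // Equivalent of the stubbed visibilitychange in the web-vitals tests.
    visibilityChange(state) {
      if (state === 'hidden') onReport(clsValue);
    },
  };
}

// Without reportAllChanges, nothing is reported until the page goes hidden.
const reports = [];
const cls = createClsObserver((v) => reports.push(v));
cls.recordShift(0.05);
cls.recordShift(0.02);
console.log(reports.length); // 0 -- no report yet
cls.visibilityChange('hidden');
console.log(reports.length); // 1 -- the accumulated CLS (~0.07) is reported
```

This is why a headless-browser test either has to fake the visibility change at the end of the run, or opt into reporting on every change.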
Thanks @tunetheweb, that is great context. I am a bit unsure about CLS being more similar to the load time metrics; based on what you're saying it makes sense, but when I initially tried using all of them, I always got data for FID but not for CLS.
In any case, I think due to those additional complexities it makes sense to separate the work into more incremental steps, and starting with only FCP/LCP/TTFB works well, as those are all load time metrics and the simplest to implement here. #41 covers the follow-up work, so I'd prefer to handle CLS, FID, and INP there (or maybe even break them down into further separate PRs).
Yeah sounds like it’s due to not changing the visibility so CLS not being “finalised”. Which I realised and edited after writing that first sentence (but before submitting the review).
@oandregal Would you mind giving this PR a review? Have you been able to test the command yet?
I tested the command with all the options, and it is working great. Approving PR now!
I've tested all commands as listed in the branch's README, and all are functioning correctly for me. Approved from my end.
afa6239
Thanks @felixarntz. Quick review, and approving for merge.
Sorry for the late response. I had a use case in mind and only now was able to create a test case: WordPress/gutenberg#48728 (comment). This command is very useful, thanks! Some feedback:

- LCP-TTFB. I miss having data for the LCP-TTFB median based on the LCP-TTFB of each result, instead of subtracting the LCP and TTFB medians as I had to do in the test case above.
- Quality check. This command has shown better data quality than the other approaches I've used. Nice! I saw you are adding percentiles in #43, and that will be useful! I am still unsure about how best to address the high variability inherent in performance analysis. For example, using 10 runs, the data has more variability than I'd like. Finding ways to reduce it would be nice. Otherwise, executing many runs (hundreds) is still my preferred mechanism.
- Access to the full dataset. If I have access to the full dataset, I can do analysis after the fact.
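To illustrate the LCP-TTFB point above: the median of the per-run differences is generally not the same as the difference of the two medians, which is why computing it from each result matters. A small sketch with made-up numbers (the data and field names are purely hypothetical):

```javascript
// Hypothetical per-run results in ms; demonstrates that
// median(LCP - TTFB) per run != median(LCP) - median(TTFB).
const runs = [
  { lcp: 1200, ttfb: 300 },
  { lcp: 1500, ttfb: 250 },
  { lcp: 1100, ttfb: 400 },
  { lcp: 1800, ttfb: 350 },
  { lcp: 1400, ttfb: 500 },
];

function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

const medianOfDiffs = median(runs.map((r) => r.lcp - r.ttfb));
const diffOfMedians =
  median(runs.map((r) => r.lcp)) - median(runs.map((r) => r.ttfb));

console.log(medianOfDiffs, diffOfMedians); // 900 1050 -- not the same
```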
Thanks @oandregal for the feedback!

Regarding LCP - TTFB, that's a great idea to add as a metric; I'll take a note to do that (though I'm AFK for the next few days, so it'll take a couple of days). Feel free to open a PR if you have capacity, but otherwise I'll get to it once I'm back.

For the variability, are the differences you're sharing in ms, between different runs of the same command? For example, one time you ran it with 10 runs and got p25 set to some value, and the other time the p25 was 15ms different? I'm definitely interested in exploring this further to see how we can get those numbers down, though part of it will of course be unavoidable due to varying machine resources etc. For example, I'd be curious how the results differed for the other percentiles in those specific runs you did. Maybe we need to conduct a more detailed analysis of how the results differ between different runs of the tool. It would be great if we could use that to find the "sweet spot" for how many runs to do per invocation of the command.
This introduces a new `benchmark-web-vitals` command that, similar to the existing `benchmark-url` command, allows making one or more requests to one or more URLs, capturing the results, and computing the median for each metric. However, the notable difference is that this one measures Core Web Vitals and other key client-side metrics, using https://github.com/GoogleChrome/web-vitals.

As part of this PR, the `benchmark-url` command is renamed to the more appropriate `benchmark-server-timing`, to clearly differentiate the two commands and clarify their different purposes. They also function quite differently: the existing command simply makes a `curl` request to the URL, while the new one uses a headless browser.

The initial iteration implemented in this PR only includes load time metrics (FCP, LCP, and TTFB), as those are the most critical for us to measure and appear to be the easiest to implement. An early draft PR for adding the other Web Vitals (CLS, FID, and INP) exists at #41 and can be explored afterwards as an iteration.
Testing
You can test this command by following the instructions included in this PR (see https://github.com/GoogleChromeLabs/wpp-research/tree/add/benchmark-web-vitals/cli#benchmark-web-vitals), which also provide examples.