E2E perf tests: run each test in a separate page #47889
Conversation
Size Change: +449 B (0%) Total Size: 1.33 MB
Force-pushed from 5b23e0e to fbf3509.
Flaky tests detected in b863d42. 🔍 Workflow run URL: https://github.com/WordPress/gutenberg/actions/runs/4145852349
There seems to be a bias towards the first loading measurement being larger than the rest (both in the post editor and site editor tests), likely due to a caching quirk of some sort. I pushed a new commit that adds a configurable number of "throwaway" measurements before the recorded measurements, and set it at 1. This should help ensure that all of the samples face similar caching characteristics, thus reducing variability in the results.
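A minimal sketch of the idea, assuming a hypothetical measureLoadTime( page ) helper and an async test body; the PR's actual implementation may differ:

```js
// Take `throwaway + samples` measurements, but only record the last `samples`
// of them, so that every recorded run hits the same (warm) caching path.
const throwaway = 1;
const samples = 5;
const results = [];

let i = throwaway + samples;
while ( i-- ) {
	const loadTime = await measureLoadTime( page ); // hypothetical helper
	if ( i < samples ) {
		results.push( loadTime );
	}
}
```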
Force-pushed from 4d66fa9 to addba6e.
PHP tests currently seem to be broken in trunk, will rebase again when that's no longer the case.
Force-pushed from addba6e to 0ef78ce.
Rebased again.
Note that this PR significantly reduces variability in the loading tests in local dev environments, but CI still has high levels of variability. CI variability will need to be investigated and fixed separately. This PR is ready for review.
Left some small comments but overall, this looks good. Thanks for taking the time to work on perf metrics stability.
Thank you for the review, @youknowriad! 👍 Things don't seem to be working quite as expected after the review changes. I might need to revert them. Investigating.
This reverts commit 2b092e2.
I had to revert your review suggestions, as they started breaking things. In any case, the lingering page will be closed at the end of the tests, when the browser is closed, so there should be no concern there.
Would have been nice to add a comment on why these started failing when put in beforeEach, so as to prevent someone from coming in and "fixing it" later on.
I can propose a follow-up PR to do just that. On that point, do you have a terse description of what "things" broke, or how, when you say it "starts breaking things"?
// Perform a configurable number of "throwaway" measurements before the
// recorded ones, to reduce variability in the results.
const throwaway = 1;

let i = throwaway + samples;
while ( i-- ) {
if only we had a construct that let us assign an initial counter value and specify how to bump it on each iteration 😆
I suppose I could have refactored the code further as I was changing it 🙂
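For reference, a sketch of the for-loop equivalent being hinted at; illustrative only, not code from this PR:

```js
// Same behaviour with an explicit initialiser and per-iteration increment.
for ( let i = 0; i < throwaway + samples; i++ ) {
	const loadTime = await measureLoadTime( page ); // hypothetical helper
	if ( i >= throwaway ) {
		results.push( loadTime );
	}
}
```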
@sgomes @dmsnell just as a heads up, I've been trying to address the current site editor perf test issues in the following PRs:
I'm still pretty lost as I cannot reproduce anything locally. Since I added the artifact upload, though, we at least know that the editor canvas loads consistently empty (in various forms, which also blows my mind - see #48208 (comment) and #48138 (comment)). Let me know if it rings any bells. It's really difficult to debug if it happens only in CI 😩
An initial attempt to revert shows the same test failures; I think I performed the revert properly.
@dmsnell: It's difficult to pin it down to one specific thing; I would have fixed the issue otherwise. In general, it appears that different tests make different assumptions regarding the state that the page is in at the various jest lifecycle stages. I don't think the above would have been much use as a code comment, which is why I didn't add one.
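To illustrate the kind of mismatch being described (hypothetical code, not the actual failing tests): one suite may rely on expensive setup performed once against a long-lived page, while another expects a brand-new page before every test.

```js
// Suite A: assumes one long-lived page, with expensive setup done once.
beforeAll( async () => {
	await setUpEditor( page ); // hypothetical helper; its state must survive all tests
} );

// Suite B: assumes a fresh page for every test, so any state set up against
// the previous page is gone by the time the test body runs.
beforeEach( async () => {
	global.page = await browser.newPage();
} );
```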
For completeness, the above revert has been dropped since this PR was not to blame.
Heads up: the real reason for the current perf test failures is explained at https://github.com/WordPress/gutenberg/pull/48094/files#r1112904171
Thank you, @WunderBart 👍
What?
Run each performance test in a separate Puppeteer page.
Why?
The loading performance tests are currently unreliable, producing values up to two orders of magnitude apart between individual test runs. After some investigation, I determined that this appears to be due to the same page being reused for all tests, with the unloading work from the previous test bleeding into the measurements on the following test.
How?
This PR applies three separate changes to ensure that each performance test runs on its own page, including updating setup-performance-test.js to create and initialise new pages between tests.

To further reduce the variability of results, this PR also adds a configurable number of "throwaway" measurements before the recorded measurements, and sets it at 1. This eliminates the apparent bias in the first result being larger than the rest, likely due to some sort of caching quirk.
With these changes, all the individual runs for the loading tests should now be in the same order of magnitude.
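A simplified sketch of what switching to a fresh Puppeteer page can look like; the actual changes live in setup-performance-test.js, and the hook placement is exactly what the review discussion above was about:

```js
// Point the global `page` used by the tests at a brand-new Puppeteer page, so
// that teardown work from the previous test cannot bleed into the next test's
// measurements. The previous page is left to be cleaned up when the browser
// closes at the end of the run, as noted in the discussion above.
async function useFreshPage() {
	global.page = await browser.newPage();
	await global.page.setViewport( { width: 1280, height: 720 } ); // example init
}
```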
Testing Instructions
1. Run the loading performance tests on trunk, open post-editor.test.results, and observe that the loading metrics have unusable variability.
2. Run the loading performance tests on this branch, open post-editor.test.results, and observe that the loading metrics have substantially reduced variability.

Testing Instructions for Keyboard
N/A, this change only affects tests
Screenshots or screencast
N/A
CC @dmsnell, who has been looking at performance test reliability
CC @youknowriad, who was also involved in the discussions