Perf Testing: Reloading vs creating fresh page for subsequent samples #51379
Comments
cc @oandregal
I think we should avoid changing the logic of the performance jobs while migrating to Playwright. Changing the logic is tricky and should happen in its own PR, so that we can see the impact on the graphs post merge.
Testing against isolated pages is critical for getting usable performance metrics. I would caution against trying to minimize the metrics reported by the tests, because that's not the purpose of the tests. We should optimize for reliability and consistency in the measurements, because that's about the only way we can derive value from them, which is tracking impacts on editor performance over time.
Agreed, I'll aim to stay as 1:1 as possible. We need to keep in mind, though, that the migration itself will likely alter the numbers due to, e.g., Playwright's auto-waiting mechanism. I'll be taking the measurements before and after the migration and putting them in #51084.
Agreed. To be clear, since I haven't mentioned it in the PR description above: reloading is what we're currently doing with Puppeteer, and using fresh pages is what I suggest doing instead with Playwright. As per #51379 (comment), for the migration itself I'll keep the reloading logic as it is and leave the fresh-page refactor to a follow-up PR.
Thanks for clarifying @WunderBart. The only part I really mean to point to is what we did in #47889: that shouldn't be downgraded to a page reload, since page reloads were what introduced bias in the performance measurements. If we're talking about other tests that aren't doing that, that's surprising I guess, but maybe we can move in that direction for those too. We should be able to identify these tests by examining their variance and bias across different rounds of testing. For example, if we run the tests 300 times and the variance still exceeds the difference we're trying to detect, much as it does when running the tests only 3 times, then that's the bit to pay attention to.
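For illustration only, the variance check described above could look roughly like this. These are hypothetical helpers, not part of the test suite:

```ts
// Hypothetical helpers: if the variance of a 300-round run is still in the
// same ballpark as a 3-round run, and both exceed the difference we're trying
// to detect, the measurement itself is the noisy part.
function mean( samples: number[] ): number {
	return samples.reduce( ( sum, value ) => sum + value, 0 ) / samples.length;
}

function variance( samples: number[] ): number {
	const m = mean( samples );
	return (
		samples.reduce( ( sum, value ) => sum + ( value - m ) ** 2, 0 ) /
		( samples.length - 1 )
	);
}

// e.g. compare variance( samplesFrom3Rounds ) against variance( samplesFrom300Rounds ).
```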
I guess what we missed there was loop-based sampling inside a single test, e.g., in the theme perf tests. What happens there is the new page being created in the |
I think for our numbers (most of them) we use the Chrome trace, so this shouldn't be impacted by the auto-waiting, or is it? If it is, I'd say we should find a way to disable the auto-waiting, or not migrate at all, because this might give us numbers that don't correspond to what we want to test.
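For context, here is a rough sketch of what trace-based measurement looks like with Playwright's Chromium-only tracing API. The URL and typed text are placeholders for a local WordPress environment, and this is not the actual perf suite code:

```ts
import { chromium } from 'playwright';

async function main() {
	const browser = await chromium.launch();
	const page = await browser.newPage();
	// Assumed local environment URL; substitute whatever the suite targets.
	await page.goto( 'http://localhost:8889/wp-admin/post-new.php' );

	// The trace is recorded by the browser itself, so Playwright's auto-waiting
	// only affects when actions are issued, not the recorded event durations.
	await browser.startTracing( page, {
		categories: [ 'devtools.timeline' ],
	} );

	await page.keyboard.type( 'typing impacts the timeline events' );

	const traceBuffer = await browser.stopTracing();
	const trace = JSON.parse( traceBuffer.toString() );
	// Durations would then be computed from trace.traceEvents timestamps
	// (e.g. keydown/keyup pairs), similar in spirit to what the suite reports.

	await browser.close();
}

main();
```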
Ah, I don't really know yet, sorry for the fuss. I will see once I make the comparison (I'm currently struggling with a weird CI issue 😩)
ℹ️ This is an observation I've recently had when migrating performance specs to Playwright.
Measuring LCP (Largest Contentful Paint) after reloading the current page vs. after creating a new page in a separate context:
Page reloading gives faster times, which suggests there's some caching going on. Testing against an isolated page shows more consistent numbers from the very first sample and is closer to what a real user would experience. I will be addressing this in #51084.
/cc @youknowriad @dmsnell
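To make the comparison concrete, here is a minimal sketch of the two sampling strategies with Playwright. `measureLCP` is a hypothetical helper built on a buffered `PerformanceObserver`, not the suite's actual metrics code:

```ts
import { Browser, Page } from '@playwright/test';

// Hypothetical helper: resolve with the latest LCP candidate reported so far.
async function measureLCP( page: Page ): Promise< number > {
	return await page.evaluate(
		() =>
			new Promise< number >( ( resolve ) => {
				new PerformanceObserver( ( list ) => {
					const entries = list.getEntries();
					resolve( entries[ entries.length - 1 ].startTime );
				} ).observe( {
					type: 'largest-contentful-paint',
					buffered: true,
				} );
			} )
	);
}

// Approach A: reload the same page for each sample (current Puppeteer behavior).
// Subsequent samples can benefit from in-memory caches, skewing them downwards.
async function sampleWithReload( page: Page, url: string, rounds: number ) {
	const samples: number[] = [];
	await page.goto( url );
	for ( let i = 0; i < rounds; i++ ) {
		await page.reload();
		samples.push( await measureLCP( page ) );
	}
	return samples;
}

// Approach B: a fresh page in a fresh context for every sample, so no state
// carries over between rounds and the first sample looks like every other one.
async function sampleWithFreshPage( browser: Browser, url: string, rounds: number ) {
	const samples: number[] = [];
	for ( let i = 0; i < rounds; i++ ) {
		const context = await browser.newContext();
		const page = await context.newPage();
		await page.goto( url );
		samples.push( await measureLCP( page ) );
		await context.close();
	}
	return samples;
}
```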