-
Notifications
You must be signed in to change notification settings - Fork 4.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add visual regression tests for the Navigate Regions focus style #46210
base: trunk
Are you sure you want to change the base?
Conversation
Size Change: -4.16 kB (0%) Total Size: 1.32 MB
ℹ️ View Unchanged
|
Thanks for experimenting with it! I'd like to ping @tellthemachines as well as she has worked on similar things in Core. The reason why it's failing on CI is likely because we're running the tests in Linux on CI. This is one of the most significant pain points of visual regression testing: fonts behave slightly differently on different OSes. This means that if we want to get a consistent result for snapshotting/screenshotting, we will need to run the tests in the same environment as on the CI. Playwright has a built-in way to run the tests in the docker: docker run --rm --network host -v $(pwd):/work/ -w /work/ -it mcr.microsoft.com/playwright:v1.28.0-focal /bin/bash
npm install
npx playwright test --update-snapshots We can integrate that into a npm script and run it when updating snapshot, but that also normally means we'd need to have Docker as a hard requirement for e2e testing. This probably isn't that bad as we already require Docker in Perhaps we could only switch to using docker when we pass the As per the main motivation of this PR: the focus style regression, I think we can first try other explicit assertions like |
@kevin940726 thanks for the insights! Interesting technical challenge.
Specifically to this test case, I'm not sure Regarding the general implementation, there's one more thing to consider. For example, this is one of the reference images used in this PR's. It is used to test the blue outline on the top toolbar: Even if we manage to solve the font rendering issue, we would be exposed to failures for any change in the content of the top bar. I experimented the mask option but unfortunately it overlays also the outline we need to test. I also noticed that expect(page).toHaveScreenshot() has also a I wish there was a more flexible way to hide elements that we don't need to test for. Any thoughts? For reference, the Playwright docs about Visual comparisons is available at https://playwright.dev/docs/test-snapshots |
It would also be nice to be able to access the 'expected', 'received', and 'diff' images generated in case of failures at Update: figured out how to download the test run artifacts. |
In the last commit, I just want to see if the tests on Ubuntu pass by:
Example of reference image with masked elements: Example of reference image with all text made transparent: |
Turns out the font difference between operating systems also affects some buttons size. Actually, Gutenberg and WP use 'system fonts' that is they use a font stack Expected (reference screenshot taken on macOS ) vs. actual (screenshot taken on Ubuntu): |
Well, we technically have full control over the page so we can always do things like
Yep, exactly. We use different system fonts in different OSes as an optimization to prevent font-swapping and unnecessary network requests. Fonts don't only affect text either, they often affect layout and can be really hard to tackle. I'm not sure if there's an easy way around that. If we want to create a "stable" visual regression system then I guess the best solution for now is to enforce a consistent environment (Linux and Docker). Services like Percy and Chromatic move visual testing to the cloud so that they can always have a consistent testing environment. Perhaps we can try something similar? The nature of Open Source does make it more complicated though. |
Yep, I know. I was there when system fonts were introduced in core. I'm old :) Inspired but this article.Updating the snapshots in CI with a pull-request comment I was thinking at a maybe not that orthodox idea:
Any thoughts? |
This is very interesting, although might also be very complicated, as you suggested 😅. I'm not concerned by the complexity of the GH Action, but mostly by the development flow for contributors.
There are just some initial questions I have in mind. I expect there to be more. I think this is still worth pursuing though, just that it's not trivial and might take a while to mature. Hence why I didn't introduce visual testing when migrating to Playwright in the first place 🙈 . |
Latest commit is a Proof Of Concept that adds a small utility. It's similar to the |
TIL: turns out one of the reasons why the test is currently failing is that
That is: the mouse pointer stays on the Settings button. This sometimes makes the Settings button tooltip show even when focus is elsewhere, because the button is hovered. Depends also on timings. The tooltip may then be captured in the 'actual' screenshot. Reproduced locally, see screenshot: It's something important to keep into consideration when building visual comparison tests. |
I'd agree this seems to be a good first step 🎉 I'm not actually thinking the pull request comment would be a good trigger for the workflow. |
The test now fails because of a difference of 2 pixels 😅
Barely noticeable: it's the antialiasing on the small horizontal line of the Document Overview panel highlighted in the screenshot below (expected vs. actual vs. diff): I think it's safe to set a threshold for such small differences. |
Note that
|
For reference, here's what has been done for visual regressions tests in WP Core (they only run locally): |
Related: #43393
What?
For now, visual regression testing is implemented only for a couple components in the Storybook and the tests only run locally. See #43393. However, there are several features in Gutenberg that would greatly benefit from visual regression tests.
Why?
The Navigate Regions focus style is one of them. It already broke and was fixed a few times, for example in #45369 and #8554
Worth noting the focus style has been broken for quite a long time as it's hard to notice a breakage if the feature isn't actually used or manually tested. Normal E2E tests can test for the interaction to work as expected. Instead, only visual regression tests can test the blue outline focus style is actually visible and prevent from breaking again in the future.
This PR is a proof of concept to consider what is the best way to introduce more visual regression tests in the project.
Any thoughts and feedback welcome.
/Cc @kevin940726, @mirka, @aristath
How?
toHaveScreenshot()
.test/e2e/specs/editor/various/__snapshots__
directory, for a total size of ~57 KB.Considerations:
test/e2e/specs/visual-regression
.test
directory is already excluded from the release build.artifacts/test-results
directory: 'expected', 'actual', and 'diff'.toHaveScreenshot()
assertion has a few optional parameters that may come in handy,Testing Instructions
npm run test:e2e:playwright -- editor/various/navigate-regions.spec.js
To see the browser in action:
npm run test:e2e:playwright -- --headed editor/various/navigate-regions.spec.js
Only if you need to update the reference images after edits to the test:
$ npm run test:e2e:playwright -- --headed --update-snapshots editor/various/navigate-regions.spec.js
Testing Instructions for Keyboard
Screenshots or screencast