EES-5003 - Sitemap and robots not dynamically generated #4959
Conversation
@@ -0,0 +1 @@
PUBLIC_URL=https://explore-education-statistics.service.gov.uk/
Think we'll need to update the environment variables in the Playwright tests too? The PUBLIC_URL variable will be set to the actual environment URL, so imagining we'll need a PROD_PUBLIC_URL variable or similar.
Hrm yes good point, but actually I wonder if we should make these tests prod only?
So with these changes, if the frontend is run locally in dev mode we should still get a localhost robots file, so the tests should pass as they are. If the frontend is run in prod mode then they'll fail.
If we make the tests always use the live site prod URL, then at each stage of the pipeline we'll effectively be running these tests against prod. If this got merged and deployed and those tests then failed for any reason, they'd fail on every environment, which is obviously a bit misleading and makes it hard to get a fix through the pipeline steps.
As a side note, in my local branch for improving how we test new redirects I've introduced tags to the Playwright tests so I can filter which ones are run. The intention is to replicate the way Robot tests can be run on selected environments. We could do something similar here?
Or, just skip these tests for now for the sake of getting a fix out, and do it in another PR?
Making them prod only might be a good idea. I'm just a little concerned about spending too much time improving the Playwright tests given we're not committing to them for the time being.
Imagine we'll need to have some more conversations about the future of them soon, but my preference would be to have any important suites moved back into the Robot tests to keep everything cohesive.
Also, just realised we're not actually running the Playwright or Robot tests for non-dev environments in the pipeline anymore. This makes me think that adding tagging functionality is probably overkill.
Regardless, we might not be on the same page about the PROD_PUBLIC_URL variable approach. I was thinking of having this alongside PUBLIC_URL (it wouldn't be a replacement), and it would be used for the robots.txt assertions like:
test('The robots.txt file points to the sitemap', async ({ page }) => {
await page.goto(`${PUBLIC_URL}/robots.txt`);
await expect(page).toHaveURL(`${PUBLIC_URL}/robots.txt`);
await expect(
page.getByText(`Sitemap: ${PROD_PUBLIC_URL}/sitemap.xml`),
).toHaveCount(1);
await expect(
page.getByText(`Sitemap: ${PROD_PUBLIC_URL}/server-sitemap.xml`),
).toHaveCount(1);
});
This way, the tests are running against their respective environment (and not only prod).
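Something like this for wiring up the two variables in the test setup (just a sketch - the PROD_PUBLIC_URL name and the fallback values are my suggestion, not anything that exists in the repo yet):

// Sketch only: PROD_PUBLIC_URL is a proposed variable and the fallbacks are assumptions
const PUBLIC_URL = process.env.PUBLIC_URL ?? 'http://localhost:3000';
const PROD_PUBLIC_URL =
  process.env.PROD_PUBLIC_URL ??
  'https://explore-education-statistics.service.gov.uk';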
Hrm yeah fair enough, very happy to do that
(& agree on not spending more time on Playwright - I'm only modifying existing tests / reducing the faff involved in running them, certainly not intending to add more, and not doing anything that would take me longer than 15 mins or so)
policies: [
  {
    userAgent: '*',
    allow: '/',
    disallow: [
      '/data-tables/fast-track/',
      '/data-tables/permalink/',
      '/subscriptions/',
    ],
  },
],
Following our discussion yesterday, had a little think and this will probably be fine as bots can't really crawl the non-prod environments because our basic auth will prevent general access.
Does make me think that the meta tags we added were probably overkill as well 🤔 guess they don't hurt to have anyway!
Yeah exactly - I had the same thoughts when we put them in tbh, but I guess it's a good regression safeguard in case we ever remove that auth 😄
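For context, these policies live in the next-sitemap config, roughly along these lines (a sketch - the siteUrl and additionalSitemaps values here are assumptions rather than the exact file contents):

/** @type {import('next-sitemap').IConfig} */
module.exports = {
  // Assumption: siteUrl comes from the PUBLIC_URL env variable this PR adds
  siteUrl: process.env.PUBLIC_URL,
  generateRobotsTxt: true,
  robotsTxtOptions: {
    policies: [
      {
        userAgent: '*',
        allow: '/',
        disallow: [
          '/data-tables/fast-track/',
          '/data-tables/permalink/',
          '/subscriptions/',
        ],
      },
    ],
    // Assumption: the server-generated sitemap is referenced separately
    additionalSitemaps: [`${process.env.PUBLIC_URL}server-sitemap.xml`],
  },
};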
Next Sitemap is generating the robots.txt and sitemap.xml files at build time, and not re-generating them dynamically on each page load. Because we only build the frontend once - without any environment information - we cannot distinguish at build time what the value of PUBLIC_URL should be.
There are a few approaches to how we could tackle this, but the fastest and simplest right now is to create a .env.production config file which will be prioritized over .env during next build.
Also, I missed some require changes when converting from sitemap-summaries to sitemap-items, so this adds those in too 🙈
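To illustrate the priority (a sketch - only the .env.production value reflects this PR; the localhost value is an assumption for local dev):

# .env - picked up by default (e.g. local dev)
PUBLIC_URL=http://localhost:3000/

# .env.production - takes priority over .env when NODE_ENV=production,
# i.e. during `next build`, so the generated robots.txt and sitemap.xml get the live URL
PUBLIC_URL=https://explore-education-statistics.service.gov.uk/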