
EES-5003 - Sitemap and robots not dynamically generated #4959

Merged

Conversation

sambiramairelogic (Contributor):

Next Sitemap is generating the robots.txt and sitemap.xml files at build time, and not re-generating them dynamically on each page load.

Because we only build the frontend once - without any environment information - we cannot distinguish at build time what the value of PUBLIC_URL should be.

There are a few approaches we could take here, but the fastest and simplest right now is to create a .env.production config file, which will be prioritised over .env during next build.
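For context, a minimal sketch of how a next-sitemap config typically consumes this variable (the file contents and fallback URL here are illustrative, not the project's actual config):

/** @type {import('next-sitemap').IConfig} */
module.exports = {
  // siteUrl is resolved once at build time, which is why robots.txt and
  // sitemap.xml end up baked with whatever PUBLIC_URL was set to then.
  siteUrl: process.env.PUBLIC_URL ?? 'http://localhost:3000',
  generateRobotsTxt: true,
};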

Also, I missed some require changes when converting from sitemap-summaries to sitemap-items so this adds those in too 🙈

@sambiramairelogic requested a review from ntsim on June 14, 2024 07:47
@@ -0,0 +1 @@
PUBLIC_URL=https://explore-education-statistics.service.gov.uk/
Collaborator:

Think we'll need to update the environment variables in the Playwright tests too?

The PUBLIC_URL variable will be set to the actual environment URL, so I'm imagining we'll need a PROD_PUBLIC_URL variable or similar.

sambiramairelogic (Contributor, Author):

Hrm, yes, good point, but actually I wonder if we should make these tests prod-only?

So with these changes, if the frontend is run locally in dev mode we should still get a localhost robots file, so the tests should pass as they are. If the frontend is run in prod mode then they'll fail.

If we make the tests always use the live site's prod URL, then at each stage of the pipeline we'll effectively be running these tests against prod. If this got merged and deployed and those tests then failed for any reason, they'd fail on every environment, which is obviously a bit misleading and would make it hard to get a fix through the pipeline steps.

As a side note, in my local branch for improving how we test new redirects I've introduced tags to the Playwright tests so I can filter which ones are run, the intention being to replicate the way the Robot tests can be run on selected environments. We could do something similar here, along the lines of the sketch below?
Or just skip these tests for now for the sake of getting a fix out, and do it in another PR?
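Something like this, roughly (the tag name and env var wiring are hypothetical, just to illustrate the approach):

import { expect, test } from '@playwright/test';

const { PUBLIC_URL } = process.env;

// The tag lives in the test title, so runs can be filtered with --grep:
test('The robots.txt file is served @prodOnly', async ({ page }) => {
  await page.goto(`${PUBLIC_URL}/robots.txt`);
  await expect(page).toHaveURL(`${PUBLIC_URL}/robots.txt`);
});

// npx playwright test --grep @prodOnly          (run only the tagged tests)
// npx playwright test --grep-invert @prodOnly   (skip them on other environments)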

Collaborator:

Making them prod only might be a good idea. I'm just a little concerned about spending too much time improving the Playwright tests given we're not committing to them for the time being.

Imagine we'll need to have some more conversations about the future of them soon, but my preference would be to have any important suites moved back into the Robot tests to keep everything cohesive.

Also, just realised we're not actually running the Playwright or Robot tests for non-dev environments in the pipeline anymore. This makes me think that adding tagging functionality is probably overkill.

Regardless, we might not be on the same page about the PROD_PUBLIC_URL variable approach. I was thinking of having this alongside PUBLIC_URL (it wouldn't be a replacement), and it would be used for the robots.txt assertions like:

import { expect, test } from '@playwright/test';

// PUBLIC_URL points at the environment under test; PROD_PUBLIC_URL would
// always point at production, so these assertions hold on every environment.
const { PUBLIC_URL, PROD_PUBLIC_URL } = process.env;

test('The robots.txt file points to the sitemap', async ({ page }) => {
  await page.goto(`${PUBLIC_URL}/robots.txt`);
  await expect(page).toHaveURL(`${PUBLIC_URL}/robots.txt`);

  await expect(
    page.getByText(`Sitemap: ${PROD_PUBLIC_URL}/sitemap.xml`),
  ).toHaveCount(1);

  await expect(
    page.getByText(`Sitemap: ${PROD_PUBLIC_URL}/server-sitemap.xml`),
  ).toHaveCount(1);
});

This way, the tests are running against their respective environment (and not only prod).

sambiramairelogic (Contributor, Author):

Hrm, yeah, fair enough, very happy to do that.
(& agree on not spending more time on Playwright - I'm only modifying existing tests / reducing the faff involved in running them, certainly not intending to add more, and not doing anything that would take me longer than 15 mins or so.)

Comment on lines +8 to +18
policies: [
{
userAgent: '*',
allow: '/',
disallow: [
'/data-tables/fast-track/',
'/data-tables/permalink/',
'/subscriptions/',
],
},
],
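For reference, a policies block like this should emit robots.txt rules along these lines (output approximated from next-sitemap's documented behaviour; the Sitemap lines it also appends are omitted here):

User-agent: *
Allow: /
Disallow: /data-tables/fast-track/
Disallow: /data-tables/permalink/
Disallow: /subscriptions/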
Collaborator:

Following our discussion yesterday, had a little think and this will probably be fine as bots can't really crawl the non-prod environments because our basic auth will prevent general access.

Does make me think that the meta tags we added were probably overkill as well 🤔 guess they don't hurt to have anyway!

sambiramairelogic (Contributor, Author):

Yeah exactly - I had the same thoughts when we put them in tbh, but I guess it's a good regression safeguard in case we ever remove that auth 😄

@sambiramairelogic force-pushed the EES-5003/sitemap-and-robots-not-dynamically-generated branch from 88a5fb4 to 2a1f28c on June 14, 2024 12:07
@sambiramairelogic merged commit b04b24e into dev on June 14, 2024 (2 of 7 checks passed)
@sambiramairelogic deleted the EES-5003/sitemap-and-robots-not-dynamically-generated branch on June 14, 2024 12:08