
EES-5003 - Sitemap and robots not dynamically generated #4959

Merged
1 change: 1 addition & 0 deletions azure-pipelines-ui-tests.dfe.yml
@@ -264,6 +264,7 @@ jobs:
env:
CI: 'true'
PUBLIC_URL: $(PUBLIC_URL)
PROD_PUBLIC_URL: $(PROD_PUBLIC_URL)
PUBLIC_USERNAME: $(PUBLIC_AUTH_USER)
PUBLIC_PASSWORD: $(PUBLIC_AUTH_PASSWORD)
ADMIN_URL: $(ADMIN_URL)
@@ -7,7 +7,7 @@ export interface DataSetSitemapItem {

const dataSetService = {
listSitemapItems(): Promise<DataSetSitemapItem[]> {
return contentApi.get('/data-set-files/sitemap-summaries');
return contentApi.get('/data-set-files/sitemap-items');
},
};

@@ -37,7 +37,7 @@ const methodologyService = {
return contentApi.get(`/methodologies/${methodologySlug}`);
},
listSitemapItems(): Promise<MethodologySitemapItem[]> {
return contentApi.get('/methodologies/sitemap-summaries');
return contentApi.get('/methodologies/sitemap-items');
},
};

@@ -311,7 +311,7 @@ const publicationService = {
});
},
listSitemapItems(): Promise<PublicationSitemapItem[]> {
return contentApi.get('/publications/sitemap-summaries');
return contentApi.get('/publications/sitemap-items');
},
};
export default publicationService;
1 change: 1 addition & 0 deletions src/explore-education-statistics-frontend/.env.production
@@ -0,0 +1 @@
PUBLIC_URL=https://explore-education-statistics.service.gov.uk/
Collaborator

Think we'll need to update the environment variables in the Playwright tests too?

The PUBLIC_URL variable will be set to the actual environment URL, so I imagine we'll need a PROD_PUBLIC_URL variable or similar.

Contributor Author

Hrm, yes, good point, but actually I wonder if we should make these tests prod-only?

So with these changes, if the frontend is run locally in dev mode we should still get a localhost robots file, so the tests should pass as they are. If the frontend is run in prod mode then they'll fail.

If we make the tests always use the live site's prod URL, then at each stage of the pipeline we'll effectively be running these tests against prod. If this got merged and deployed and those tests then failed for any reason, they'd fail in every environment, which is obviously a bit misleading and would make it hard to get a fix through the pipeline steps.

As a side note, in my local branch for improving how we test new redirects I've introduced tags to the Playwright tests so I can filter which ones are run. The intention is to replicate the way Robot tests can be run on selected environments. We could do something similar here?
Or just skip these tests for now for the sake of getting a fix out, and do it in another PR?
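A rough sketch of how environment-gated filtering could look. Everything here is hypothetical rather than code from this PR; the helper just compares the two env vars these tests already read via tests/playwright-tests/utils/env.ts:

```typescript
// Hypothetical helper: true when the tests are pointed at the live prod site.
// PUBLIC_URL may carry a trailing slash (see .env.production), so normalise
// both sides before comparing.
function isProdEnvironment(publicUrl: string, prodPublicUrl: string): boolean {
  return publicUrl.replace(/\/+$/, '') === prodPublicUrl.replace(/\/+$/, '');
}

// A prod-only suite could then be gated in a spec file with something like:
//   test.skip(!isProdEnvironment(PUBLIC_URL, PROD_PUBLIC_URL), 'prod-only suite');

console.log(isProdEnvironment(
  'https://explore-education-statistics.service.gov.uk/',
  'https://explore-education-statistics.service.gov.uk',
)); // true

console.log(isProdEnvironment(
  'http://localhost:3000',
  'https://explore-education-statistics.service.gov.uk',
)); // false
```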

Collaborator

Making them prod-only might be a good idea. I'm just a little concerned about spending too much time improving the Playwright tests given we're not committing to them for the time being.

I imagine we'll need to have some more conversations about their future soon, but my preference would be to have any important suites moved back into the Robot tests to keep everything cohesive.

Also, I just realised we're no longer actually running the Playwright or Robot tests for non-dev environments in the pipeline. This makes me think that adding tagging functionality is probably overkill.

Regardless, we might not be on the same page about the PROD_PUBLIC_URL variable approach. I was thinking of having it alongside PUBLIC_URL (not as a replacement), used for the robots.txt assertions like:

test('The robots.txt file points to the sitemap', async ({ page }) => {
  await page.goto(`${PUBLIC_URL}/robots.txt`);
  await expect(page).toHaveURL(`${PUBLIC_URL}/robots.txt`);

  await expect(
    page.getByText(`Sitemap: ${PROD_PUBLIC_URL}/sitemap.xml`),
  ).toHaveCount(1);

  await expect(
    page.getByText(`Sitemap: ${PROD_PUBLIC_URL}/server-sitemap.xml`),
  ).toHaveCount(1);
});

This way, the tests are running against their respective environment (and not only prod).

Contributor Author

Hrm, yeah, fair enough, very happy to do that.
(& agree on not spending more time on Playwright: I'm only modifying existing tests and reducing the faff involved in running them, certainly not intending to add more, and not doing anything that would take me longer than 15 minutes or so.)

36 changes: 11 additions & 25 deletions src/explore-education-statistics-frontend/next-sitemap.config.js
@@ -5,7 +5,17 @@ module.exports = {
exclude: ['/server-sitemap.xml'],
generateRobotsTxt: true,
robotsTxtOptions: {
policies: getRobotsRuleset(process.env.APP_ENV),
policies: [
{
userAgent: '*',
allow: '/',
disallow: [
'/data-tables/fast-track/',
'/data-tables/permalink/',
'/subscriptions/',
],
},
],
Comment on lines +8 to +18
Collaborator

Following our discussion yesterday, had a little think and this will probably be fine as bots can't really crawl the non-prod environments because our basic auth will prevent general access.

It does make me think that the meta tags we added were probably overkill as well 🤔 but I guess they don't hurt to have anyway!

Contributor Author

Yeah, exactly. I had the same thoughts when we put them in, tbh, but I guess they're a good safeguard against regressions in case we ever remove that auth 😄
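For reference, the robots meta tags being discussed would be of roughly this shape (illustrative only; the actual markup the team added isn't part of this diff):

```html
<!-- Belt-and-braces alongside robots.txt: tells compliant crawlers not to
     index the page even if they reach it. Moot while basic auth blocks
     access to non-prod environments, but a useful safeguard if that auth
     is ever removed. -->
<meta name="robots" content="noindex, nofollow" />
```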

additionalSitemaps: [`${process.env.PUBLIC_URL}server-sitemap.xml`],
},
transform: async (config, path) => {
@@ -51,27 +61,3 @@ function generateDefaultPathProperties(config, path) {
alternateRefs: config.alternateRefs ?? [],
};
}

function getRobotsRuleset(environment) {
if (environment === 'Production' || environment === 'Local') {
return [
{
userAgent: '*',
allow: '/',
disallow: [
'/data-tables/fast-track/',
'/data-tables/permalink/',
'/subscriptions/',
],
},
];
}

// Disallow any and all trawling of staging, dev, or pre-prod sites
return [
{
userAgent: '*',
disallow: '/',
},
];
}
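With the hard-coded policies above and additionalSitemaps built from PUBLIC_URL, next-sitemap should now emit the same ruleset in every environment. In production the output would be along these lines (a sketch based on the config in this diff; next-sitemap also inserts its own comment headers, omitted here):

```text
User-agent: *
Allow: /
Disallow: /data-tables/fast-track/
Disallow: /data-tables/permalink/
Disallow: /subscriptions/

Sitemap: https://explore-education-statistics.service.gov.uk/sitemap.xml
Sitemap: https://explore-education-statistics.service.gov.uk/server-sitemap.xml
```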
1 change: 1 addition & 0 deletions tests/playwright-tests/.env.example
@@ -1,4 +1,5 @@
PUBLIC_URL=http://localhost:3000 # URL of the public frontend
PROD_PUBLIC_URL=https://explore-education-statistics.service.gov.uk
PUBLIC_USERNAME=PUBLIC_USERNAME
PUBLIC_PASSWORD=PUBLIC_PASSWORD
ADMIN_URL=https://localhost:5021 # URL for the Admin frontend
8 changes: 4 additions & 4 deletions tests/playwright-tests/tests/public/seoMetadataFiles.spec.ts
@@ -1,7 +1,7 @@
import { test, expect } from '@playwright/test';
import environment from '@util/env';

const { PUBLIC_URL } = environment;
const { PUBLIC_URL, PROD_PUBLIC_URL } = environment;

test.describe('SEO Metadata Files', () => {
test('An xml sitemap index file can be found at the expected route', async ({
@@ -15,7 +15,7 @@ test.describe('SEO Metadata Files', () => {
).toHaveCount(1);

await expect(
page.getByText(`${PUBLIC_URL}/server-sitemap.xml`),
page.getByText(`${PROD_PUBLIC_URL}/server-sitemap.xml`),
).toHaveCount(1);
});

@@ -44,11 +44,11 @@ test.describe('SEO Metadata Files', () => {
await expect(page).toHaveURL(`${PUBLIC_URL}/robots.txt`);

await expect(
page.getByText('Sitemap: http://localhost:3000/sitemap.xml'),
page.getByText(`Sitemap: ${PROD_PUBLIC_URL}/sitemap.xml`),
).toHaveCount(1);

await expect(
page.getByText('Sitemap: http://localhost:3000/server-sitemap.xml'),
page.getByText(`Sitemap: ${PROD_PUBLIC_URL}/server-sitemap.xml`),
).toHaveCount(1);
});

1 change: 1 addition & 0 deletions tests/playwright-tests/utils/env.ts
@@ -5,6 +5,7 @@ dotenv.config();

const environment = {
PUBLIC_URL: process.env.PUBLIC_URL ?? '',
PROD_PUBLIC_URL: process.env.PROD_PUBLIC_URL ?? '',
PUBLIC_USERNAME: process.env.PUBLIC_USERNAME ?? '',
PUBLIC_PASSWORD: process.env.PUBLIC_PASSWORD ?? '',
ADMIN_URL: process.env.ADMIN_URL ?? '',