EES-5003 - Sitemap and robots not dynamically generated #4959
Changes from all commits: 94b8130, 8b80d86, f5ef876, a8cf102, 79bd190, 2a1f28c
@@ -0,0 +1 @@
+PUBLIC_URL=https://explore-education-statistics.service.gov.uk/
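One thing worth noting from the config change below: the additional sitemap URL is built by plain string concatenation, so the trailing slash on PUBLIC_URL matters. A minimal sketch of the resolved value, assuming the variable is set exactly as above (the snippet is illustrative and not part of the PR itself):

```ts
// With PUBLIC_URL ending in a slash, the template literal used in
// next-sitemap.config.js resolves cleanly, with no double slash.
const serverSitemapUrl = `${process.env.PUBLIC_URL}server-sitemap.xml`;
// => 'https://explore-education-statistics.service.gov.uk/server-sitemap.xml'
console.log(serverSitemapUrl);
```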
@@ -5,7 +5,17 @@ module.exports = {
   exclude: ['/server-sitemap.xml'],
   generateRobotsTxt: true,
   robotsTxtOptions: {
-    policies: getRobotsRuleset(process.env.APP_ENV),
+    policies: [
+      {
+        userAgent: '*',
+        allow: '/',
+        disallow: [
+          '/data-tables/fast-track/',
+          '/data-tables/permalink/',
+          '/subscriptions/',
+        ],
+      },
+    ],
Comment on lines +8 to +18

Following our discussion yesterday, I had a little think and this will probably be fine, as bots can't really crawl the non-prod environments because our basic auth will prevent general access. Does make me think that the …

Yeah exactly - I had the same thoughts when we put them in tbh, but I guess it's a good regression safeguard in case we ever remove that auth 😄
     additionalSitemaps: [`${process.env.PUBLIC_URL}server-sitemap.xml`],
   },
   transform: async (config, path) => {
@@ -51,27 +61,3 @@ function generateDefaultPathProperties(config, path) {
     alternateRefs: config.alternateRefs ?? [],
   };
 }
-
-function getRobotsRuleset(environment) {
-  if (environment === 'Production' || environment === 'Local') {
-    return [
-      {
-        userAgent: '*',
-        allow: '/',
-        disallow: [
-          '/data-tables/fast-track/',
-          '/data-tables/permalink/',
-          '/subscriptions/',
-        ],
-      },
-    ];
-  }
-
-  // Disallow any and all trawling of staging, dev, or pre-prod sites
-  return [
-    {
-      userAgent: '*',
-      disallow: '/',
-    },
-  ];
-}
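For context, this is roughly what next-sitemap should now emit as robots.txt on every environment, given the static policies above. The exact formatting (comment lines, ordering, a Host entry) varies by next-sitemap version, and the Sitemap URLs simply reflect whatever site URL the build is given; the production URL is used here only as an example:

```text
User-agent: *
Allow: /
Disallow: /data-tables/fast-track/
Disallow: /data-tables/permalink/
Disallow: /subscriptions/

Sitemap: https://explore-education-statistics.service.gov.uk/sitemap.xml
Sitemap: https://explore-education-statistics.service.gov.uk/server-sitemap.xml
```

Previously, the removed getRobotsRuleset would instead have produced a blanket `Disallow: /` for any environment other than Production or Local.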
Think we'll need to update the environment variables in the Playwright tests too? The PUBLIC_URL variable will be set to the actual environment URL, so I'm imagining we'll need a PROD_PUBLIC_URL variable or similar.
Hrm yes good point, but actually I wonder if we should make these tests prod only?

So with these changes, if the frontend is run locally in dev mode we should still get a localhost robots file, so the tests should pass as they are. If the frontend is run in prod mode then they'll fail.

If we made the tests always use the live site prod URL, then at each stage of the pipeline we'd effectively be running these tests against prod. If this got merged and deployed and those tests then failed for any reason, they'd fail on every environment, which is obviously a bit misleading and makes it hard to get a fix through the pipeline steps.

As a side note, in my local branch for improving how we test new redirects I've introduced tags to the Playwright tests so I can filter which ones are run, the intention being to replicate the way Robot tests can be run on selected environments. We could do something similar here?

Or, just skip these tests for now for the sake of getting a fix out, and do it in another PR?
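For reference, the tagging idea mentioned above can be done with plain title tags filtered via Playwright's --grep option. Everything in this sketch (the @prodOnly tag name, the specific assertion, and the assumption that a baseURL pointing at the environment under test is configured) is hypothetical rather than taken from that branch:

```ts
import { test, expect } from '@playwright/test';

// Hypothetical prod-only check: the '@prodOnly' tag is just part of the title,
// so it can be selected or excluded with --grep / --grep-invert.
test('robots.txt disallows fast-track data table pages @prodOnly', async ({ request }) => {
  // Assumes a baseURL pointing at the environment under test is configured.
  const response = await request.get('/robots.txt');

  expect(await response.text()).toContain('Disallow: /data-tables/fast-track/');
});

// Run only the prod-tagged tests:
//   npx playwright test --grep @prodOnly
// Run everything except them:
//   npx playwright test --grep-invert @prodOnly
```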
Making them prod only might be a good idea. I'm just a little concerned about spending too much time improving the Playwright tests given we're not committing to them for the time being. I imagine we'll need to have some more conversations about the future of them soon, but my preference would be to move any important suites back into the Robot tests to keep everything cohesive.

Also, I've just realised we're not actually running the Playwright or Robot tests for non-dev environments in the pipeline anymore, which makes me think that adding tagging functionality is probably overkill.

Regardless, we might not be on the same page about the PROD_PUBLIC_URL variable approach. I was thinking of having it alongside PUBLIC_URL (it wouldn't be a replacement), to be used for the robots.txt assertions. This way, the tests run against their respective environment (and not only prod).
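On one reading of that suggestion, a hypothetical sketch of what such an assertion might look like: the request still targets the environment under test via PUBLIC_URL, while PROD_PUBLIC_URL (an assumed variable name) only supplies the expected value. None of this is the author's actual snippet, which isn't reproduced here:

```ts
import { test, expect } from '@playwright/test';

test('robots.txt references the expected server sitemap', async ({ request }) => {
  // PUBLIC_URL points at the environment under test.
  const response = await request.get(`${process.env.PUBLIC_URL}robots.txt`);

  // PROD_PUBLIC_URL (assumed name) holds the URL the generated robots.txt
  // is expected to reference in its Sitemap entry.
  expect(await response.text()).toContain(
    `Sitemap: ${process.env.PROD_PUBLIC_URL}server-sitemap.xml`,
  );
});
```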
Hrm yeah fair enough, very happy to do that.

(& agree on not spending more time on Playwright - I'm only modifying existing tests / reducing the faff involved in running them, certainly not intending to add more, and not doing anything that would take me longer than 15 mins or so.)