Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(gatsby): trailing slash redirect in commands/serve #21203

Closed
wants to merge 2 commits into from
Closed

fix(gatsby): trailing slash redirect in commands/serve #21203

wants to merge 2 commits into from

Conversation

cosmn
Copy link

@cosmn cosmn commented Feb 4, 2020

Description

This is, in my opinion, a major issue at the core level of how gatsby generates and serves static files, usually blamed on Netlify and it's redirect settings.

My solution is a hack, and should be treated as such, until you'll find a better way to serve static files, another module , or a custom implementation, because you have a custom way of generating files.
I'm a first time Gatsby user, therefor I cannot asses the side-effects this patch may introduce, despite all tests passed successfully on my local machine. Your input here is highly required.

This PR is a continuation of an abandoned PR#19567 and my proposed solution matches, more or less, one of the approaches presented there by @pieh. I am basically deceiving serve-static into getting a trailing slash url, thus avoiding its redirect behavior.

Related Issues

Fixes #19543
Assuming it also Fixes #9207

@cosmn cosmn requested a review from a team as a code owner February 4, 2020 22:41
@cosmn
Copy link
Author

cosmn commented Feb 7, 2020

Ping @pieh and @blainekasten because of their involvement in the matter

@cosmn cosmn requested a review from pieh February 11, 2020 15:23
@blainekasten
Copy link
Contributor

So i've spent more time researching this. I'm more convinced now that Gatsby is actually doing the right thing. For reference read this article: https://searchfacts.com/url-trailing-slash/

In addition, did you know that we have plugins for adding and removing slashes? I wonder if those would solve your needs instead of changing the way the codebase works. gatsby-plugin-remove-trailing-slash

before static middleware and prevent redirect
@cosmn
Copy link
Author

cosmn commented Feb 12, 2020

@pvdz I've made another commit to this issue, hopefully more descriptive, taking your feedback into consideration. This is not far from @pieh's "200 redirect" proposal.

@blainekasten I appreciate your input, although I was not debating if url trailing slash is right or wrong. The issue here is an unexpected behavior, neither Gatsby or any other framework should perform redirects unless specifically instructed so. The author of Issue #19543 did a perfect job in replicating the behavior, which you confirmed with the following comment

I also confirm that plugins like gatsby-plugin-remove-trailing-slash do nothing to fix the issue.

@cosmn
Copy link
Author

cosmn commented Feb 12, 2020

A less intrusive solution would be to expose the express instance via gatsby-config, like you do with developMiddleware, this time for production.

Copy link
Contributor

@pvdz pvdz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This update was much better, thank you! But unfortunately I may have found a few actual concerns.

Comment on lines +53 to +56
// url is home page
if (req.url === `/`) return next()
// url has trailing slash, no need to intervene
if (req.url.slice(-1) === `/`) return next()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The second check subsumes the first because 'x'.slice(-1) === 'x' :)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, I wanted to differentiate the two cases, easier to understand.
Falls into "write code for humans" practice. The fact that you noticed, proves my point.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I get what you're saying but I still think I'd prefer to drop the redundant check and updating the comment to reflect the two cases being caught here.

// url has trailing slash, no need to intervene
if (req.url.slice(-1) === `/`) return next()
// url is a file request, no need for trailing slash
if (fileExtensionRegEx.test(req.url)) return next()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems to overshoot a little bit. This is basically the .includes() check while trying to assert that there is a non-dot before and after the dot. I don't have enough context but is there a particular reason for this?

Also, I think the $ in the regex serves no purpose? (I don't think there's a difference with and without it). If you're set on keeping it, I think /.\../ is the same regex. Additionally, this assumes folder parts can not have a dot (that may be legit, depending on the source of urls, but is certainly not generic).

if (fileExtensionRegEx.test(req.url)) return next()

// turning /slug into /slug/
req.originalUrl += `/` // https://expressjs.com/en/api.html#req.originalUrl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first example in this link has:

// GET /search?q=something
console.dir(req.originalUrl)
// => '/search?q=something'

So potentially this line would be doing originalUrl = '/search?q=something' + '/'. That's probably not what you wanted?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed, url's with query args is a use case I haven't tackled yet. I'm not fully familiar with gatsby's handle of query args. Your input here is required.
Do we turn /search?q=something into /search/?q=something ?

Copy link
Contributor

@pvdz pvdz Feb 13, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be honest, I have no idea :) I mainly jumped on the syntactical issues with this part of the change set.

Maybe @pieh knows

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have this piece of code in production runtime that adjust url to use canonical page path for initial page load (so this will be connected with to gatsby serve behaviour in some way) -

// Explicitly call navigate if the canonical path (window.pagePath)
// is different to the browser path (window.location.pathname). But
// only if NONE of the following conditions hold:
//
// - The url matches a client side route (page.matchPath)
// - it's a 404 page
// - it's the offline plugin shell (/offline-plugin-app-shell-fallback/)
if (
pagePath &&
__BASE_PATH__ + pagePath !== browserLoc.pathname &&
!(
loader.findMatchPath(stripPrefix(browserLoc.pathname, __BASE_PATH__)) ||
pagePath === `/404.html` ||
pagePath.match(/^\/404\/?$/) ||
pagePath.match(/^\/offline-plugin-app-shell-fallback\/?$/)
)
) {
navigate(__BASE_PATH__ + pagePath + browserLoc.search + browserLoc.hash, {
replace: true,
})
}

Just note that "canonical path" on its own doesn't specify if it has trailing slash or not - it just mean path that was used with createPage action call (Gatsby doesn't normalize those right now - at least for pages created in user code in createPages hook). Gatsby does however use trailing slashes for pages created from src/pages directory (so it does have opinion here).

I'm not sure if this make any sense or if this is answer any question here.

But for manipulating URLs I recommend using builtin Node's url module - particularly:

  • url.parse - to created structured url object
  • then modify pathname part to add trailing slash
  • url.format - for converting object back to string


// turning /slug into /slug/
req.originalUrl += `/` // https://expressjs.com/en/api.html#req.originalUrl
req.url += `/`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which made me look into .url, so. Same link has

req.url is not a native Express property, it is inherited from Node’s http module.

https://nodejs.org/api/http.html#http_message_url

When request.url is '/status?name=ryan'

So it looks like .url has the same problem. Is there a reason why this is not a problem?

@wardpeet wardpeet self-assigned this Mar 2, 2020
@pvdz
Copy link
Contributor

pvdz commented Apr 16, 2020

@cosmn Hi :) Are you still working on this? The PR hasn't seen an update in the past month so I hope you're still looking into this. If you're no longer interested, could you please let us know all the same? Thanks!

@cosmn
Copy link
Author

cosmn commented May 3, 2020

Hi @pvdz. I apologize for the late reply; busy period
I stopped working on this once I realized its drifting away from a constructive discussion on the solution at hand. I suggest to revisit your policy of welcoming and handling PRs. It felt hostile and dismissive. You were nitpicking code syntax instead of cooperating.

Since I will not be contributing with code anymore but I still want to see this issue addressed and fixed, I will write here what I think is the root cause of the entire situation:

Gasby's approach of generating static files, conflicts with the way serve-static is handling directories.

  1. Either programmatically through createPage or other internal means, Gatsby will generate a page as /public/generated-page-slug/index.html. Nothing wrong with that, except that:
  2. serve-static performs a redirect when it encounters a directory, as seen in their source code
68  var onDirectory = redirect
69      ? createRedirectDirectoryListener()
70      : createNotFoundDirectoryListener()
...
99  stream.on('directory', onDirectory)
...
182 function createRedirectDirectoryListener () {
183     return function redirect (res) {
...

Although serve-static has part of the blame here for creating an unexpected behavior, they wont be able to fix this due to the high amount of dependents relying on it, projects and node modules.
The rest of the blame is on Gatsby for not having its own static server module, which can be controlled, improved, exposed to developers for extended functionality.
One other option is to save those pages to /public/generated-page-slug.html, most likely under a flag to avoid a breaking change.
I find it hard to believe this behavior hasn't arise during development and tests. It seems swept under the carpet for way too long.

People have created several issues and countless comments on this particular matter. Whoever is familiar enough with this part of Gatsby's functionality, Please make things right.

You may close this PR when you see fit.

@wardpeet
Copy link
Contributor

wardpeet commented May 4, 2020

Could you tell me

Hey @cosmn,

Sorry about the above, you have to understand that we have to also keep our code maintainable. Our apologies if it didn't feel welcoming at all.

I do have a question for you that I also added in other PR that does similar things.

See #19567 (comment)

So really gatsby is hiding the redirects the server is doing by doing its own clientside redirects. Our team had a similar issue with gatsby before, and the google SEO tools told us about it.

I'm aware of gatsby is hiding this implementation and that a redirection is happening. The problem is that when we build, we always build like you have trailing slashes and it's the server implementation that needs to handle pretty URLs, not us.

gatsby serve is not a production server, it's merely there to run e2e-tests, do a final check if everything is looking alright.

So I'm still unsure what the use case is of not doing a 301 in serve.

@ascorbic
Copy link
Contributor

I'm going to close this PR as it doesn't seem to be going anywhere

@ascorbic ascorbic closed this May 12, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
6 participants