Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

301 redirects adding trailing slashes (gatsby build+serve) #19543

Closed
adrienharnay opened this issue Nov 15, 2019 · 21 comments
Closed

301 redirects adding trailing slashes (gatsby build+serve) #19543

adrienharnay opened this issue Nov 15, 2019 · 21 comments
Assignees
Labels
status: awaiting author response Additional information has been requested from the author status: confirmed Issue with steps to reproduce the bug that’s been verified by at least one reviewer. type: bug An issue or pull request relating to a bug in Gatsby

Comments

@adrienharnay
Copy link
Contributor

adrienharnay commented Nov 15, 2019

Hello

Description

I have problem with 301 redirects adding trailing slashes (/path => /path/) ONLY when I build+serve my site (gatsby build && gatsby serve && open http://localhost:8000/hi => I get redirected to http://localhost:8000/hi/). Doesn't happen using gatsby develop.

Steps to reproduce

I have :

  • no gatsby-browser.js file
  • no gatsby-ssr.js file
  • an empty gatsby-config.js file
  • gatsby-node.js
exports.createPages = ({ actions: { createPage } }) =>
  createPage({
    path: '/hi',
    component: HiComponent,
  });
  • HiComponent renders hi and that's it

So... minimum code possible, and still have this issue. Tried gatsby-plugin-remove-trailing-slashes but didn't do anything, as it is only for pages created from the pages folder.

Expected result

No 301, no redirect

Actual result

Redirection

Environment

System:
    OS: macOS 10.14.6
    CPU: (4) x64 Intel(R) Core(TM) i7-7567U CPU @ 3.50GHz
    Shell: 5.3 - /bin/zsh
  Binaries:
    Node: 12.12.0 - /var/folders/mt/71kkzssd3fq24s3x3y6wpc4h0000gn/T/yarn--1573821730999-0.42525738248781186/node
    Yarn: 1.19.1 - /var/folders/mt/71kkzssd3fq24s3x3y6wpc4h0000gn/T/yarn--1573821730999-0.42525738248781186/yarn
    npm: 6.11.3 - /usr/local/bin/npm
  Languages:
    Python: 2.7.16 - /usr/local/bin/python
  Browsers:
    Chrome: 78.0.3904.97
    Firefox: 65.0
    Safari: 13.0.3

Thanks in advance!

@blainekasten
Copy link
Contributor

@adrienharnay thank you for the detailed report! I have confirmed and reproduced the issue. Is this something that you might be interested in contributing a fix? I'd be happy to help guide you through the process if you're interested!

@blainekasten blainekasten added status: confirmed Issue with steps to reproduce the bug that’s been verified by at least one reviewer. type: bug An issue or pull request relating to a bug in Gatsby labels Nov 15, 2019
@adrienharnay
Copy link
Contributor Author

Sure, I would be happy to!

Do you have any idea where that could come from? As someone who doesn't know much about how Gatsby works internally, I would guess that the servers from gatsby develop and gatsby serve behave in a different manner, since the bug only happens using gatsby serve.

I will set up a repository with the minimum code possible to reproduce the issue so that we can test the fix on it when it's done.

@blainekasten
Copy link
Contributor

@adrienharnay gatsby develop is running a webpack dev server process, while gatsby serve is just an express app running against the build artifacts from gatsby build

You can see all of the command code under packages/gatsby/src/commands. I think the problem might be in how we use express.static. Looking at the source for that they do the redirect if the requested path is a directory. Which is what our build artifact produces.

So I'm curious to see what you come up with. Please ask any questions along the way. We have a document here on how to get your dev environment setup too!

https://www.gatsbyjs.org/contributing/setting-up-your-local-dev-environment/

@adrienharnay
Copy link
Contributor Author

Thanks for the pointers, will report back as soon as I have some news :)

@adrienharnay
Copy link
Contributor Author

I've created the repo: https://github.com/adrienharnay/gatsby-simplest-example. Now starting to work on the issue

@adrienharnay
Copy link
Contributor Author

adrienharnay commented Nov 16, 2019

I have developed a fix on the repo I linked earlier. The procedure is detailed in the README. Basically, the fix is to render public/hello.html instead of public/hello/index.html if the path given to createPage does not end with a slash.

I tried several other fixes by tweaking express.static and express.Router, without luck. I don't know if the new architecture could break anything else, but that's the only fix I could find.

Please let me know if that's OK, then I'll file a PR to gatsby :)

Edit: I have created the PR so that you can see the changes/merge it directly if everything is OK!

@blainekasten
Copy link
Contributor

@adrienharnay Thank you so much for taking your time to do this! I played around with express.static for about 30m and wasn't sure if we had any solutions, but it's cool to see what you found!

I'm looking at your PR and trying to see if there are any negative side-effects to your change.

I'm also curious how you found contributing so far? Was it easy to get set up locally? Was it easy to test and feel confident in your fix? I'm just trying to gauge how well the community can feel empowered to contribute. :)

@adrienharnay
Copy link
Contributor Author

Thank you! It seems e2e tests are broken, but running yarn test locally works, so I'm not sure what I'm missing.

Other than that, contributing has been really easy, thanks to your fast answers and good pointers! I felt overwhelmed by the process to test Gatsby locally, so I went with patch-package instead to patch my node_modules since the changes were fairly simple, and it took me no time to implement the fix! There's no wonder why Gatsby sees so many contributions :)

@blainekasten
Copy link
Contributor

blainekasten commented Nov 18, 2019

@adrienharnay That is valuable feedback. Thank you!

@mareksuscak
Copy link
Contributor

This issue affects gatsby develop command as well. It only surfaces when you reload the page.

@blainekasten
Copy link
Contributor

So i've spent more time researching this. I'm more convinced now that Gatsby is actually doing the right thing. For reference read this article: https://searchfacts.com/url-trailing-slash/

In addition, did you know that we have plugins for adding and removing slashes? I wonder if those would solve your needs instead of changing the way the codebase works. gatsby-plugin-remove-trailing-slash

@varunagrawal
Copy link
Contributor

This issue also pops up in gatsby serve when you do a hard refresh Ctrl+Shift+R. It changes the URL from mysite.com/about to mysite.com/about/.

@blainekasten OP has already tried using gatsby-plugin-remove-trailing-slashes without success. It's mentioned in the issue. Any other suggestions for workarounds?

@aleclarson
Copy link

Ideally, the remove-trailing-slashes plugin would enforce this.

So i've spent more time researching this. I'm more convinced now that Gatsby is actually doing the right thing. For reference read this article: https://searchfacts.com/url-trailing-slash/

As long as the trailing slash is always removed, the worry of Google seeing URLs differently won't be a problem.

@danabrit
Copy link
Contributor

@varunagrawal Hi there! Are you still experiencing this problem? If so, can you please attach a minimal reproduction so we can take a look? Thanks!

@danabrit danabrit added status: awaiting author response Additional information has been requested from the author topic: reach/router and Links labels May 31, 2020
@NickAJScott
Copy link

I think i'm running into this same issue, however even if the trailing slash is consistently added/removed this still causes issues with Google SEO as the redirect that adds the slash shows as a 301 and as a result google doesn't index that page. Is there a way to disable the addition of a slash? e.g. if i go to https://gatsby-blog-no-slash.wardpeet.dev/new-beginnings i can see that the redirect to add the slash is happening and then the slash is later being removed, this would still cause issues with google indexing.

@mareksuscak
Copy link
Contributor

mareksuscak commented Jun 17, 2020

We were running into the same issue and the only solution that worked was to ask the marketing team to include a trailing slash and direct all traffic to the "canonical" URL.

Alternatively, you could write an S3 deployment script that would essentially duplicate all index.html files outside of their parent folder and rename them after the parent folder (without an extension).

Keep in mind, this can't be done or tested locally on your machine because that would create conflicting file names on your filesystem but S3 actually supports such a configuration where a folder and a file can share the same name while residing in the same parent directory. This way you could support both URLs and then add a canonical URL link tag that would point at the preferred destination.

Example:
https://gist.github.com/cmawhorter/b706e45ba88e43bc379c

@blainekasten
Copy link
Contributor

@varunagrawal please keep in mind gatsby serve is not meant for production, but instead just to simulate production locally. Every hosting platform is different and can usually be customized on how files are resolved in reference to their urls.

I'm going to close this because I'm not convinced there is an absolutely correct approach to this. We have the plugins to add and remove trailing slashes. Feel free to keep discussing in the closed issue though!

@jokklan
Copy link

jokklan commented Jul 7, 2020

We experience this issue on first load and reload on both gatsby develop, gatsby serve and in production. gatsby-plugin-remove-trailing-slashes only changes the links, it doesn't fix the issue that urls without trailing / will be redirected to a url with a trailing /. Because of this are gatsby-plugin-remove-trailing-slashes actually harming our website links, because all links will result in a url that will redirect on reload.

@mareksuscak
Copy link
Contributor

mareksuscak commented Jul 7, 2020

We managed to resolve this issue as described in my previous comment. It requires two scripts:

  1. The first step requires running the first part of the script that'll clone all index.html files and move them outside of their parent folder, then rename them to reflect the name of the parent folder + collision prevention suffix needs to be appended since filesystems typically don't allow a folder and a file with the same name to coexist:

Before:

- grandparent
-   parent
-     index.html

After:

- grandparent
-   parent
-     index.html
-   parent.collision.html (this is a clone of the index.html above)
  1. The second step requires uploading the public folder to S3.
  2. The third step requires running the second part of the script which traverses the S3 bucket and removes the collision prevention extension.

Scripts are available here and one of them is loosely based on the gist I shared above:

https://gist.github.com/mareksuscak/e0a94987a24038a3b81f045602f6274b

@LpmRaven
Copy link

LpmRaven commented Nov 16, 2020

@mareksuscak That was really helpful. It took me a few days to work this out as I thought my Cloudfront was doing something weird.

Some information that I believe you may find useful: If you use your command find ./public -mindepth 2 -name "index.html" | sed 's/^\(.*\)\/index\.html$/\1/' | grep -v 404 | xargs -L1 -t -I {} cp "{}/index.html" "{}.html" in an AWS Codepipeline and Codebuild, you don't actually have to do the renaming step because it stores the temp. artifact build files inside S3.

How did you configure your S3 index document?

"However, if you exclude the trailing slash from the preceding URL, Amazon S3 first looks for an object photos in the bucket. If the photos object is not found, it searches for an index document, photos/index.html. If that document is found, Amazon S3 returns a 302 Found message and points to the photos/ key. For subsequent requests to photos/, Amazon S3 returns photos/index.html. If the index document is not found, Amazon S3 returns an error."

from here

302 Responses are a nightmare for SEO. I know it can be fixed with a lambda@edge but this all seems a lot of effort to remove a trailing slash.

@jon-sully
Copy link

Greetings 👋🏻

For those arriving from the Google machine, this discussion was effectively continued and resolved in the following thread within the Discussion section of this repository: #27889 (comment)

Please reference that thread and give it a read for more info on how Gatsby handles trailing slashes and how to configure your particular site (and hosting) to accommodate the particular behavior you're looking for 👍🏻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: awaiting author response Additional information has been requested from the author status: confirmed Issue with steps to reproduce the bug that’s been verified by at least one reviewer. type: bug An issue or pull request relating to a bug in Gatsby
Projects
None yet
10 participants