Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Sitemap url and Astro.url mismatch with build.format file #11575

Open
1 task done
fjdelarubia opened this issue Jul 30, 2024 · 9 comments · May be fixed by #11684
Open
1 task done

Sitemap url and Astro.url mismatch with build.format file #11575

fjdelarubia opened this issue Jul 30, 2024 · 9 comments · May be fixed by #11684
Labels
- P4: important Violate documented behavior or significantly impacts performance (priority)

Comments

@fjdelarubia
Copy link

Astro Info

Astro                    v4.7.0
Node                     v20.9.0
System                   macOS (arm64)
Package Manager          npm
Output                   static
Adapter                  none
Integrations             @astrojs/sitemap
                         @astrojs/mdx

If this issue only occurs in one browser, which browser is a problem?

No response

Describe the Bug

When build.format is set to file (or preserve) and Astro builds the html files for each page file, the canonical url generated in build time with Astro.url using:
const canonicalURL = new URL(Astro.url.pathname, Astro.site);

contains the file extension, in the stackblitz example, for page2.astro we have:
<link rel="canonical" href="https://astro.build/page2.html">

But in the sitemap, the location for that page is:
<url><loc>https://astro.build/page2</loc></url>

What's the expected result?

Sitemap urls to be the same as canonical urls build with

const canonicalURL = new URL(Astro.url.pathname, Astro.site);

Link to Minimal Reproducible Example

https://stackblitz.com/edit/github-3v2ky8-nbjmmy?file=src%2Fpages%2Fpage2.astro

Participation

  • I am willing to submit a pull request for this issue.
@github-actions github-actions bot added the needs triage Issue needs to be triaged label Jul 30, 2024
@matthewp matthewp added - P4: important Violate documented behavior or significantly impacts performance (priority) and removed needs triage Issue needs to be triaged labels Jul 31, 2024
@matthewp
Copy link
Contributor

Thanks! PR would be appreciated in this case.

@fjdelarubia
Copy link
Author

Thanks! PR would be appreciated in this case.

I'll try it!, but which one is the good url? with or without file extension? 😅

@ematipico
Copy link
Member

The one without .html should be the correct one

@fjdelarubia
Copy link
Author

fjdelarubia commented Aug 1, 2024

The one without .html should be the correct one

I've been investigating and, from what I've discovered (it's my first time delving into the inner workings of Astro), the URL generated during the build process adds the .html extension if build.format is set to "file" https://github.com/withastro/astro/blob/main/packages/astro/src/core/build/generate.ts#L402.

This makes me question whether the correct URL should indeed exclude the .html, since it seems to be intentionally included.

@ematipico
Copy link
Member

That's the correct URL during the build, but you want to generate a URL that users want to navigate to, and usually users want to have a URL without the html extension

@fjdelarubia
Copy link
Author

That's the correct URL during the build, but you want to generate a URL that users want to navigate to, and usually users want to have a URL without the html extension

But AFAIK the URL during the build is the one used by Astro.url.pathname, that's why the canonical url in the build has the .html extension, should the canonical url be formed without the pathname?

@tylermichael
Copy link

I believe this is caused by this change.

It makes the URLs in the sitemap use the format https://astro.build/page2 when this is true: if (astroConfig.trailingSlash === 'never' || astroConfig.build.format === 'file').

That PR also also makes it difficult to migrate to astroConfig.trailingSlash = 'never' from the default value of ignore, because when you change that setting, your existing URLs indexed by search engines will return 404's. I think you can solve that by setting up redirects, which isn't ideal IMO.

@fjdelarubia
Copy link
Author

fjdelarubia commented Aug 10, 2024

The logic for generating the URL is scattered across different places, and in the end, that's what's causing problems:

I think we could create a utility that generates the URLs (or paths) based on the Astro configuration, and that it should be used everywhere

@fjdelarubia fjdelarubia linked a pull request Aug 12, 2024 that will close this issue
@fjdelarubia
Copy link
Author

I ended up with simple fix, I would like to create that useful but I don't have the knowledge yet for doing it right 😞

@florian-lefebvre florian-lefebvre linked a pull request Aug 15, 2024 that will close this issue
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
- P4: important Violate documented behavior or significantly impacts performance (priority)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants