breaking(gatsby-plugin-sitemap): vNext rewrite (#25670)
Co-authored-by: Ward Peeters <ward@coding-tech.com>
moonmeister and wardpeet authored Apr 19, 2021
1 parent 2267632 commit 3d65a1c
Showing 14 changed files with 833 additions and 594 deletions.
223 changes: 163 additions & 60 deletions packages/gatsby-plugin-sitemap/README.md
@@ -21,89 +21,192 @@ plugins: [`gatsby-plugin-sitemap`]
Above is the minimal configuration required to have it work. By default, the
generated sitemap will include all of your site's pages, except the ones you exclude.

## Recommended usage

You probably do not want to use the defaults in this plugin. Here's an example of the default output:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>https://example.net/blog/</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>
<url>
<loc>https://example.net/</loc>
<changefreq>daily</changefreq>
<priority>0.7</priority>
</url>
</urlset>
```

See the `changefreq` and `priority` fields? Those will be the same for every page, no matter how important it is or how often it gets updated, so they will most likely be wrong. But wait, there's more: in their [docs](https://support.google.com/webmasters/answer/183668?hl=en), Google says:

> - Google ignores `<priority>` and `<changefreq>` values, so don't bother adding them.
> - Google reads the `<lastmod>` value, but if you misrepresent this value, we will stop reading it.

You really want to customize this plugin config to include an accurate `lastmod` date. Check out the [example](#example) to see how to do this.
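
The sketch below shows one way to follow that advice in a custom `serialize`: it emits only `url` and `lastmod`, and leaves `lastmod` out when the date is missing or cannot be parsed instead of guessing. It assumes each page object carries a `modifiedGmt` field, as in the WordPress example in this README; substitute whatever last-modified field your own query returns.

```javascript
// Minimal serialize sketch: no changefreq/priority (Google ignores them),
// and lastmod only when it is a real, parseable date.
// Assumption: pages carry a `modifiedGmt` timestamp (as in the WordPress example).
const serialize = ({ path, modifiedGmt }) => {
  const entry = { url: path }
  const modified = modifiedGmt ? new Date(modifiedGmt) : null

  // An invalid Date has a NaN timestamp; skip lastmod rather than misrepresent it.
  if (modified && !Number.isNaN(modified.getTime())) {
    entry.lastmod = modified.toISOString()
  }
  return entry
}

console.log(serialize({ path: `/blog/`, modifiedGmt: `2021-04-19T12:00:00Z` }))
// { url: '/blog/', lastmod: '2021-04-19T12:00:00.000Z' }
console.log(serialize({ path: `/about/` }))
// { url: '/about/' }
```

Note that `Date` parses a timestamp without a zone designator (for example `2021-04-19T12:00:00`) as local time, so if your source stores GMT without a trailing `Z`, append one before parsing.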

## Options

The `defaultOptions` [here](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-plugin-sitemap/src/internals.js#L71) can be overridden.
The [`default config`](https://github.com/gatsbyjs/gatsby/blob/master/packages/gatsby-plugin-sitemap/src/options-validation.js) can be overridden.

The options are as follows:

- `query` (GraphQL Query) The query for the data you need to generate the sitemap. It's required to get the site's URL, if you are not fetching it from `site.siteMetadata.siteUrl`, you will need to set a custom `resolveSiteUrl` function. If you override the query, you probably will also need to set a `serializer` to return the correct data for the sitemap. Due to how this plugin was built it is currently expected/required to fetch the page paths from `allSitePage`, but you may use the `allSitePage.edges.node` or `allSitePage.nodes` query structure.
- `output` (string) The filepath and name. Defaults to `/sitemap.xml`.
- `exclude` (array of strings) An array of paths to exclude from the sitemap.
- `createLinkInHead` (boolean) Whether to populate the `<head>` of your site with a link to the sitemap.
- `serialize` (function) Takes the output of the data query and lets you return an array of sitemap entries.
- `resolveSiteUrl` (function) Takes the output of the data query and lets you return the site URL.
- `output` (string = `/sitemap`) Folder path where sitemaps are stored.
- `createLinkInHead` (boolean = true) Whether to populate the `<head>` of your site with a link to the sitemap.
- `entryLimit` (number = 45000) Maximum number of entries per sitemap file. If you have more entries than this, multiple sitemaps and a sitemap index are created.
- `exclude` (string[] = []) An array of paths to exclude from the sitemap. While this is usually an array of strings, it is possible to put other data types into this array for custom filtering. Doing so requires customizing the [`filterPages`](#filterPages) function.
- `query` (GraphQL Query) The query for the data you need to generate the sitemap. It's required to get the site's URL; if you are not fetching it from `site.siteMetadata.siteUrl`, you will need to set a custom [`resolveSiteUrl`](#resolveSiteUrl) function. If you override the query, you may also need to pass in custom [`resolvePagePath`](#resolvePagePath) and [`resolvePages`](#resolvePages) functions to keep everything working. If you fetch pages without using the `allSitePage.nodes` query structure, you will definitely need to customize the [`resolvePages`](#resolvePages) function.
- [`resolveSiteUrl`](#resolveSiteUrl) (function) Takes the output of the data query and lets you return the site URL. Sync or async functions allowed.
- [`resolvePagePath`](#resolvePagePath) (function) Takes a page object and returns the uri of the page (no domain or protocol).
- [`resolvePages`](#resolvePages) (function) Takes the output of the data query and expects an array of page objects to be returned. Sync or async functions allowed.
- [`filterPages`](#filterPages) (function) Takes the current page and a string (or other object) from the `exclude` array, and expects a boolean to be returned. `true` excludes the path, `false` keeps it.
- [`serialize`](#serialize) (function) Takes the output of `filterPages` and lets you return a sitemap entry. Sync or async functions allowed.
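
To make the `entryLimit` behaviour concrete, here is a small, dependency-free sketch of splitting entries into groups of at most `entryLimit`. The `chunkEntries` helper and the `sitemap-N.xml` file names are hypothetical and only illustrate the idea; the plugin's actual file naming is not specified here, although the snapshots in this commit show the `<head>` link pointing at a `sitemap-index.xml` inside the output folder.

```javascript
// Hypothetical helper: split sitemap entries into chunks of at most `entryLimit`.
// 45000 mirrors the documented default for `entryLimit`.
function chunkEntries(entries, entryLimit = 45000) {
  if (!Number.isInteger(entryLimit) || entryLimit < 1) {
    throw new RangeError(`entryLimit must be a positive integer`)
  }
  const chunks = []
  for (let i = 0; i < entries.length; i += entryLimit) {
    chunks.push(entries.slice(i, i + entryLimit))
  }
  return chunks
}

// Five entries with a limit of 2 give three files, so an index file is needed.
const entries = Array.from({ length: 5 }, (_, i) => ({ url: `/page-${i + 1}` }))
const files = chunkEntries(entries, 2).map((chunk, i) => ({
  name: `sitemap-${i}.xml`, // hypothetical naming scheme
  count: chunk.length,
}))
console.log(files.map(f => `${f.name}: ${f.count}`).join(`, `))
// sitemap-0.xml: 2, sitemap-1.xml: 2, sitemap-2.xml: 1
```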

We _ALWAYS_ exclude the following pages: `/dev-404-page`,`/404` &`/offline-plugin-app-shell-fallback`, this cannot be changed.
The following pages are **always** excluded: `/dev-404-page`, `/404` and `/offline-plugin-app-shell-fallback`. This cannot be changed, even by customizing the [`filterPages`](#filterPages) function.

Example:
## Example:

```javascript
const siteUrl = process.env.URL || `https://fallback.net`

// In your gatsby-config.js
siteMetadata: {
siteUrl: `https://www.example.com`,
},
plugins: [
{
resolve: `gatsby-plugin-sitemap`,
options: {
output: `/some-other-sitemap.xml`,
// Exclude specific pages or groups of pages using glob parameters
// See: https://github.com/isaacs/minimatch
// The example below will exclude the single `path/to/page` and all routes beginning with `category`
exclude: [`/category/*`, `/path/to/page`],
query: `
module.exports = {
plugins: [
{
resolve: "gatsby-plugin-sitemap",
options: {
query: `
{
wp {
generalSettings {
siteUrl
}
}
allSitePage {
nodes {
path
}
}
}`,
resolveSiteUrl: ({site, allSitePage}) => {
//Alternatively, you may also pass in an environment variable (or any location) at the beginning of your `gatsby-config.js`.
return site.wp.generalSettings.siteUrl
},
serialize: ({ site, allSitePage }) =>
allSitePage.nodes.map(node => {
allWpContentNode(filter: {nodeType: {in: ["Post", "Page"]}}) {
nodes {
... on WpPost {
uri
modifiedGmt
}
... on WpPage {
uri
modifiedGmt
}
}
}
}
`,
resolveSiteUrl: () => siteUrl,
resolvePages: ({
allSitePage: { nodes: allPages },
allWpContentNode: { nodes: allWpNodes },
}) => {
const wpNodeMap = allWpNodes.reduce((acc, node) => {
const { uri } = node
acc[uri] = node

return acc
}, {})

return allPages.map(page => {
return { ...page, ...wpNodeMap[page.path] }
})
},
serialize: ({ path, modifiedGmt }) => {
return {
url: `${site.wp.generalSettings.siteUrl}${node.path}`,
changefreq: `daily`,
priority: 0.7,
url: path,
lastmod: modifiedGmt,
}
})
}
}
]
},
},
},
],
}
```

## Sitemap Index
## API Reference

<a id=resolveSiteUrl></a>

## resolveSiteUrl ⇒ <code>string</code>

Sync or async functions allowed.

**Returns**: <code>string</code> - site URL; this can come from the GraphQL query or another scope.

| Param | Type | Description |
| ----- | ------------------- | ---------------------------- |
| data | <code>object</code> | Results of the GraphQL query |
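
As a minimal sketch (both function names are hypothetical), here is a synchronous `resolveSiteUrl` that reads the default `site.siteMetadata.siteUrl` location, and an async variant that falls back to an environment variable, much as the example's `siteUrl` constant does with `process.env.URL`. Either form could be passed as the `resolveSiteUrl` option.

```javascript
// Sync: read the site URL from the query result's default location.
const resolveSiteUrlFromMetadata = data => data.site.siteMetadata.siteUrl

// Async: prefer the query result, then an environment variable, then a fallback.
// Written without optional chaining so it also runs on Node 12, the plugin's minimum.
const resolveSiteUrlWithFallback = async data => {
  const metadata = data && data.site && data.site.siteMetadata
  const fromQuery = metadata && metadata.siteUrl
  return fromQuery || process.env.URL || `https://fallback.net`
}
```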

<a id=resolvePagePath></a>

## resolvePagePath ⇒ <code>string</code>

If you don't want to place the URI in `path`, then a custom `resolvePagePath` is needed.

We also support generating `sitemap index`.
**Returns**: <code>string</code> - URI of the page, without domain or protocol

- [Split up your large sitemaps](https://support.google.com/webmasters/answer/75712?hl=en)
- [Using Sitemap index files (to group multiple sitemap files)](https://www.sitemaps.org/protocol.html#index)
| Param | Type | Description |
| ----- | ------------------- | ------------------- |
| page | <code>object</code> \| <code>string</code> | Array item returned from `resolvePages` |
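
For instance, if the objects returned by `resolvePages` keep their URI in a `uri` field (as the WordPress nodes do in the example), a custom `resolvePagePath` could look like the following hypothetical sketch. It falls back to `path` and, because the result must have no domain or protocol, reduces absolute URLs to their pathname.

```javascript
// Hypothetical resolvePagePath: prefer `uri`, fall back to `path`,
// and strip the origin from absolute URLs so only the path is returned.
const resolvePagePath = page => {
  const uri = page.uri || page.path
  if (typeof uri !== `string`) {
    throw new TypeError(`Page has neither a "uri" nor a "path" string`)
  }
  // `URL` is a global in Node 10+ and in browsers.
  return /^https?:\/\//.test(uri) ? new URL(uri).pathname : uri
}

console.log(resolvePagePath({ uri: `/blog/hello-world/` })) // /blog/hello-world/
console.log(resolvePagePath({ uri: `https://example.net/about/` })) // /about/
```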

<a id=resolvePages></a>

## resolvePages ⇒ <code>Array</code>

This allows custom resolution of the array of pages.
This is also where users can merge multiple sources into
a single array if needed. Sync or async functions allowed.

**Returns**: <code>object[]</code> - Array of objects representing each page

| Param | Type | Description |
| ----- | ------------------- | ---------------------------- |
| data | <code>object</code> | results of the GraphQL query |
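
The WordPress example in this README performs exactly this kind of merge. Here it is as a standalone sketch, using a `Map` instead of the example's `reduce` into a plain object (a stylistic choice; the result is the same for ordinary string URIs). The `allSitePage` / `allWpContentNode` query shape comes from that example and is not something the plugin requires.

```javascript
// Merge Gatsby pages with WordPress nodes that share the same URI.
// Pages without a matching node are returned unchanged.
const resolvePages = ({
  allSitePage: { nodes: allPages },
  allWpContentNode: { nodes: allWpNodes },
}) => {
  // Index WordPress nodes by their `uri` for constant-time lookups.
  const wpNodeMap = new Map(allWpNodes.map(node => [node.uri, node]))

  // Spreading `undefined` is a no-op, so unmatched pages pass through as-is.
  return allPages.map(page => ({ ...page, ...wpNodeMap.get(page.path) }))
}

const pages = resolvePages({
  allSitePage: { nodes: [{ path: `/hello-world/` }, { path: `/tags/` }] },
  allWpContentNode: {
    nodes: [{ uri: `/hello-world/`, modifiedGmt: `2021-04-19T12:00:00` }],
  },
})
// The first page gains `uri` and `modifiedGmt`; `/tags/` is left unchanged.
console.log(pages)
```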

<a id="filterPages"></a>

## filterPages ⇒ <code>boolean</code>

This allows filtering any data in any way.

This function is executed via:

```javascript
// In your gatsby-config.js
siteMetadata: {
siteUrl: `https://www.example.com`,
},
plugins: [
{
resolve: `gatsby-plugin-sitemap`,
options: {
sitemapSize: 5000
}
}
]
allPages.filter(
page => !excludes.some(excludedRoute => thisFunc(page, excludedRoute, tools))
)
```

`allPages` is the result of the [`resolvePages`](#resolvePages) function.

**Returns**: <code>Boolean</code> - `true` excludes the path, `false` keeps it.

| Param | Type | Description |
| ------------- | ------------------- | ----------------------------------------------------------------------------------- |
| page | <code>object</code> | |
| excludedRoute | <code>string</code> | Element from `exclude` Array in plugin config. |
| tools | <code>object</code> | contains tools for filtering `{ minimatch, withoutTrailingSlash, resolvePagePath }` |
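
Since non-string `exclude` entries need a custom `filterPages`, here is a hypothetical sketch that accepts `RegExp` entries as well as glob strings, delegating strings to the `minimatch` provided in `tools`. The one-argument, string-returning call shapes of `withoutTrailingSlash` and `resolvePagePath` are assumptions based on the table above. The usage part swaps in an exact-match stand-in for `minimatch` so the sketch runs without dependencies; in a real build the plugin supplies its own tools.

```javascript
// Hypothetical filterPages: RegExp entries are tested directly, strings are
// matched as globs via tools.minimatch. Returning true excludes the page.
function filterPages(page, excludedRoute, { minimatch, withoutTrailingSlash, resolvePagePath }) {
  const pagePath = withoutTrailingSlash(resolvePagePath(page))
  if (excludedRoute instanceof RegExp) {
    return excludedRoute.test(pagePath)
  }
  if (typeof excludedRoute === `string`) {
    return minimatch(pagePath, withoutTrailingSlash(excludedRoute))
  }
  throw new TypeError(`Unsupported exclude entry: ${String(excludedRoute)}`)
}

// Stand-in tools (assumption): exact match instead of real glob matching.
const tools = {
  minimatch: (path, pattern) => path === pattern,
  withoutTrailingSlash: p => (p.length > 1 && p.endsWith(`/`) ? p.slice(0, -1) : p),
  resolvePagePath: page => page.path,
}
const allPages = [{ path: `/to/keep/1` }, { path: `/drafts/a/` }, { path: `/path/to/page/` }]
const excludes = [/^\/drafts\//, `/path/to/page`]

// Same loop shape as the documented call above.
const kept = allPages.filter(
  page => !excludes.some(excludedRoute => filterPages(page, excludedRoute, tools))
)
console.log(kept) // [ { path: '/to/keep/1' } ]
```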

<a id="serialize"></a>

## serialize ⇒ <code>object</code>

This function is executed by:

```javascript
allPages.map(page => thisFunc(page, tools))
```

Above is the minimal configuration to split a large sitemap.
When the number of URLs in a sitemap is more than 5000, the plugin will create sitemap (e.g. `sitemap-0.xml`, `sitemap-1.xml`) and index (e.g. `sitemap.xml`) files.
`allPages` is the result of the [`filterPages`](#filterPages) function. Sync or async functions allowed.

**Kind**: global variable

| Param | Type | Description |
| ----- | ------------------- | ---------------------------------------------------------------- |
| page | <code>object</code> | A single element from the results of the `resolvePages` function |
| tools | <code>object</code> | contains tools for serializing `{ resolvePagePath }` |
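
Because `serialize` may be async, it can look data up per page. In this hypothetical sketch, `getLastModified` stands in for an async source such as a CMS API, and the URL comes from the `resolvePagePath` tool so the function works whatever field holds the URI. Mapping over an async function yields promises, so the usage awaits them with `Promise.all`; that is only for the demonstration and says nothing about how the plugin awaits them internally.

```javascript
// Stand-in data for a hypothetical async last-modified lookup.
const lastModifiedByPath = new Map([[`/page-1`, `2021-04-19`]])
const getLastModified = async path => lastModifiedByPath.get(path)

// Hypothetical async serialize: URL via the resolvePagePath tool,
// plus lastmod only when the lookup finds one.
const serialize = async (page, { resolvePagePath }) => {
  const url = resolvePagePath(page)
  const lastmod = await getLastModified(url)
  return lastmod ? { url, lastmod } : { url }
}

// Mirrors `allPages.map(page => thisFunc(page, tools))`, then awaits the promises.
const tools = { resolvePagePath: page => page.path }
const allPages = [{ path: `/page-1` }, { path: `/page-2` }]
const entriesPromise = Promise.all(allPages.map(page => serialize(page, tools)))
entriesPromise.then(entries => console.log(entries))
```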
10 changes: 6 additions & 4 deletions packages/gatsby-plugin-sitemap/package.json
@@ -10,14 +10,14 @@
"@babel/runtime": "^7.12.5",
"common-tags": "^1.8.0",
"minimatch": "^3.0.4",
"pify": "^3.0.0",
"sitemap": "^1.13.0"
"sitemap": "^6.3.0"
},
"devDependencies": {
"@babel/cli": "^7.12.1",
"@babel/core": "^7.12.3",
"babel-preset-gatsby-package": "^1.4.0-next.0",
"cross-env": "^7.0.3"
"cross-env": "^7.0.3",
"gatsby-plugin-utils": "1.4.0-next.0"
},
"homepage": "https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby-plugin-sitemap#readme",
"keywords": [
@@ -39,7 +39,9 @@
"scripts": {
"build": "babel src --out-dir . --ignore \"**/__tests__\"",
"prepare": "cross-env NODE_ENV=production npm run build",
"watch": "babel -w src --out-dir . --ignore \"**/__tests__\""
"watch": "babel -w src --out-dir . --ignore \"**/__tests__\"",
"test": "jest",
"test:watch": "jest --watch"
},
"engines": {
"node": ">=12.13.0"
@@ -1,25 +1,26 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`Test plugin sitemap custom query runs 1`] = `
"<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>
<urlset xmlns=\\"http://www.sitemaps.org/schemas/sitemap/0.9\\" xmlns:news=\\"http://www.google.com/schemas/sitemap-news/0.9\\" xmlns:xhtml=\\"http://www.w3.org/1999/xhtml\\" xmlns:mobile=\\"http://www.google.com/schemas/sitemap-mobile/1.0\\" xmlns:image=\\"http://www.google.com/schemas/sitemap-image/1.1\\" xmlns:video=\\"http://www.google.com/schemas/sitemap-video/1.1\\">
<url> <loc>http://dummy.url/post/page-1</loc> <changefreq>weekly</changefreq> <priority>0.8</priority> </url>
</urlset>"
exports[`gatsby-plugin-sitemap Node API should accept a custom query 1`] = `
Array [
Object {
"changefreq": "weekly",
"priority": 0.8,
"url": "http://dummy.url/page-1",
},
]
`;

exports[`Test plugin sitemap default settings work properly 1`] = `
"<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>
<urlset xmlns=\\"http://www.sitemaps.org/schemas/sitemap/0.9\\" xmlns:news=\\"http://www.google.com/schemas/sitemap-news/0.9\\" xmlns:xhtml=\\"http://www.w3.org/1999/xhtml\\" xmlns:mobile=\\"http://www.google.com/schemas/sitemap-mobile/1.0\\" xmlns:image=\\"http://www.google.com/schemas/sitemap-image/1.1\\" xmlns:video=\\"http://www.google.com/schemas/sitemap-video/1.1\\">
<url> <loc>http://dummy.url/page-1</loc> <changefreq>daily</changefreq> <priority>0.7</priority> </url>
<url> <loc>http://dummy.url/page-2</loc> <changefreq>daily</changefreq> <priority>0.7</priority> </url>
</urlset>"
exports[`gatsby-plugin-sitemap Node API should succeed with default options 1`] = `
Array [
Object {
"changefreq": "daily",
"priority": 0.7,
"url": "http://dummy.url/page-1",
},
Object {
"changefreq": "daily",
"priority": 0.7,
"url": "http://dummy.url/page-2",
},
]
`;
exports[`Test plugin sitemap sitemap index set sitemap size and urls are less than it. 1`] = `
"<?xml version=\\"1.0\\" encoding=\\"UTF-8\\"?>
<urlset xmlns=\\"http://www.sitemaps.org/schemas/sitemap/0.9\\" xmlns:news=\\"http://www.google.com/schemas/sitemap-news/0.9\\" xmlns:xhtml=\\"http://www.w3.org/1999/xhtml\\" xmlns:mobile=\\"http://www.google.com/schemas/sitemap-mobile/1.0\\" xmlns:image=\\"http://www.google.com/schemas/sitemap-image/1.1\\" xmlns:video=\\"http://www.google.com/schemas/sitemap-video/1.1\\">
<url> <loc>http://dummy.url/page-1</loc> <changefreq>daily</changefreq> <priority>0.7</priority> </url>
<url> <loc>http://dummy.url/page-2</loc> <changefreq>daily</changefreq> <priority>0.7</priority> </url>
</urlset>"
`;
@@ -1,12 +1,12 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`Adds <Link> for site to head creates Link href with path prefix when __PATH_PREFIX__ sets 1`] = `
exports[`gatsby-plugin-sitemap SSR API creates Link href with path prefix when __PATH_PREFIX__ sets 1`] = `
[MockFunction] {
"calls": Array [
Array [
Array [
<link
href="/hogwarts/sitemap.xml"
href="/hogwarts/test-folder/sitemap-index.xml"
rel="sitemap"
type="application/xml"
/>,
@@ -22,13 +22,13 @@ exports[`Adds <Link> for site to head creates Link href with path prefix when __
}
`;

exports[`Adds <Link> for site to head creates Link if createLinkInHead is true 1`] = `
exports[`gatsby-plugin-sitemap SSR API should create a Link if createLinkInHead is true 1`] = `
[MockFunction] {
"calls": Array [
Array [
Array [
<link
href="/sitemap.xml"
href="/test-folder/sitemap-index.xml"
rel="sitemap"
type="application/xml"
/>,
@@ -44,4 +44,4 @@ exports[`Adds <Link> for site to head creates Link if createLinkInHead is true 1
}
`;

exports[`Adds <Link> for site to head does not create Link if createLinkInHead is false 1`] = `[MockFunction]`;
exports[`gatsby-plugin-sitemap SSR API should not create Link if createLinkInHead is false 1`] = `[MockFunction]`;
@@ -0,0 +1,12 @@
// Jest Snapshot v1, https://goo.gl/fbAQLP

exports[`gatsby-plugin-sitemap internals tests pageFilter should filter correctly 1`] = `
Array [
Object {
"path": "/to/keep/1",
},
Object {
"path": "/to/keep/2",
},
]
`;