Skip to content

Commit

Permalink
chore(gatsby-source-filesystem): Improve README (#37480)
Browse files Browse the repository at this point in the history
  • Loading branch information
LekoArts authored Jan 18, 2023
1 parent 8a718e3 commit 16685a6
Showing 1 changed file with 68 additions and 86 deletions.
154 changes: 68 additions & 86 deletions packages/gatsby-source-filesystem/README.md
Original file line number Diff line number Diff line change
@@ -1,8 +1,8 @@
# gatsby-source-filesystem

A Gatsby source plugin for sourcing data into your Gatsby application from your local filesystem.
A Gatsby plugin for sourcing data into your Gatsby application from your local filesystem.

The plugin creates `File` nodes from files. The various "transformer" plugins can transform `File` nodes into various other types of data e.g. [`gatsby-transformer-json`](https://www.gatsbyjs.com/plugins/gatsby-transformer-json/) transforms JSON files into JSON data nodes and [`gatsby-transformer-remark`](https://www.gatsbyjs.com/plugins/gatsby-transformer-remark/) transforms markdown files into `MarkdownRemark` nodes from which you can query an HTML representation of the markdown.
The plugin creates `File` nodes from files. The various [transformer plugins](https://www.gatsbyjs.com/plugins/?=gatsby-transformer) can transform `File` nodes into other types of data e.g. [`gatsby-transformer-json`](https://www.gatsbyjs.com/plugins/gatsby-transformer-json/) transforms JSON files into `JSON` nodes and [`gatsby-transformer-remark`](https://www.gatsbyjs.com/plugins/gatsby-transformer-remark/) transforms markdown files into `MarkdownRemark` nodes.

## Install

Expand All @@ -12,7 +12,7 @@ npm install gatsby-source-filesystem

## How to use

You can have multiple instances of this plugin to read source nodes from different locations on your filesystem. Be sure to give each instance a unique `name`.
You can have multiple instances of this plugin in your `gatsby-config` to read files from different locations on your filesystem. Be sure to give each instance a unique `name`.

```js:title=gatsby-config.js
module.exports = {
Expand Down Expand Up @@ -41,6 +41,8 @@ module.exports = {
}
```

In the above example every file under `src/pages` and `src/data` will be made available as a `File` node inside GraphQL. You don't need to set up another instance of `gatsby-source-filesystem` for e.g. `src/data/images` (since those files are already sourced). However, if you want to be able to filter your files you can set up a new instance and later use the `sourceInstanceName`.

## Options

### name
Expand Down Expand Up @@ -82,43 +84,41 @@ By default, `gatsby-source-filesystem` creates an MD5 hash of each file to deter

### Environment variables

To prevent concurrent requests overload of `processRemoteNode`, you can adjust the `200` default concurrent downloads, with `GATSBY_CONCURRENT_DOWNLOAD` environment variable.
- `GATSBY_CONCURRENT_DOWNLOAD` (default: `200`). To prevent concurrent requests you can configure the concurrency of `processRemoteNode`.

In case that due to spotty network, or slow connection, some remote files fail to download. Even after multiple retries and adjusting concurrent downloads, you can adjust timeout and retry settings with these environment variables:
If you have a spotty network or slow connection, you can adjust the retries and timeouts:

- `GATSBY_STALL_RETRY_LIMIT`, default: `3`
- `GATSBY_STALL_TIMEOUT`, default: `30000`
- `GATSBY_CONNECTION_TIMEOUT`, default: `30000`
- `GATSBY_STALL_RETRY_LIMIT` (default: `3`)
- `GATSBY_STALL_TIMEOUT` (default: `30000`)
- `GATSBY_CONNECTION_TIMEOUT` (default: `30000`)

## How to query

You can query file nodes like the following:
You can query the `File` nodes as following:

```graphql
{
allFile {
edges {
node {
extension
dir
modifiedTime
}
nodes {
extension
dir
modifiedTime
}
}
}
```

To filter by the `name` you specified in the config, use `sourceInstanceName`:
Use [GraphiQL](https://www.gatsbyjs.com/docs/how-to/querying-data/running-queries-with-graphiql/) to explore all available keys.

To filter by the `name` you specified in the `gatsby-config`, use `sourceInstanceName`:

```graphql
{
allFile(filter: { sourceInstanceName: { eq: "data" } }) {
edges {
node {
extension
dir
modifiedTime
}
nodes {
extension
dir
modifiedTime
}
}
}
Expand All @@ -128,24 +128,24 @@ To filter by the `name` you specified in the config, use `sourceInstanceName`:

`gatsby-source-filesystem` exports three helper functions:

- `createFilePath`
- `createRemoteFileNode`
- `createFileNodeFromBuffer`
- [`createFilePath`](#createfilepath)
- [`createRemoteFileNode`](#createremotefilenode)
- [`createFileNodeFromBuffer`](#createfilenodefrombuffer)

### createFilePath
### `createFilePath`

When building pages from files, you often want to create a URL from a file's path on the file system. E.g. if you have a markdown file at `src/content/2018-01-23-an-exploration-of-the-nature-of-reality/index.md`, you might want to turn that into a page on your site at `example.com/2018-01-23-an-exploration-of-the-nature-of-reality/`. `createFilePath` is a helper function to make this task easier.
When building pages from files, you often want to create a URL from a file's path on the filesystem. For example, if you have a markdown file at `src/content/2018-01-23-my-blog-post/index.md`, you might want to turn that into a page on your site at `example.com/blog/2018-01-23-my-blog-post/`. `createFilePath` is a helper function to make this task easier.

```javascript
createFilePath({
// The node you'd like to convert to a path
// e.g. from a markdown, JSON, YAML file, etc
// e.g. from a markdown, JSON, YAML file, etc.
node,
// Method used to get a node
// The parameter from `onCreateNode` should be passed in here
getNode,
// The base path for your files.
// It is relative to the `options.path` setting in the `gatsby-source-filesystem` entries of your `gatsby-config.js`.
// It is relative to the `options.path` setting in the `gatsby-source-filesystem` entries of your `gatsby-config`.
// Defaults to `src/pages`. For the example above, you'd use `src/content`.
basePath,
// Whether you want your file paths to contain a trailing `/` slash
Expand All @@ -154,35 +154,35 @@ createFilePath({
})
```

#### Example usage
#### Example

```javascript
```js:title=gatsby-node.js
const { createFilePath } = require(`gatsby-source-filesystem`)

exports.onCreateNode = ({ node, getNode, actions }) => {
const { createNodeField } = actions
// Ensures we are processing only markdown files
if (node.internal.type === "MarkdownRemark") {
// Use `createFilePath` to turn markdown files in our `data/faqs` directory into `/faqs/slug`
// Use `createFilePath` to turn markdown files in our `src/content` directory into `/blog/slug`
const relativeFilePath = createFilePath({
node,
getNode,
basePath: "data/faqs/",
basePath: "src/content",
})

// Creates new query'able field with name of 'slug'
createNodeField({
node,
name: "slug",
value: `/faqs${relativeFilePath}`,
value: `/blog${relativeFilePath}`,
})
}
}
```

### createRemoteFileNode
### `createRemoteFileNode`

When building source plugins for remote data sources such as headless CMSs, their data will often link to files stored remotely that are often convenient to download so you can work with them locally.
When building source plugins for remote data sources (Headless CMSs, APIs, etc.), their data will often link to files stored remotely that are often convenient to download so you can work with them locally.

The `createRemoteFileNode` helper makes it easy to download remote files and add them to your site's GraphQL schema.

Expand All @@ -192,80 +192,63 @@ While downloading the assets, special characters (regex: `/:|\/|\*|\?|"|<|>|\||\
createRemoteFileNode({
// The source url of the remote file
url: `https://example.com/a-file.jpg`,

// The id of the parent node (i.e. the node to which the new remote File node will be linked to.
// The id of the parent node (i.e. the node to which the new remote File node will be linked to)
parentNodeId,

// Gatsby's cache which the helper uses to check if the file has been downloaded already. It's passed to all Node APIs.
getCache,

// The action used to create nodes
createNode,

// A helper function for creating node Ids
createNodeId,

// OPTIONAL
// Adds htaccess authentication to the download request if passed in.
auth: { htaccess_user: `USER`, htaccess_pass: `PASSWORD` },

// OPTIONAL
// Adds extra http headers to download request if passed in.
httpHeaders: { Authorization: `Bearer someAccessToken` },

// OPTIONAL
// Sets the file extension
ext: ".jpg",
ext: `.jpg`,
})
```

#### Example usage
#### Example

The following example is pulled from [gatsby-source-wordpress](https://github.com/gatsbyjs/gatsby/tree/master/packages/gatsby-source-wordpress). Downloaded files are created as `File` nodes and then linked to the WordPress Media node, so it can be queried both as a regular `File` node and from the `localFile` field in the Media node.
The following example is pulled from the [Preprocessing External Images guide](https://www.gatsbyjs.com/docs/how-to/images-and-media/preprocessing-external-images/). Downloaded files are created as `File` nodes and then linked to the `MarkdownRemark` node, so it can be used with e.g. [`gatsby-plugin-image`](https://www.gatsbyjs.com/docs/how-to/images-and-media/using-gatsby-plugin-image/). The file node can then be queried using GraphQL.

```javascript
const { createRemoteFileNode } = require(`gatsby-source-filesystem`)
```js:title=gatsby-node.js
const { createRemoteFileNode } = require("gatsby-source-filesystem")

exports.downloadMediaFiles = ({
nodes,
getCache,
createNode,
exports.onCreateNode = async ({
node,
actions: { createNode, createNodeField },
createNodeId,
_auth,
getCache,
}) => {
nodes.map(async node => {
let fileNode
// Ensures we are only processing Media Files
// `wordpress__wp_media` is the media file type name for WordPress
if (node.__type === `wordpress__wp_media`) {
try {
fileNode = await createRemoteFileNode({
url: node.source_url,
parentNodeId: node.id,
getCache,
createNode,
createNodeId,
auth: _auth,
})
} catch (e) {
// Ignore
}
}
// For all MarkdownRemark nodes that have a featured image url, call createRemoteFileNode
if (
node.internal.type === "MarkdownRemark" &&
node.frontmatter.featuredImgUrl !== null
) {
const fileNode = await createRemoteFileNode({
url: node.frontmatter.featuredImgUrl, // string that points to the URL of the image
parentNodeId: node.id, // id of the parent node of the fileNode you are going to create
createNode, // helper function in gatsby-node to generate the node
createNodeId, // helper function in gatsby-node to generate the node id
getCache,
})

// Adds a field `localFile` to the node
// ___NODE appendix tells Gatsby that this field will link to another node
// if the file was created, extend the node with "localFile"
if (fileNode) {
node.localFile___NODE = fileNode.id
createNodeField({ node, name: "localFile", value: fileNode.id })
}
})
}
}
```

The file node can then be queried using GraphQL. See an example of this in the [gatsby-source-wordpress README](/plugins/gatsby-source-wordpress/#image-processing) where downloaded images are queried using [gatsby-transformer-sharp](/plugins/gatsby-transformer-sharp/) to use in the component [gatsby-image](/plugins/gatsby-image/).

#### Retrieving the remote file name and extension

The helper tries first to retrieve the file name and extension by parsing the url and the path provided (e.g. if the url is `https://example.com/image.jpg`, the extension will be inferred as `.jpg` and the name as `image`). If the url does not contain an extension, we use the [`file-type`](https://www.npmjs.com/package/file-type) package to infer the file type. Finally, the name and the extension _can_ be explicitly passed, like so:
The helper first tries to retrieve the file name and extension by parsing the url and the path provided (e.g. if the url is `https://example.com/image.jpg`, the extension will be inferred as `.jpg` and the name as `image`). If the url does not contain an extension, `createRemoteFileNode` use the [`file-type`](https://www.npmjs.com/package/file-type) package to infer the file type. Finally, the name and the extension _can_ be explicitly passed, like so:

```javascript
createRemoteFileNode({
Expand All @@ -276,25 +259,24 @@ createRemoteFileNode({
createNode,
createNodeId,
// if necessary!
ext: ".jpg",
name: "image",
ext: `.jpg`,
name: `image`,
})
```

### createFileNodeFromBuffer
### `createFileNodeFromBuffer`

When working with data that isn't already stored in a file, such as when querying binary/blob fields from a database, it's helpful to cache that data to the filesystem in order to use it with other transformers that accept files as input.

The `createFileNodeFromBuffer` helper accepts a `Buffer`, caches its contents to disk, and creates a file node that points to it.
The `createFileNodeFromBuffer` helper accepts a `Buffer`, caches its contents to disk, and creates a `File` node that points to it.

The name of the file can be passed to the `createFileNodeFromBuffer` helper. If no name is given, the content hash will be used to determine the name.

#### Example usage
#### Example

The following example is adapted from the source of [`gatsby-source-mysql`](https://github.com/malcolm-kee/gatsby-source-mysql):

```js
// gatsby-node.js
```js:title=gatsby-node.js
const createMySqlNodes = require(`./create-nodes`)

exports.sourceNodes = async ({ actions, createNodeId, getCache }, config) => {
Expand Down

0 comments on commit 16685a6

Please sign in to comment.