feat(gatsby-source-graphql): Query batching #22347

Merged
merged 16 commits into from
Mar 25, 2020
1 change: 1 addition & 0 deletions packages/gatsby-source-graphql/.gitignore
@@ -1,2 +1,3 @@
/*.js
/batching/*.js
yarn.lock
157 changes: 157 additions & 0 deletions packages/gatsby-source-graphql/README.md
@@ -176,5 +176,162 @@ module.exports = {
}
```

# Performance tuning

By default, `gatsby-source-graphql` executes each query in a separate network request.
But the plugin also supports query batching to improve query performance.

**Caveat**: Batching is only possible for queries that start at approximately the same time. In other words,
it is bounded by the number of GraphQL queries Gatsby executes in parallel (**4** by default).

Fortunately, we can increase the number of queries executed in parallel by setting the [environment variable](https://gatsby.dev/env-vars)
`GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY` to a higher value. To enable batching, also set the `batch` option
of the plugin to `true`.

Example:

```shell
cross-env GATSBY_EXPERIMENTAL_QUERY_CONCURRENCY=20 gatsby develop
```

With plugin config:

```js
module.exports = {
  plugins: [
    {
      resolve: "gatsby-source-graphql",
      options: {
        typeName: "SWAPI",
        fieldName: "swapi",
        url: "https://api.graphcms.com/simple/v1/swapi",
        batch: true,
      },
    },
  ],
}
```

By default, the plugin batches up to 5 queries. You can override this by passing
`dataLoaderOptions` and setting `maxBatchSize`:

```js
module.exports = {
  plugins: [
    {
      resolve: "gatsby-source-graphql",
      options: {
        typeName: "SWAPI",
        fieldName: "swapi",
        url: "https://api.graphcms.com/simple/v1/swapi",
        batch: true,
        // See https://github.com/graphql/dataloader#new-dataloaderbatchloadfn--options
        // for a full list of DataLoader options
        dataLoaderOptions: {
          maxBatchSize: 10,
        },
      },
    },
  ],
}
```

Having 20 parallel queries with 5 queries per batch means we are still running 4 batches
in parallel.
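
The arithmetic behind that statement can be sketched as follows. `parallelBatches` is a hypothetical helper, not part of the plugin, and it assumes that all concurrently running queries start close enough together to be grouped into full batches:

```js
// Hypothetical helper: estimates how many batched requests are in flight
// for a given query concurrency and DataLoader maxBatchSize.
function parallelBatches(queryConcurrency, maxBatchSize) {
  return Math.ceil(queryConcurrency / maxBatchSize)
}

console.log(parallelBatches(20, 5)) // 4
console.log(parallelBatches(20, 10)) // 2
```

Raising `maxBatchSize` without raising the concurrency therefore reduces the number of parallel requests rather than adding more work per unit of time.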

Each project is unique, so try tuning those two variables and see what works best for you.
We've seen 5-10x speed-ups for some setups.

### How batching works

Under the hood, `gatsby-source-graphql` uses [DataLoader](https://github.com/graphql/dataloader)
for query batching. It merges all queries from a batch into a single query that gets sent to the
server in a single network request.
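
To illustrate why only queries that start at roughly the same time can share a batch, here is a minimal dependency-free sketch of the batching idea. It is a stand-in for DataLoader, not its actual implementation, and `createBatcher`, `load` and `batches` are hypothetical names:

```js
// Hypothetical stand-in for DataLoader: every load() call made in the same
// tick is queued, then handed to batchFn in chunks of at most maxBatchSize.
function createBatcher(batchFn, maxBatchSize = 5) {
  let queue = []

  const flush = () => {
    const pending = queue
    queue = []
    for (let i = 0; i < pending.length; i += maxBatchSize) {
      const chunk = pending.slice(i, i + maxBatchSize)
      Promise.resolve()
        .then(() => batchFn(chunk.map(item => item.key)))
        .then(results => chunk.forEach((item, j) => item.resolve(results[j])))
        .catch(error => chunk.forEach(item => item.reject(error)))
    }
  }

  return key =>
    new Promise((resolve, reject) => {
      // The first call in a tick schedules a single flush for the whole queue.
      if (queue.length === 0) process.nextTick(flush)
      queue.push({ key, resolve, reject })
    })
}

const batches = []
const load = createBatcher(async keys => {
  batches.push(keys) // record how the keys were grouped
  return keys.map(key => key * 2)
}, 2)

// Three loads issued in the same tick end up in two batches: [1, 2] and [3].
const demo = Promise.all([load(1), load(2), load(3)])
demo.then(values => console.log(values, batches))
```

A call to `load` that arrives after the flush has run starts a new queue, which is why the number of queries running in parallel bounds the batch size in practice.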

Consider the following example where both of these queries are run:

```js
{
  query: `query($id: Int!) {
    node(id: $id) {
      foo
    }
  }`,
  variables: { id: 1 },
}
```

```js
{
  query: `query($id: Int!) {
    node(id: $id) {
      bar
    }
  }`,
  variables: { id: 2 },
}
```

They will be merged into a single query:

```js
{
  query: `
    query(
      $gatsby0_id: Int!
      $gatsby1_id: Int!
    ) {
      gatsby0_node: node(id: $gatsby0_id) {
        foo
      }
      gatsby1_node: node(id: $gatsby1_id) {
        bar
      }
    }
  `,
  variables: {
    gatsby0_id: 1,
    gatsby1_id: 2,
  }
}
```
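
The variable renaming in the merged query can be sketched as below. This covers only the `variables` object; rewriting the query document itself needs a GraphQL AST transform, which is omitted here. `mergeVariables` is a hypothetical helper, and the `gatsby<index>_` prefix is taken from the example above:

```js
// Hypothetical helper: namespaces each query's variables with a
// `gatsby<index>_` prefix so they can coexist in one merged operation.
function mergeVariables(queries) {
  const merged = {}
  queries.forEach((query, index) => {
    for (const [name, value] of Object.entries(query.variables || {})) {
      merged[`gatsby${index}_${name}`] = value
    }
  })
  return merged
}

const mergedVariables = mergeVariables([
  { variables: { id: 1 } },
  { variables: { id: 2 } },
])
console.log(mergedVariables) // { gatsby0_id: 1, gatsby1_id: 2 }
```

Prefixing by position in the batch keeps two queries that both declare `$id` from colliding.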

Then `gatsby-source-graphql` splits the result of this single query into multiple results
and delivers them back to Gatsby as if it had executed multiple queries. For example, this merged result:

```js
{
  data: {
    gatsby0_node: { foo: `foo` },
    gatsby1_node: { bar: `bar` },
  },
}
```

is transformed back to:

```js
[
  { data: { node: { foo: `foo` } } },
  { data: { node: { bar: `bar` } } },
]
```
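
A minimal sketch of that splitting step, assuming only top-level fields carry the `gatsby<index>_` prefix. `splitResult` is a hypothetical helper and not the plugin's actual code:

```js
// Hypothetical helper: routes each prefixed top-level field of the merged
// result back to the query it came from and strips the prefix.
function splitResult(mergedResult, batchSize) {
  const results = Array.from({ length: batchSize }, () => ({ data: {} }))
  for (const [key, value] of Object.entries(mergedResult.data || {})) {
    const match = /^gatsby(\d+)_(.+)$/.exec(key)
    // Ignore fields without a prefix or with an index outside the batch.
    if (!match || Number(match[1]) >= batchSize) continue
    results[Number(match[1])].data[match[2]] = value
  }
  return results
}

const splitResults = splitResult(
  { data: { gatsby0_node: { foo: `foo` }, gatsby1_node: { bar: `bar` } } },
  2
)
console.log(JSON.stringify(splitResults))
```

Each entry of `splitResults` has the shape Gatsby would have received from a separate request for that query.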

Note that if any query result contains errors, the whole batch will fail.
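
The sketch below shows one way such a batch-level error could be assembled from a GraphQL response. The message format, the `GraphQLError` name and the `originalResult` property mirror the expectations in this PR's tests; `createBatchError` itself is a hypothetical helper and may differ from the real implementation:

```js
// Hypothetical helper: turns the `errors` array of a merged response into a
// single Error that fails every query in the batch.
function createBatchError(result) {
  const lines = result.errors.map(error =>
    error.path
      ? `${error.message} (path: ${JSON.stringify(error.path)})`
      : error.message
  )
  const batchError = new Error(
    `Failed to load query batch:\n${lines.join(`\n`)}`
  )
  batchError.name = `GraphQLError`
  batchError.originalResult = result // keep the raw response for debugging
  return batchError
}

const batchError = createBatchError({
  errors: [{ message: `Error1` }, { message: `Error2`, path: [`foo`] }],
})
console.log(batchError.message)
```

Because the error is attached to the batch rather than to an individual query, every query in that batch is rejected with the same error.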

### Apollo-style batching

If your server supports Apollo-style query batching, you can also try
[HttpLinkDataLoader](https://github.com/prisma-labs/http-link-dataloader).
Pass it to the `gatsby-source-graphql` plugin via the `createLink` option.

This strategy is usually slower than query merging but provides better error reporting.

[dotenv]: https://github.com/motdotla/dotenv
[envvars]: https://gatsby.dev/env-vars
2 changes: 2 additions & 0 deletions packages/gatsby-source-graphql/package.json
@@ -10,6 +10,8 @@
"@babel/runtime": "^7.8.7",
"apollo-link": "1.2.13",
"apollo-link-http": "^1.5.16",
"dataloader": "^2.0.0",
"graphql": "^14.6.0",
"graphql-tools-fork": "^8.9.6",
"invariant": "^2.2.4",
"node-fetch": "^1.7.3",
@@ -0,0 +1,102 @@
import { parse } from "graphql"
import { execute } from "apollo-link"
import { createDataloaderLink } from "../dataloader-link"

const sampleQuery = parse(`{ foo }`)
const expectedSampleQueryResult = { data: { foo: `bar` } }

// eslint-disable-next-line @typescript-eslint/camelcase
const fetchResult = { data: { gatsby0_foo: `bar` } }

const makeFetch = (expectedResult: any = fetchResult): jest.Mock<any> =>
  jest.fn(() =>
    Promise.resolve({
      json: () => Promise.resolve(expectedResult),
    })
  )

describe(`createDataloaderLink`, () => {
  it(`works with minimal set of options`, done => {
    const link = createDataloaderLink({
      uri: `some-endpoint`,
      fetch: makeFetch(),
    })
    const observable = execute(link, { query: sampleQuery })
    observable.subscribe({
      next: (result: any) => {
        expect(result).toEqual(expectedSampleQueryResult)
        done()
      },
      error: done,
    })
  })

  it(`reports fetch errors`, done => {
    const link = createDataloaderLink({
      uri: `some-endpoint`,
      fetch: jest.fn(() => Promise.reject(`FetchError`)),
    })
    const observable = execute(link, { query: sampleQuery })
    observable.subscribe({
      error: error => {
        expect(error).toEqual(`FetchError`)
        done()
      },
      complete: () => {
        done.fail(`Expected error not thrown`)
      },
    })
  })

  it(`reports graphql errors`, done => {
    const result = {
      errors: [{ message: `Error1` }, { message: `Error2`, path: [`foo`] }],
    }
    const link = createDataloaderLink({
      uri: `some-endpoint`,
      fetch: makeFetch(result),
    })
    const observable = execute(link, { query: sampleQuery })
    observable.subscribe({
      error: error => {
        expect(error.name).toEqual(`GraphQLError`)
        expect(error.message).toEqual(
          `Failed to load query batch:\nError1\nError2 (path: ["foo"])`
        )
        expect(error.originalResult).toEqual(result)
        done()
      },
      complete: () => {
        done.fail(`Expected error not thrown`)
      },
    })
  })

  it(`supports custom fetch options`, done => {
    const fetch = makeFetch()
    const fetchOptions = {
      credentials: `include`,
      mode: `cors`,
    }
    const link = createDataloaderLink({
      uri: `some-endpoint`,
      fetch,
      fetchOptions,
    })

    const observable = execute(link, { query: sampleQuery })
    const next = jest.fn()

    observable.subscribe({
      next,
      error: done,
      complete: () => {
        expect(fetch.mock.calls.length).toEqual(1)
        const [uri, options] = fetch.mock.calls[0]
        expect(uri).toEqual(`some-endpoint`)
        expect(options).toEqual(expect.objectContaining(fetchOptions))
        done()
      },
    })
  })
})