1.0 new GraphQL data layer #420
Comments
Wow, this is super exciting! But just to clarify, these GraphQL queries would only be running at bootstrap time, right? Or would there also be a GraphQL client in the app bundle? |
@SachaG yup! The queries are run during bootstrap (or when the query or source data changes). The results of the queries are written out as JSON files. I'll be writing this up as part of describing the idea for code/data splitting, but basically you'll get a big directory of JSON files. There's then a
With this system, as far as Webpack and your code know, it's just a normal webpack module, so there's no need for special handling or a graphql client. |
Makes sense, thanks for the details! |
This is really great stuff! I was recently looking into gatsby for a couple projects, but decided to build my own solution because the data model wasn't flexible enough - essentially I needed exactly what you've described above - a filesystem as a database. The project is called catalyst. It also uses React to render views. Since gatsby is a much more mature project, and the 1.0 roadmap seems to be really fantastic, I wanted to share the catalyst data model with you as feedback, and at minimum, as a source plugin for gatsby. The GraphQL stuff is also exactly where I was thinking of going as well; I'm currently in the process of separating the filesystem stuff into a separate project called fsdb, and implementing the GraphQL compatibility layer.

High level summary of fsdb

Details

All files are nodes, with data and content
The line between data and content is really thin, especially when using fsdb to build websites for artists and other, non-blog projects. So with that in mind, the data model is not tied to markdown documents, and treats key-value stores as first-class citizens. When loaded and transformed into memory, fsdb combines the multiple declarations into one atomic data object. Content cannot be merged, only data properties.

Folders can be nodes too, with data and content
By either using an

There's inheritance built in, for common properties amongst siblings
By declaring a (configurable)

There's references built in, but with GraphQL, this shouldn't be necessary
GraphQL is definitely the superior option here, but I had set it up so that the parser would look for a special string sequence and reference the required file to reduce duplication

Example

data/authors/muju.yaml
title: Muju

data/books/common.md
---
type: book
---

data/books/shasekishu/index.yaml
title: Shasekishū
author: "*/authors/muju"
published: 1283

data/books/shasekishu/common.yaml
type: koan

data/books/shasekishu/a-cup-of-tea.md
---
title: A Cup of Tea
---
Twenty monks and one nun, who was named Eshun, were practicing meditation with a certain Zen master.
Eshun was very pretty even though her head was shaved and her dress plain. Several monks secretly fell in love with her. One of them wrote her a love letter, insisting upon a private meeting.
Eshun did not reply. The following day the master gave a lecture to the group, and when it was over, Eshun arose. Addressing the one who had written to her, she said: "If you really love me so much, come and embrace me now."

In memory
{
"authors": {
slug: "authors",
path: [],
parent: undefined,
children: {
"muju": {
slug: "muju",
sources: [
"data/authors/muju.yaml"
],
path: [ "authors" ],
data: {
title: "Muju"
},
parent: { /* authors */ },
children: {}
}
}
},
"books": {
slug: "books",
path: [],
parent: undefined,
children: {
"shasekishu": {
slug: "shasekishu",
sources: [
"data/books/common.md",
"data/books/shasekishu/index.yaml"
],
path: [ "books" ],
data: {
title: "Shasekishū",
author: { /* authors.children.muju */ },
type: "book",
published: 1283
},
parent: { /* books */ },
children: {
"a-cup-of-tea": {
slug: "a-cup-of-tea",
sources: [
"data/books/shasekishu/common.yaml",
"data/books/shasekishu/a-cup-of-tea.md"
],
path: [ "books", "shasekishu" ],
data: {
title: "A Cup of Tea",
type: "koan"
},
contentRaw: "Twenty monks and one nun, who was named Eshun, were practicing meditation with a certain Zen master.\n\nEshun was very pretty even though her head was shaved and her dress plain. Several monks secretly fell in love with her. One of them wrote her a love letter, insisting upon a private meeting.\n\nEshun did not reply. The following day the master gave a lecture to the group, and when it was over, Eshun arose. Addressing the one who had written to her, she said: \"If you really love me so much, come and embrace me now.\"",
contentFormat: "markdown",
parent: { /* books.children.shasekishu */ },
children: {}
}
}
}
}
}
}

The data is outputted as a tree and as a flat hash like so:
{
"authors": {},
"authors/muju": {},
"books": {},
"books/shasekishu": {},
"books/shasekishu/a-cup-of-tea": {}
}

Since we're always referencing objects, it's really easy to move around from one node to another.

Queries
// with parent prototypical inheritance and common data files enabled!
books.children["shasekishu"].children["a-cup-of-tea"].published === 1283
books.children["shasekishu"].children["a-cup-of-tea"].author.data.title === "Muju"

Thoughts
If you're interested, I'd love to get your, and the Gatsby.js community's thoughts on using this. I'd also be happy to finalize the GraphQL layer for use in the 1.0 release! |
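For anyone following along, here is a minimal sketch of the merge-and-inherit behaviour described in the comment above. It is not the fsdb API: `makeNode` and `flatten` are hypothetical names, the declarations are passed in by hand instead of being read from disk, and the `"*/authors/muju"` reference is replaced with a direct object link. It only illustrates how several declarations can merge into one data object and how children can inherit parent properties through the prototype chain.

```javascript
// Hypothetical sketch of fsdb-style merging and inheritance (not the real fsdb API).

// Build a node whose data object merges several declarations, in order,
// and inherits from the parent's data through the prototype chain.
function makeNode(slug, parent, declarations) {
  const data = Object.create(parent ? parent.data : null);
  for (const declaration of declarations) Object.assign(data, declaration);
  const node = { slug, parent, data, children: {} };
  if (parent) parent.children[slug] = node;
  return node;
}

// Flatten a tree into a hash keyed by slash-joined slugs, e.g. "books/shasekishu".
function flatten(node, prefix = "", out = {}) {
  const key = prefix ? `${prefix}/${node.slug}` : node.slug;
  out[key] = node;
  for (const child of Object.values(node.children)) flatten(child, key, out);
  return out;
}

const authors = makeNode("authors", undefined, []);
const muju = makeNode("muju", authors, [{ title: "Muju" }]);

const books = makeNode("books", undefined, []);
// common.md (type: book) is merged first, then index.yaml.
const shasekishu = makeNode("shasekishu", books, [
  { type: "book" },
  { title: "Shasekishū", author: muju, published: 1283 },
]);
// common.yaml (type: koan) is merged first, then the file's own frontmatter.
const tea = makeNode("a-cup-of-tea", shasekishu, [
  { type: "koan" },
  { title: "A Cup of Tea" },
]);

const flat = flatten(books, "", flatten(authors));

console.log(tea.data.published); // inherited from the parent folder node
console.log(Object.keys(flat));
```

One consequence of the prototype approach worth noting: inherited properties are not own properties, so `Object.keys(tea.data)` and `JSON.stringify(tea.data)` only see `type` and `title`.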
@alizain oh very cool! Good to see we're thinking along the same lines. My plan right now is that there'll be a very thin contract between source plugins and Gatsby. Basically the source plugin will give Gatsby GraphQL types to add to the schema and then Gatsby in turn will ask the source plugin to resolve queries as needed. I'd been using Relay in a product so am building my source plugins with https://github.com/graphql/graphql-relay-js which has some handy helpers plus good ideas. But I'd really love to see other ideas explored and your idea of auto-linking folders and files w/ some conventions is really interesting and would lend itself nicely to GraphQL/Gatsby. Basically this stuff is super duper brand new so yes, please explore and build a source plugin or three (once the plugin system is released — hopefully the next alpha) and we'll all learn together what works. And also you could have multiple src plugins over the same data which would let you query the data in multiple ways depending on your use case. |
This sounds great, let me know if I can help 😄 |
Awesome! You'll be super helpful as we work out the APIs needed for the data layer. I'll post here once the plugin system plus a handful of source plugins are released so you (and others) can test and try building your own. Super excited to see all the directions this can go. |
Some updates. A basic version of GraphQL data layer has been implemented and I'm feeling really happy with it. GraphQL makes it very easy to specify in each component the exact component's data requirements. This ensures that we're shipping only the bits to the browser that are necessary. On my blog for example, the vast majority of page data bundles are < 5kb. A few things that are in progress. Data transformation expressed through GraphQL is something I'm really excited about. I spoke on this recently at the GraphQLSummit (video here: https://www.youtube.com/watch?v=y588qNiCZZo, slides here: https://graphql-gatsby-slides.netlify.com). I built a simple image gallery using Gatsby 1.0 and some experimental image manipulation graphql types (not shipped yet). You can see the code here: https://github.com/gatsbyjs/gatsby/tree/14b0320379dee196a182ce8f6d3db5087fb419b2/examples/image-gallery Really happy with how expressive it is. The index page of the gallery including the react component and graphql query is all of 54 lines of code. This is what the query looks like:
What's really fun is that queries hot reload so you can modify the image sizes for example and see changes almost immediately.

I'm also R&Ding the best way to dynamically build a GraphQL schema from files. It'd be a very poor user experience if every Gatsby user had to manually create their own GraphQL schema. I've always been very impressed when I use Elasticsearch as you can just send data to them and they generally do a very good job of inferring your data types for you so the db immediately feels useful. I'd like that same experience with Gatsby & GraphQL. You point Gatsby at a bunch of files and you should be shocked by how much Gatsby knows already about your content. But at the same time, similar to Elasticsearch, users should retain full ability to control the schema as they wish.

What I've been stuck on for the past while is deciding on the data structure to represent files and the various ways they can be parsed and extended, e.g. a markdown file has various file-level attributes, then the file is parsed into markdown which has various parts including the frontmatter, which is parsed into a json object, and then one of those fields could point to a file which happens to be an image. The data structure would need to represent this while allowing Gatsby plugins to modify and extend the data structure in arbitrary ways while also supporting being able to eventually convert the structure into a GraphQL schema.

After going back and forth on a number of ways of representing this, it occurred to me that what I was doing was very similar to a compiler. Take a compiler/transpiler like Babel. Babel takes a javascript file, parses it into an Abstract Syntax Tree (AST), allows plugins to modify the tree in various ways, and finally generates the final resulting JS file. We could do the same thing for Gatsby and GraphQL. We "parse" the files for a site into an AST, allow plugins to extend or modify the tree, and then finally use this to generate the GraphQL schema.

A bit odd perhaps but I think it'll work :-) There's an excellent generic library that I think will work for this: https://github.com/wooorm/unist It's the basis for the excellent Markdown parser http://remark.js.org/ I'll be building a prototype on this idea next week so more then. |
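To make the compiler analogy above concrete, here is a rough sketch of the three phases (parse, transform, generate). Everything in it is an assumption for illustration: `parseFiles`, `markdownPlugin` and `generateSchema` are hypothetical names, not Gatsby or unist APIs, the field types are hard-coded rather than inferred, and the output is plain GraphQL schema language text rather than a real schema object.

```javascript
// Hypothetical sketch of "parse files -> plugins modify tree -> generate schema".

// 1. Parse: turn file paths into a unist-style tree
//    (every node has a `type`; parent nodes have `children`).
function parseFiles(paths) {
  return {
    type: "root",
    children: paths.map((filePath) => ({
      type: "file",
      path: filePath,
      extension: filePath.slice(filePath.lastIndexOf(".") + 1),
      children: [],
    })),
  };
}

// 2. Transform: a plugin that recognises markdown files and records
//    which fields could be queried on them (hard-coded for this sketch).
function markdownPlugin(tree) {
  for (const file of tree.children) {
    if (file.extension === "md") {
      file.children.push({
        type: "markdown",
        fields: { title: "String", date: "String" },
      });
    }
  }
  return tree;
}

// 3. Generate: compile the discovered node types into schema language text.
function generateSchema(tree) {
  const types = new Map();
  for (const file of tree.children) {
    for (const child of file.children) {
      if (child.fields) types.set(child.type, child.fields);
    }
  }
  return [...types]
    .map(([name, fields]) => {
      const typeName = name[0].toUpperCase() + name.slice(1);
      const body = Object.entries(fields)
        .map(([field, fieldType]) => `  ${field}: ${fieldType}`)
        .join("\n");
      return `type ${typeName} {\n${body}\n}`;
    })
    .join("\n\n");
}

const plugins = [markdownPlugin];
const tree = plugins.reduce(
  (current, plugin) => plugin(current),
  parseFiles(["posts/hello.md", "images/cat.jpg"])
);
const schemaText = generateSchema(tree);
console.log(schemaText);
```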
Exciting stuff! I'm coincidentally also working a lot with GraphQL these days (porting http://telescopeapp.org to Apollo), it's nice to see two of my favorite open-source projects converge :) |
Cool! So some background on the things I’m doing. I’m doing it in lots of little projects so you can pick and choose what you do or don’t want.

unist is the “node” format, describing that objects have a
mdast, hast, nlcst are “namespaces” of unist, respectively for markdown, HTML, and natural language.
vfile is a very small virtual file format, focussing on storing messages (linting is a big part of the ecosystem). vfiles can be used for binary data too.
unified is a middleware stack for processing (parse/transform/compile) syntax trees through plugins. There’s parse plugins (read markdown to syntax tree), transform plugins (add a table of contents), and stringify plugins (write markdown to man pages).
remark, rehype, retext are unified processors which come with a parser/compiler plugin packaged.

The ecosystem consists of utilities and plugins. The former work with unist/mdast/hast/nlcst nodes, are prefixed with
The plugins, prefixed with their processor name, often do bigger things:

The essence, or the future, kinda looks like “Gulp for syntax tree transformations”:

var unified = require('unified');
var markdown = require('remark-parse');
var toc = require('remark-toc');
var remark2rehype = require('remark-rehype');
var document = require('rehype-document');
var minify = require('rehype-preset-minify');
var html = require('rehype-stringify');
process.stdin
.pipe(unified())
.use(markdown)
.use(toc)
.use(remark2rehype)
.use(document)
.use(minify)
.use(html)
.pipe(process.stdout);

In the example above we take stdin, read it as markdown, add a table of contents, transform it to an HTML syntax tree, wrap it in a valid document (doctype, etc.), minify, compile as HTML, and write to stdout.
I’m wondering, what languages do you have in mind to connect to Gatsby? How would binary files work? 👋 |
Thanks for the tour! I didn't know about all these things so very helpful to get the big picture view. So what I'm proposing using Unist for with Gatsby is a bit different. Instead of parsing a "file" from one format to another, e.g. Markdown to HTML, Gatsby will parse "file directories" and compile them to a GraphQL schema. The focus will be on the file metadata: that a file is a markdown file is the important point, not what's in the file, because it means we should add support for querying markdown to our GraphQL schema. The intention of the parsing phase is to explore the latent possibilities within the files. Parsing plugins can add support for Markdown, Asciidoctor, images, PDFs, CSVs, YAML, etc. These "possibilities", now expressed within Unist, will then be compiled to a GraphQL schema against which someone can write queries to actually perform various file transformations, etc. For example, a markdown file could be discovered to have frontmatter, which is transformed into a JSON structure, one of whose fields is discovered to point to another file, an image. Once this is compiled to a GraphQL schema, you could write a query against the schema to get a url to the image which has been transformed to 1000px wide.
All this would be discovered automatically during the parse step without any intervention needed from the user. Why I think Unist is a perfect fit is a) the tree data structure of connected nodes fits nicely and b) the Unist utilities will really simplify compiling the AST into a GraphQL schema, e.g. creating a GraphQL type that lets you query against only markdown will be trivial with https://github.com/eush77/unist-util-select
Make sense? |
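As a rough illustration of the "query against only markdown" idea above, here is a hand-rolled tree walk over a unist-style tree. It is deliberately not the unist-util-select API; `selectByType` and the shape of `siteTree` (including the `frontmatter` node and its `cover` field) are assumptions made up for this sketch.

```javascript
// Hand-rolled stand-in for a selector utility; NOT the unist-util-select API.
// Collects every node of the given type, depth-first.
function selectByType(node, type, found = []) {
  if (node.type === type) found.push(node);
  for (const child of node.children || []) selectByType(child, type, found);
  return found;
}

// Hypothetical tree produced by a parse step: a markdown file whose
// frontmatter points at an image file elsewhere in the site.
const siteTree = {
  type: "root",
  children: [
    {
      type: "file",
      path: "posts/hello.md",
      children: [
        {
          type: "markdown",
          children: [
            {
              type: "frontmatter",
              data: { title: "Hello", cover: "images/cat.jpg" },
            },
          ],
        },
      ],
    },
    { type: "file", path: "images/cat.jpg", children: [{ type: "image" }] },
  ],
};

const markdownNodes = selectByType(siteTree, "markdown");
const fileNodes = selectByType(siteTree, "file");

// Follow the frontmatter field to the file node it points at.
const frontmatter = selectByType(markdownNodes[0], "frontmatter")[0];
const coverFile = fileNodes.find((file) => file.path === frontmatter.data.cover);
console.log(coverFile.path);
```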
Very cool! I like having Unist used this new way. Let me know if I can help answer questions or provide more background! |
@wooorm cool! My initial prototyping is looking very promising :-) will definitely have some questions about the right way to do things. Thanks! |
Alphas should be considered experimental. So not stable. The plugin system hasn't landed yet which will change how much of the core Gatsby code is arranged hence how your site is structured. Also everything is undocumented so I wouldn't use them yet unless you feel like reading a lot of code. |
Thank you for your efforts Kyle. Gatsby looks very interesting. I have a question, the answer for which I didn't find or perhaps missed. Is it possible, after build to update pages contents? Say I create admin page and manage all the site's content, forcing other pages that were updated to rebuild? A GatsbyJs CMS of kind... Thanks in advance. |
@intermundos great question! But it deserves its own issue — could you click the green new issues button at top and post your question there? |
Hi @KyleAMathews love the work here marrying Gatsby with GraphQL! Do you know when/if there will be support to make GraphQL requests with something like Apollo to an external GraphQL API to fetch data? |
Do you want the externally fetched data to be dynamically fetched or
fetched at build time?
|
Ideally dynamically fetched. I'm trying to build a site with a couple forms, so being able to run mutations and re-render components dynamically would be awesome! |
Well go ahead and add it :-) Gatsby is just react. You can do anything you
want on the client.
|
Got it, that makes a lot of sense! Thanks 👍 |
Hi kayle, calling api handler in D:/alpha13 for api createPages |
Is it possible to run a standalone graphql server? So as to have a static site with a search box. I had a look at the develop script but it gets a bit hazy in the bootstrap 😛 |
Shipped in v1! |
@KyleAMathews I have a question about the date format in graphQL, for my blog i need to put the months in french. How can I change this parameter ? |
Pull data into components instead of pushing
Data in Gatsby currently is pushed into templates to be rendered into HTML (like pretty much every static site generator). This is a simple pattern and works great for many use cases. But when you start working on more complex sites, you really start to miss the flexibility of building a database-driven site. With a database, all your data is available to query against in any fashion that you'd like. Whatever bits of data you need to assemble a page, you can pull in. You want to create author pages showing their bio & last 5 posts? It's just a query away. I want this same flexibility for Gatsby. I want to be able to query my markdown (or picture or data, etc) files and treat them as a database of sorts.
This is especially important for Gatsby as unlike traditional static-site-generators, all data used to build a page is loaded into the client. Currently Gatsby loads all data for the site into the client. This is both wasteful (your site doesn't use all that data) as well as costly. Time-to-interactivity is an important web performance metric. The larger your javascript bundle, the longer it takes to download and evaluate the Javascript. This is especially noticeable on low-end phones on poor networks.
With this change in Gatsby 1.0 both code and data will be split on a per-route basis. When a user visits a page, they will load just the javascript & data it needs and then lazy-load more once the first page is initialized.
Now a site can easily have "heavy" pages (in terms of data and/or code) without affecting other parts of the site. E.g. a search page or a page with data visualizations.
New GraphQL data layer
Gatsby uses Webpack right now for everything. Javascript, CSS, images, Markdown, JSON, YAML, etc. are all handled using Webpack's rather brilliant system of treating everything as JS modules.
Using Webpack has worked out really really well for Gatsby. It gives us a ton out of the box. A lovely hot-reloading development experience. Easy interoperability with all the latest and greatest web tools. And fast, optimized production builds. It's truly a swiss-army knife of tools.
But Webpack has some problems with data.
First it only understands files. If you want to integrate data from any other source e.g. external APIs you have to first convert that data into files.
Webpack can get weird if you try to reference files from outside of the webroot. I've been bitten by this several times as have others.
Another big problem is you can't use just some data from a file. What if you wanted to use data in your site from a 1 gigabyte CSV file? There's no way to get around loading the entire file unless again you first preprocess the file.
The last problem is data splitting. Ideally each route can load only the data it needs. But how? Often a route will want a bit of data from a number of files or other data sources. How can a route both easily specify what data it needs as well as tell Webpack to package that minimal data set together to be shipped to the browser to power the react component(s) for that route?
I've thought through a number of different possibilities (this issue explores one of those) but could never quite figure out how to make Webpack do what I wanted it to.
So eventually I concluded the simplest thing would be to split the data layer off and remove it from Webpack's control. Let Webpack do what it does best and build a data system tailor-made for Gatsby's needs.
I've been prototyping this new data layer the past few weeks with GraphQL and am really really pleased with how well it's working.
How it'll work
When you set up a site, you'll add one to many source plugins. These source plugins can be file-based, e.g. a markdown source plugin which you point at a directory of markdown files, or network-based, e.g. for consuming an internal API or a 3rd-party API like GitHub.
Each source plugin defines types which get composed together to form a schema for your site.
This combined schema is consumed by GraphQL and made available to query against.
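A minimal sketch of how types from several source plugins might be composed, under stated assumptions: the plugin shape (`name`, `types`, `resolve`) and `composeSchema` are hypothetical and are not the Gatsby plugin API (which had not been released when this was written), and the sample data is made up. Real plugins would hand over GraphQL types rather than plain objects.

```javascript
// Hypothetical plugin shape: { name, types, resolve }. Not the Gatsby plugin API.
const markdownSource = {
  name: "markdown",
  types: { MarkdownPost: { title: "String", path: "String" } },
  // A file-based plugin would read a directory here; this returns sample data.
  resolve: () => [{ title: "Hello world", path: "/hello/" }],
};

const apiSource = {
  name: "example-api",
  types: { Repo: { name: "String", stars: "Int" } },
  // A network-based plugin would call an API here; this returns sample data.
  resolve: () => [{ name: "example-repo", stars: 1 }],
};

// Merge every plugin's types into one site-wide schema description and
// remember which plugin resolves each type.
function composeSchema(plugins) {
  const schema = { types: {}, resolvers: {} };
  for (const plugin of plugins) {
    for (const [typeName, fields] of Object.entries(plugin.types)) {
      if (schema.types[typeName]) {
        throw new Error(`Duplicate type: ${typeName}`);
      }
      schema.types[typeName] = fields;
      schema.resolvers[typeName] = plugin.resolve;
    }
  }
  return schema;
}

const siteSchema = composeSchema([markdownSource, apiSource]);
console.log(Object.keys(siteSchema.types));
```

Failing loudly on a duplicate type name is just one possible choice here; a real implementation might namespace types per plugin instead.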
That's fairly straightforward. What was tricky though was figuring out how to integrate the new data layer with React components. The pattern which I eventually settled on for my initial prototype is pleasingly simple.
All routes are powered by React.js components. A route component can either power one path, e.g. about.js, or can power many paths, e.g. blog-post.js for all blog posts. Route components need data. To get data, they can export a GraphQL query. This query is run during bootstrap and the result is written out as a JSON file which is inserted into the route component as props. During development the "query runner" watches both route components and source files for changes and re-runs queries, overwriting the JSON files, which Webpack then hot-reloads.

So a very minimal example. Say you have a blog and you want to create an index page listing your blog posts. In your /pages directory you'd create an index.js which would look something like:

You can now think of the various content/data files you have as a "database" to query against however you want. E.g. to create a page listing tags you could export this query.
I created a page like this on my blog (which is running Gatsby-1.0-alpha1) https://www.bricolage.io/tags/
Stuff like pagination, tag pages, and other "meta" pages are now pretty straightforward.
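The bootstrap step described above (run each route's query, write the result out as JSON) can be sketched roughly as follows. This is an assumption-heavy illustration, not Gatsby's query runner: plain functions stand in for GraphQL queries, `runQueries` and the `routes` object are hypothetical, the post data is made up, and the files go to a temporary directory.

```javascript
const fs = require("fs");
const os = require("os");
const nodePath = require("path");

// Stand-in "database" of parsed content files (made-up sample data).
const posts = [
  { path: "/hello/", title: "Hello", tags: ["intro"], body: "..." },
  { path: "/graphql/", title: "GraphQL", tags: ["graphql", "intro"], body: "..." },
];

// Hypothetical route components. Each exports a query; here a plain
// function stands in for the GraphQL query a real route would export.
const routes = {
  // Index page: only needs the path and title of each post.
  index: {
    query: (db) => ({ posts: db.map(({ path, title }) => ({ path, title })) }),
  },
  // Tags page: only needs the unique tag names.
  tags: {
    query: (db) => ({ tags: [...new Set(db.flatMap((post) => post.tags))].sort() }),
  },
};

// Run every route's query and write the result as one JSON file per route.
function runQueries(routeMap, db, outDir) {
  const written = {};
  for (const [name, route] of Object.entries(routeMap)) {
    const file = nodePath.join(outDir, `${name}.json`);
    fs.writeFileSync(file, JSON.stringify(route.query(db)));
    written[name] = file;
  }
  return written;
}

const outDir = fs.mkdtempSync(nodePath.join(os.tmpdir(), "query-runner-"));
const files = runQueries(routes, posts, outDir);

// Each route would later receive only its own JSON as props.
const indexProps = JSON.parse(fs.readFileSync(files.index, "utf8"));
const tagsProps = JSON.parse(fs.readFileSync(files.tags, "utf8"));
console.log(indexProps, tagsProps);
```

Note that neither JSON file contains the post bodies, which is the point of the per-route data splitting: each page ships only what its query asked for.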
Going with GraphQL also gives us access to fantastic tooling. Facebook uses GraphQL heavily and one of the most useful internal GraphQL tools they've released is GraphiQL, an IDE for GraphQL.
Here's a gif of me exploring my blog's GraphQL schema.
I'm super duper excited about all the possibilities the new GraphQL layer opens up. Here's a sampling of some ideas I've had.
PropType or Flow information from your React components queryable.
Create a living styleguide.
writing code documentation while the documentation hot-reloads your changes.
and pass a width value as an argument and have the image source plugin resize the image on the fly.
Pass a format string to a date field and get back a formatted date (no more loading moment.js into the client).
field that's of a minimum length.
author field in the frontmatter of a markdown file can be connected to data from an authors.yaml file which lets you write queries like:
rendering.
and rebuild an old site on Gatsby while still maintaining content in Wordpress.
search, glob, regex, groupBy, sum, etc. data.
With the coming source plugin architecture, getting data into your site will soon be straightforward. Identify the sources of data, compose source plugins, play in GraphiQL to create queries, drop queries in route components, write components.