
[EPIC] Graphql schema refactor #4261

Closed
9 tasks
pieh opened this issue Feb 27, 2018 · 64 comments

@pieh
Contributor

pieh commented Feb 27, 2018

Who will own this?

What Area of Responsibility does this fall into? Who will own the work, and who needs to be aware of the work?

Area of Responsibility:

Select the Area of Responsibility most impacted by this Epic

  • Admin

  • Cloud

  • Customer Success

  • Dashboard

  • Developer Relations

  • OSS

  • Learning

  • Marketing

  • Sales

    AoR owner @KyleAMathews
    Domain owner TBD
    Project manager TBD
    Tech lead TBD
    Contributors TBD

Summary

Make the GraphQL schema generation code more maintainable and make it easier to add new features, like allowing user-specified types on fields instead of automatic inference.

How will this impact Gatsby?

Domains

List the impacted domains here

Components

List the impacted Components here

Goals

What are the top 3 goals you want to accomplish with this epic? All goals should be specific, measurable, actionable, realistic, and timebound.

How will we know this epic is a success?

What changes must we see, or what must be created, for us to know the project was a success? How will we know when the project is done? How will we measure success?

User Can Statement

  • User can...

Metrics to Measure Success

  • We will see an increase/decrease in...

Additional Description

In a few sentences, describe the current status of the epic, what we know, and what's already been done.

What are the risks to the epic?

In a few sentences, describe what high-level questions we still need to answer about the project. How could this go wrong? What are the trade-offs? Do we need to close a door to go through this one?

What questions do we still need to answer, or what resources do we need?

Is there research to be done? Are there things we don’t know? Are there documents we need access to? Is there contact info we need? Add those questions as bullet points here.

How will we complete the epic?

What are the steps involved in taking this from idea through to reality?

How else could we accomplish the same goal?

Are there other ways to accomplish the goals you listed above? How else could we do the same thing?

--- This is a stub epic - the old description still needs to be converted to the new format.

The main issue I'm trying to solve is that type inference will not create fields/types for source data that:

  • has conflicting types (sometimes source plugins have no way of knowing what the correct type is and can't correct the types themselves),
  • has no data for some fields (an optional field/node will not be inferred if it isn't used anywhere in the source data, and queries will fail because the schema doesn't contain that field).
    This is not an issue with the inference implementation - this approach simply can't handle such cases.

My approach to handling this is to allow defining field types by (see the sketch after this list):

  • using the GraphQL schema definition language (for data that will always have the same shape - e.g. File nodes will always have the same data structure)
  • exposing a function in the Gatsby node APIs (for data whose types are dynamic but whose structure we can obtain - for example the Contentful content model)
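
For illustration, here is a minimal sketch of what SDL-based type definitions could look like in gatsby-node.js. It uses the createTypes action from the Schema Customization API that eventually shipped (see the blog post linked later in this thread); the MarkdownRemark/Frontmatter fields are made-up examples.

// gatsby-node.js - sketch, assuming the createTypes action; type and field
// names are examples only.
exports.sourceNodes = ({ actions }) => {
  const { createTypes } = actions
  createTypes(`
    type MarkdownRemark implements Node {
      frontmatter: Frontmatter
    }
    type Frontmatter {
      title: String!
      # optional field that inference would drop when no node provides it
      subtitle: String
    }
  `)
}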

Problem:

Current implementation of schema creation looks something like this:
[diagram: current schema creation flow]

Input/output type creation is not abstracted, so the implementation has to be duplicated for each source of information.

In my proof of concept (repository) I added another source (the GraphQL schema definition language) and implemented just a subset of the functionality:
[diagram: proof-of-concept schema creation flow]

As a testing ground I used this barebones repository. Things to look for:

Implementing it this way is fine for a proof of concept, but it's unmaintainable in the long term. So I want to introduce a common middleman interface:
[diagram: proposed schema creation flow with a common middleman interface]

Goals:

  • This should be backward compatible
  • Make it easy to add new sources of field types and/or schema features
  • Remove code duplication
  • The initial change should only introduce the middleman interface and provide feature parity with the current implementation, to ease review

Questions:

  1. What are potential features/use cases to take into consideration when designing the details of this (not features of the schema - rather how it could be used)? I see one potential case where this might be important (to avoid needing another big refactor later):
    • Live previews - right now Gatsby can't modify the schema while it's running in develop mode, but it can refresh data (built-in refresh for the filesystem source + the __refresh hook to refresh all source data) - it might be worth looking into being able to refresh the schema too.
  2. How would schema stitching fit into it (merging external remote GraphQL endpoints with the Gatsby GraphQL layer)? Basic schema stitching would not interact with the Gatsby GraphQL part (for example, we have our Gatsby queries for markdown etc. and then we have fields from the GitHub GraphQL API repository - if there's no connection between them then this would be out of scope for this RFC), but if we would like to add a connection - for example allow linking a frontmatter field to a GitHub repository - then this would need to be thought out ahead of time. I was looking at graphql-tools schema stitching and it does have some nice tooling for merging schemas and an option to add resolvers between schemas - is this something that was planned to be used for that?
@m4rrc0
Contributor

m4rrc0 commented Feb 27, 2018

Thanks a lot for the research @pieh.
Maybe I am completely off track here, but couldn't we use schema stitching to add the missing fields?
But maybe it couldn't solve the first issue you raise (conflicting types on a field), and it probably is a weaker solution overall in the long run...?
I love your idea about live preview refresh! That would be a super solid feature to add to Gatsby IMHO.
To be honest my main concern is the time such a refactor will take...

@pieh
Contributor Author

pieh commented Feb 27, 2018

@MarcCoet
Not sure on what level you would want to stitch the schema - this is not magic that would make it work automatically :) . There are multiple "side effects" for a single field in the data - it produces an output type, it produces an input type for filtering, it produces an input type for sorting, it produces an input type for grouping. Stitching would still suffer from the same problems - it would need to be implemented in multiple places.

There is not much distinction between fields with no data and fields with conflicting types in terms of creating the schema currently - gatsby discards fields which have conflicting types (so they become fields with no data at the stage of creating the schema) - the distinction matters more for website/app developers: they have data, but the field is not in the schema.

You can use my proof of concept branch (at least for testing things out) - it has all the basic features of getting fields - it can resolve local files, linked nodes (both of single and multiple types - unions) and of course inline fields. But to get the full feature set I would have to implement this three more times in different places (filtering, sorting, grouping).

Or you can already use the setFieldsOnGraphQLNodeType function ( https://www.gatsbyjs.org/docs/node-apis/#setFieldsOnGraphQLNodeType ) to add/overwrite "inline" field types (fields that aren't linked to other nodes). It's not exactly easy to use, though, and it can't reference other types that are available in the schema.
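
For reference, a minimal sketch of that API (the type check and the added field are assumptions for the example):

// gatsby-node.js - sketch using setFieldsOnGraphQLNodeType; the type name
// and the added field are examples only.
const { GraphQLString } = require(`graphql`)

exports.setFieldsOnGraphQLNodeType = ({ type }) => {
  if (type.name !== `MarkdownRemark`) {
    return {}
  }
  return {
    // an "inline" field computed from data already on the node
    shoutedTitle: {
      type: GraphQLString,
      resolve: node => ((node.frontmatter || {}).title || ``).toUpperCase(),
    },
  }
}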

I totally get your time concern and frustration about this issue - I have this problem too with some of my sites - it's hard to explain to your content editors why things suddenly stopped working when they cleared some optional field (this is why I started working on this!), but this has to be done right sooner rather than later, as features that will need to be refactored will pile up.

@KyleAMathews
Contributor

How would schema stitching fit into it

It wouldn't — the schema stitching process basically takes two entirely separate schemas and lets you query both of them at the same time. Unless people name their types the same as the default Gatsby ones, there'd be no interaction between the two schemas.

@KyleAMathews
Contributor

Love the direction you're going here! This feels like the right approach and direction for a refactor and will unlock a lot of really nice capabilities!

@pieh
Contributor Author

pieh commented Feb 27, 2018

About schema stitching - I was researching this a bit earlier, and graphql-tools provides a way to add resolvers between schemas - https://www.apollographql.com/docs/graphql-tools/schema-stitching.html#adding-resolvers - as part of their schema stitching toolkit. So hypothetically we could create a custom resolver (or rather a user at the project level or a plugin would) that could transform:
repository: "https://github.com/gatsbyjs/gatsby" (<- that's frontmatter) into the response from the repository query of the GitHub GraphQL API (similar to how we link/map to nodes currently). This doesn't have to land in the initial version of schema stitching, but it is something worth keeping in mind.

@KyleAMathews
Contributor

Huh! That'd be amazing! Yeah, there's a ton of possibilities here — you could link to tweets, flickr images, facebook profiles, etc. As long as you have the right source plugin installed, anything accessible via an API would be linked up. That'd be crazy powerful.

@jlengstorf
Contributor

@pieh @KyleAMathews This is something I've got a bit of experience with. When I was at IBM, we needed to keep data sources discrete, but allow them to be combined in queries to avoid complex data processing on the front-end. I ended up creating and open sourcing GrAMPS to address this. (I wrote up a "hello world" example in this post.)

One of the goals of GrAMPS is to allow what I've clumsily dubbed "asynchronous stitching", where a data source can define how it extends another data source if that data source exists in the schema. This would allow plugins to build on each other when possible, but wouldn't require them to be peerDependencies. From an open/shareable standpoint, this seems like a way to have our cake and eat it, too: we optionally upgrade the GraphQL schema, rather than hard-coding fragile relationships.

The logic behind this wouldn't require GrAMPS to function; it's basically checking the schema for a given type before applying the mergeSchemas call.
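
A minimal sketch of that check, under the assumption that an "extension" describes which type it builds on:

const { mergeSchemas } = require(`graphql-tools`)

// Only apply the optional extension when the base schema actually exposes
// the type it builds on; otherwise keep the base schema untouched.
const maybeExtendSchema = (baseSchema, extension) =>
  baseSchema.getType(extension.extendsType)
    ? mergeSchemas({
        schemas: [baseSchema, extension.schema],
        resolvers: extension.resolvers,
      })
    : baseSchema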

I'm not sure how well this fits into the overall goal of this RFC, but I think it could help us implement the "Schema Builder" with more flexibility and extendability.

Happy to help out on this however I can. Let me know if you want me to expand on any of this.

@pieh
Contributor Author

pieh commented Feb 27, 2018

@jlengstorf Wow, GrAMPS is cool! Not sure if it will fit, but I will definitely read up more on it (either to use it or at least steal some ideas!) I will for sure reach out to you for your insight.

I'd like to keep this RFC from focusing (too much 😄) on implementation details. I want it to serve as a requirements-gathering place so we can later design APIs that could be extended if needed (to not over-engineer the initial refactor) but not changed (wishful thinking 😄). I think we could expose the same internal APIs to plugins, but to do that they need to be well designed and not subject to breaking changes in the near future.

@i8ramin
Contributor

i8ramin commented May 16, 2018

Hi. Has there been any update on this issue? Just wondering. I really wanna use Contentful + graphql ... but this issue makes it very hard to do so :(

@niklasravnsborg
Contributor

Just reading into this concept. I wrote a custom source plugin where a field from my JSON API can be null. These fields don't end up in my schema as I would expect. Are there any updates on this?

@pieh Awesome work! Keep it up 😊

@calcsam calcsam changed the title [RFC] graphql schema refactor [RFC] Allow user to define GraphQL schemas Aug 11, 2018
@calcsam
Contributor

calcsam commented Aug 11, 2018

@i8ramin -- it's definitely in our backlog!

@pieh -- I've renamed this issue for clarity

@pieh pieh changed the title [RFC] Allow user to define GraphQL schemas Graphql schema refactor Aug 24, 2018
@pieh pieh changed the title Graphql schema refactor [EPIC] Graphql schema refactor Aug 24, 2018
@pieh pieh assigned pieh and KyleAMathews and unassigned pieh Aug 24, 2018
@pieh pieh added Epic labels Aug 24, 2018
@KyleAMathews
Contributor

Was talking to @sgrove today with @calcsam and he had a really interesting idea which could apply here. Basically it was about how to estimate when you've sufficiently inferred a GraphQL type from data. He said you could assign a "novelty" score to each type you're inferring, i.e. how novel you expect each new item to be. You evaluate sample data item by item. Each time you "learn" something new, e.g. a new field, the expected novelty score goes up. Whenever an item matches the existing inferred type, the expected novelty score drops. After evaluating enough new items and not learning anything new, you can quit.

This could speed up processing large data sets, as we could pull out random samples and often (especially on data that has representative values for each field on every object) stop the inference process quite a bit sooner than we do now.
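
A toy sketch of that idea (the stopping heuristic and all names here are made up to illustrate it):

// Novelty-based sampling: stop inferring once many consecutive randomly
// sampled items teach us nothing new about the type.
const shuffle = items => {
  const copy = items.slice()
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[copy[i], copy[j]] = [copy[j], copy[i]]
  }
  return copy
}

const inferFieldsBySampling = (nodes, patience = 50) => {
  const fields = new Map() // field name -> typeof its value
  let unproductive = 0
  for (const node of shuffle(nodes)) {
    let learnedSomething = false
    for (const [name, value] of Object.entries(node)) {
      if (!fields.has(name)) {
        fields.set(name, typeof value)
        learnedSomething = true
      }
    }
    unproductive = learnedSomething ? 0 : unproductive + 1
    if (unproductive >= patience) break // expected novelty is low - stop early
  }
  return fields
}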

@KyleAMathews
Contributor

@sgrove mentioned this lib too https://github.com/cristianoc/REInfer

@freiksenet
Contributor

@stefanprobst btw, instead of projection as an object, we should include a GraphQL fragment.

@stefanprobst
Contributor

@freiksenet Interesting!! I did projection as an object mainly because it connects nicely with what graphql-compose's Resolver class provides for free.

@freiksenet
Contributor

@stefanprobst right, that makes sense. We can keep it as is for now. A fragment is more flexible because one can include e.g. nested fields. I'm borrowing it from here

@stefanprobst
Contributor

@freiksenet I'm all for allowing fragments to extend the selection set! Nested fields should work with the projection object as well though, e.g. projection: { foo: true, nested: { foo: true } }.
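
Side by side, the two forms being discussed would look roughly like this (the field configs and the fragment key are assumptions, borrowed from the graphql-tools style mentioned above):

const resolveFile = () => null // placeholder resolver for the sketch

// (a) projection object, as currently used - nested fields included:
const withProjection = {
  type: `File`,
  projection: { relativePath: true, childImageSharp: { id: true } },
  resolve: resolveFile,
}

// (b) the same dependency declared as a GraphQL fragment:
const withFragment = {
  type: `File`,
  fragment: `... on File { relativePath childImageSharp { id } }`,
  resolve: resolveFile,
}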

@freiksenet
Contributor

@stefanprobst Hey, why do you have this line in the code?

const updateSchema = async () => {
  const tc = addInferredType(`SitePage`)
  // this line
  delete tc.gqType._gqcInputTypeComposer
  addResolvers(tc)
  addTypeToRootQuery(tc)
  return schemaComposer.buildSchema({ directives })
}

@stefanprobst
Contributor

@freiksenet Sorry, I should have put a comment there. graphql-compose can save a reference to the corresponding InputTypeComposer on this property of the TypeComposer. When updating SitePage we want to create a new InputObjectType, and this seemed to be the simplest way to do that without having to mess with the local cache in getFilterInput. Deleting the property should produce a new InputTypeComposer for SitePage when calling getITC() here.

@freiksenet
Contributor

Shouldn't addInferredType produce a new type composer anyway?

@stefanprobst
Contributor

In type inference, if a TypeComposer already exists, we use it. This is done not primarily because of schema updating, but because a TypeComposer may already have been created when parsing SDL in the previous step. One consequence for schema updating is that we re-use a TypeComposer that still has the previously produced InputTypeComposer on an internal property. There are certainly more elegant ways to invalidate or reuse this input type - but since we want to get rid of schema updating at some point anyway, I didn't put much effort into that.

@freiksenet
Contributor

@stefanprobst got it.

I can't find the date type anywhere in the code - how is it added to the composer?

@freiksenet
Contributor

Right, never mind, it's built into GraphQL Compose.

@stefanprobst
Contributor

@freiksenet Sorry it took a bit longer - there is now a branch with everything squashed, merged with master, and with the original tests added at https://github.com/stefanprobst/gatsby/tree/schema-refactor-v2

@freiksenet
Contributor

@stefanprobst Thank you! That's very helpful.

@freiksenet
Contributor

freiksenet commented Jan 30, 2019

@stefanprobst I don't quite understand this code:

// would provide the projected fields out of the box.
const addProjectedFields = fields =>
  Object.entries(fields).reduce((acc, [fieldName, fieldConfig]) => {
    const { resolve } = fieldConfig
    acc[fieldName] = {
      ...fieldConfig,
      resolve: (source, args, context, info) => {
        // TODO: We don't need the whole selection set,
        // just the `projected` fields on children.
        const { getProjectionFromAST } = require(`graphql-compose`)
        const projection = getProjectionFromAST(info)
        const { selectionSet } = info.fieldNodes[0]
        extendSelectionSet(selectionSet, projection)
        return resolve(source, args, context, info)
      },
    }
    return acc
  }, {})

You are getting projection from AST then merging that same projection back into selection set. Why?

@freiksenet
Contributor

freiksenet commented Jan 30, 2019

A status update, pinging @ChristopherBiscardi, who is blocked on the changes.

  • I'm porting the code by going through it file by file.
  • I'm being conservative and removing changes that would result in minor breaking changes.
  • I'm moving away from a dependency on global schemaComposer state and instead passing schemaComposer to functions.
  • The same goes for "db", which I pass as "nodeStorage", both to schema generation and to the GraphQL context to be used in resolvers.
  • I've made addTypeDefs and addResolvers actions, instead of new API runs.

My one-week prediction is probably overly optimistic, but I hope to get something running by the end of the week. Let's see how it goes.

@freiksenet
Contributor

Also, I think I'll do something similar to what was done with lokijs and put the new schema code under a flag. Without the flag, the two new APIs won't work, but otherwise it should (hopefully) work identically. I feel many of the changes that @stefanprobst made to the APIs/internal logic are very good, but I want to first release something that's fully compatible before possibly adding those for 3.0.

@stefanprobst
Contributor

@freiksenet Thanks for the update! Sounds like a great plan!

You are getting projection from AST then merging that same projection back into selection set. Why?

This is what's happening:
We use the getProjectionFromAST utility from graphql-compose (which is also used internally in its Resolver class) to get all fields in the selection set plus all fields defined in projection properties. This is returned as an object.
We then use this object (which represents all the fields that need to be in the selection set) to extend the selection set accordingly.
As the TODO comment is trying to convey, this is not the most straightforward way: there is no need to use the getProjectionFromAST utility - we should just collect all the projection props ourselves and use those to extend the selection set (see the sketch below). I hope to find some time next weekend to improve on this.
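
A rough sketch of that simplification (helper names are made up; the selection nodes follow the graphql-js AST shape):

// Build selection-set additions straight from the `projection` object
// declared on child fields, instead of going through getProjectionFromAST.
const projectionToSelections = projection =>
  Object.entries(projection).map(([name, value]) => ({
    kind: `Field`,
    name: { kind: `Name`, value: name },
    selectionSet:
      value && typeof value === `object`
        ? { kind: `SelectionSet`, selections: projectionToSelections(value) }
        : undefined,
  }))

const extendSelectionSetFromProjection = (selectionSet, projection) => {
  const existing = new Set(
    selectionSet.selections
      .filter(sel => sel.kind === `Field`)
      .map(sel => sel.name.value)
  )
  projectionToSelections(projection).forEach(sel => {
    if (!existing.has(sel.name.value)) selectionSet.selections.push(sel)
  })
}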

@stefanprobst
Contributor

Also for info: except for a few tests that didn't make sense anymore (I left a comment there), all the old tests are ported and passing - except that (i) sorting null values differs, and (ii) we infer a mix of date objects and strings as string, not as invalid.

@gatsbot

gatsbot bot commented Feb 20, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Feb 20, 2019
@wardpeet wardpeet added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Feb 20, 2019
@stefanprobst
Contributor

Everyone: We have an alpha version of our new Schema Customization API ready, and we'd love your feedback! For more info check out this blog post.

@freiksenet
Contributor

This is now released in 2.2.x.

@sami616

sami616 commented Mar 22, 2019

I'm running into an issue when using the new schema customisation with gatsby-source-contentful.

Defining a type where one of the fields points to a reference array always returns null.

i.e.:

type ContentfulPage implements Node {
   editorTitle: String
   title: String
   slug: String
   sections: [ContentfulSection] # always resolves to null
}

@stefanprobst
Contributor

@sami616 Could you file a bug report for this, and if possible include a test repo to reproduce the error? That would be very helpful, thank you!

@sami616

sami616 commented Mar 23, 2019

@stefanprobst will try to file a bug report today - thanks!

@sami616

sami616 commented Mar 23, 2019

@stefanprobst #12787 👍🏻
