
[EPIC] Graphql schema refactor #4261

Closed
9 tasks
pieh opened this issue Feb 27, 2018 · 64 comments

@pieh
Contributor

pieh commented Feb 27, 2018

Who will own this?

What Area of Responsibility does this fall into? Who will own the work, and who needs to be aware of the work?

Area of Responsibility:

Select the Area of Responsibility most impacted by this Epic

  • Admin

  • Cloud

  • Customer Success

  • Dashboard

  • Developer Relations

  • OSS

  • Learning

  • Marketing

  • Sales

    AoR owner @KyleAMathews
    Domain owner TBD
    Project manager TBD
    Tech lead TBD
    Contributors TBD

Summary

Make the GraphQL schema generation code more maintainable and make it easier to add new features, like allowing user-specified types on fields instead of automatic inference.

How will this impact Gatsby?

Domains

List the impacted domains here

Components

List the impacted Components here

Goals

What are the top 3 goals you want to accomplish with this epic? All goals should be specific, measurable, actionable, realistic, and timebound.

How will we know this epic is a success?

What changes must we see, or what must be created, for us to know the project was a success? How will we know when the project is done? How will we measure success?

User Can Statement

  • User can...

Metrics to Measure Success

  • We will see an increase/decrease in...

Additional Description

In a few sentences, describe the current status of the epic, what we know, and what's already been done.

What are the risks to the epic?

In a few sentences, describe what high-level questions we still need to answer about the project. How could this go wrong? What are the trade-offs? Do we need to close a door to go through this one?

What questions do we still need to answer, or what resources do we need?

Is there research to be done? Are there things we don’t know? Are there documents we need access to? Is there contact info we need? Add those questions as bullet points here.

How will we complete the epic?

What are the steps involved in taking this from idea through to reality?

How else could we accomplish the same goal?

Are there other ways to accomplish the goals you listed above? How else could we do the same thing?

--- This is a stub epic - the old description still needs to be converted to the new format.

The main issue I'm trying to solve is that type inference will not create fields/types for source data that:

  • has conflicting types (sometimes source plugins have no way of knowing what the correct type is and can't correct the types themselves),
  • has no data for some fields (an optional field/node will not be inferred if it isn't used anywhere in the source data, and queries will fail because the schema doesn't contain that field).
    This is not an issue with the inference implementation - this approach simply can't handle such cases.

My approach to handling this is to allow defining field types by (see the sketch after this list):

  • using the GraphQL schema definition language (for data that will always have the same shape - e.g. File nodes will always have the same data structure)
  • exposing a function in the Gatsby node APIs (for data whose types are dynamic but whose structure we can obtain - for example the Contentful content model)
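
For illustration, here is a minimal sketch of what SDL-based type definitions could look like in gatsby-node.js. It uses the createTypes action from the Schema Customization API that eventually shipped (see the blog post linked later in this thread); the MarkdownRemark/Frontmatter fields are made-up examples.

// gatsby-node.js - sketch, assuming the createTypes action; type and field
// names are examples only.
exports.sourceNodes = ({ actions }) => {
  const { createTypes } = actions
  createTypes(`
    type MarkdownRemark implements Node {
      frontmatter: Frontmatter
    }
    type Frontmatter {
      title: String!
      # optional field that inference would drop when no node provides it
      subtitle: String
    }
  `)
}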

Problem:

Current implementation of schema creation looks something like this:
[diagram: current schema creation flow]

Input/output type creation is not abstracted, so the implementation has to be duplicated for each source of information.

In my proof of concept (repository) I added another source (the GraphQL schema definition language) and implemented just a subset of the functionality:
[diagram: proof-of-concept schema creation flow]

As a testing ground I used this barebones repository. Things to look for:

Implementing it this way is fine for a proof of concept, but it's unmaintainable in the long term. So I want to introduce a common middleman interface:
[diagram: proposed schema creation flow with a common middleman interface]

Goals:

  • This should be backward compatible
  • Make it easy to add new sources of field types and/or schema features
  • Remove code duplication
  • The initial change should only introduce the middleman interface and provide feature parity with the current implementation, to ease review

Questions:

  1. What are potential features/use cases to take into consideration when designing the details of this (not features of the schema - rather how it could be used)? I see one potential case where this might be important (to avoid needing another big refactor later):
    • Live previews - right now Gatsby can't modify the schema while it's running in develop mode, but it can refresh data (built-in refresh for the filesystem source + the __refresh hook to refresh all source data) - it might be worth looking into being able to refresh the schema too.
  2. How would schema stitching fit into it (merging external remote GraphQL endpoints with the Gatsby GraphQL layer)? Basic schema stitching would not interact with the Gatsby GraphQL part (for example, we have our Gatsby queries for markdown etc. and then we have fields from the GitHub GraphQL API repository - if there's no connection between them then this would be out of scope for this RFC), but if we would like to add a connection - for example allow linking a frontmatter field to a GitHub repository - then this would need to be thought out ahead of time. I was looking at graphql-tools schema stitching and it does have some nice tooling for merging schemas and an option to add resolvers between schemas - is this something that was planned to be used for that?
@m4rrc0
Contributor

m4rrc0 commented Feb 27, 2018

Thanks a lot for the research @pieh.
Maybe I am completely off track here, but couldn't we use schema stitching to add the missing fields?
But maybe it couldn't solve the first issue you raise (conflicting types on a field), and it probably is a weaker solution overall in the long run...?
I love your idea about live preview refresh! That would be a super solid feature to add to Gatsby IMHO.
To be honest my main concern is the time such a refactor will take...

@pieh
Contributor Author

pieh commented Feb 27, 2018

@MarcCoet
Not sure on what level you would want to stitch the schema - this is not magic that would make it work automatically :) . There are multiple "side effects" for a single field in the data - it produces an output type, it produces an input type for filtering, it produces an input type for sorting, it produces an input type for grouping. Stitching would still suffer from the same problems - it would need to be implemented in multiple places.

There is not much distinction between fields with no data and fields with conflicting types in terms of creating the schema currently - gatsby discards fields which have conflicting types (so they become fields with no data at the stage of creating the schema) - the distinction matters more for website/app developers: they have data, but the field is not in the schema.

You can use my proof of concept branch (at least for testing things out) - it has all the basic features of getting fields - it can resolve local files, linked nodes (both of single and multiple types - unions) and of course inline fields. But to get the full feature set I would have to implement this three more times in different places (filtering, sorting, grouping).

Or you can already use the setFieldsOnGraphQLNodeType function ( https://www.gatsbyjs.org/docs/node-apis/#setFieldsOnGraphQLNodeType ) to add/overwrite "inline" field types (fields that aren't linked to other nodes). It's not exactly easy to use, though, and it can't reference other types that are available in the schema.
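
For reference, a minimal sketch of that API (the type check and the added field are assumptions for the example):

// gatsby-node.js - sketch using setFieldsOnGraphQLNodeType; the type name
// and the added field are examples only.
const { GraphQLString } = require(`graphql`)

exports.setFieldsOnGraphQLNodeType = ({ type }) => {
  if (type.name !== `MarkdownRemark`) {
    return {}
  }
  return {
    // an "inline" field computed from data already on the node
    shoutedTitle: {
      type: GraphQLString,
      resolve: node => ((node.frontmatter || {}).title || ``).toUpperCase(),
    },
  }
}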

I totally get your time concern and frustration about this issue - I have this problem too with some of my sites - it's hard to explain to your content editors why things suddenly stopped working when they cleared some optional field (this is why I started working on this!), but this has to be done right sooner rather than later, as features that will need to be refactored will pile up.

@KyleAMathews
Contributor

How would schema stitching fit into it

It wouldn't — the schema stitching process basically takes two entirely separate schemas and lets you query both of them at the same time. Unless people name their types the same as the default Gatsby ones, there'd be no interaction between the two schemas.

@KyleAMathews
Contributor

Love the direction you're going here! This feels like the right approach and direction for a refactor and will unlock a lot of really nice capabilities!

@pieh
Contributor Author

pieh commented Feb 27, 2018

About schema stitching - I was researching this a bit earlier, and graphql-tools provides a way to add resolvers between schemas - https://www.apollographql.com/docs/graphql-tools/schema-stitching.html#adding-resolvers - as part of their schema stitching toolkit. So hypothetically we could create a custom resolver (or rather a user at the project level or a plugin would) that could transform:
repository: "https://github.com/gatsbyjs/gatsby" (<- that's frontmatter) into the response from the repository query of the GitHub GraphQL API (similar to how we link/map to nodes currently). This doesn't have to land in the initial version of schema stitching, but it is something worth keeping in mind.

@KyleAMathews
Contributor

Huh! That'd be amazing! Yeah, there's a ton of possibilities here — you could link to tweets, flickr images, facebook profiles, etc. As long as you have the right source plugin installed, anything accessible via an API would be linked up. That'd be crazy powerful.

@jlengstorf
Contributor

@pieh @KyleAMathews This is something I've got a bit of experience with. When I was at IBM, we needed to keep data sources discrete, but allow them to be combined in queries to avoid complex data processing on the front-end. I ended up creating and open sourcing GrAMPS to address this. (I wrote up a "hello world" example in this post.)

One of the goals of GrAMPS is to allow what I've clumsily dubbed "asynchronous stitching", where a data source can define how it extends another data source if that data source exists in the schema. This would allow plugins to build on each other when possible, but wouldn't require them to be peerDependencies. From an open/shareable standpoint, this seems like a way to have our cake and eat it, too: we optionally upgrade the GraphQL schema, rather than hard-coding fragile relationships.

The logic behind this wouldn't require GrAMPS to function; it's basically checking the schema for a given type before applying the mergeSchemas call.
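
A minimal sketch of that check, under the assumption that an "extension" describes which type it builds on:

const { mergeSchemas } = require(`graphql-tools`)

// Only apply the optional extension when the base schema actually exposes
// the type it builds on; otherwise keep the base schema untouched.
const maybeExtendSchema = (baseSchema, extension) =>
  baseSchema.getType(extension.extendsType)
    ? mergeSchemas({
        schemas: [baseSchema, extension.schema],
        resolvers: extension.resolvers,
      })
    : baseSchema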

I'm not sure how well this fits into the overall goal of this RFC, but I think it could help us implement the "Schema Builder" with more flexibility and extendability.

Happy to help out on this however I can. Let me know if you want me to expand on any of this.

@pieh
Contributor Author

pieh commented Feb 27, 2018

@jlengstorf Wow, GrAMPS is cool! Not sure if it will fit, but I will definitely read up more on it (either to use it or at least steal some ideas!) I will for sure reach out to you for your insight.

I'd like to keep this RFC from focusing (too much 😄) on implementation details. I want it to serve as a requirements-gathering place so we can later design APIs that could be extended if needed (to not over-engineer the initial refactor) but not changed (wishful thinking 😄). I think we could expose the same internal APIs to plugins, but to do that they need to be well designed and not subject to breaking changes in the near future.

@i8ramin
Contributor

i8ramin commented May 16, 2018

Hi. Has there been any update on this issue? Just wondering. I really wanna use Contentful + graphql ... but this issue makes it very hard to do so :(

@niklasravnsborg
Contributor

Just reading into this concept. I wrote a custom source plugin where a field from my JSON API can be null. These fields don't end up in my schema as I would expect. Are there any updates on this?

@pieh Awesome work! Keep it up 😊

@calcsam calcsam changed the title [RFC] graphql schema refactor [RFC] Allow user to define GraphQL schemas Aug 11, 2018
@calcsam
Contributor

calcsam commented Aug 11, 2018

@i8ramin -- it's definitely in our backlog!

@pieh -- I've renamed this issue for clarity

@pieh pieh changed the title [RFC] Allow user to define GraphQL schemas Graphql schema refactor Aug 24, 2018
@pieh pieh changed the title Graphql schema refactor [EPIC] Graphql schema refactor Aug 24, 2018
@pieh pieh assigned pieh and KyleAMathews and unassigned pieh Aug 24, 2018
@pieh pieh added Epic labels Aug 24, 2018
@KyleAMathews
Contributor

Was talking to @sgrove today with @calcsam and he had a really interesting idea which could apply here. Basically it was about how to estimate when you've sufficiently inferred a GraphQL type from data. He said you could assign a "novelty" score to each type you're inferring, i.e. how novel you expect each new item to be. You evaluate sample data item by item. Each time you "learn" something new, e.g. a new field, the expected novelty score goes up. Whenever an item matches the existing inferred type, the expected novelty score drops. After evaluating enough new items and not learning anything new, you can quit.

This could speed up processing large data sets, as we could pull out random samples and often (especially on data that has representative values for each field on every object) stop the inference process quite a bit sooner than we do now.
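
A toy sketch of that idea (the stopping heuristic and all names here are made up to illustrate it):

// Novelty-based sampling: stop inferring once many consecutive randomly
// sampled items teach us nothing new about the type.
const shuffle = items => {
  const copy = items.slice()
  for (let i = copy.length - 1; i > 0; i--) {
    const j = Math.floor(Math.random() * (i + 1))
    ;[copy[i], copy[j]] = [copy[j], copy[i]]
  }
  return copy
}

const inferFieldsBySampling = (nodes, patience = 50) => {
  const fields = new Map() // field name -> typeof its value
  let unproductive = 0
  for (const node of shuffle(nodes)) {
    let learnedSomething = false
    for (const [name, value] of Object.entries(node)) {
      if (!fields.has(name)) {
        fields.set(name, typeof value)
        learnedSomething = true
      }
    }
    unproductive = learnedSomething ? 0 : unproductive + 1
    if (unproductive >= patience) break // expected novelty is low - stop early
  }
  return fields
}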

@KyleAMathews
Contributor

@sgrove mentioned this lib too https://github.com/cristianoc/REInfer

@freiksenet
Contributor

@stefanprobst btw, instead of projection as an object, we should include a GraphQL fragment.

@stefanprobst
Contributor

@freiksenet Interesting!! I did projection as an object mainly because it connects nicely with what graphql-compose's Resolver class provides for free.

@freiksenet
Contributor

@stefanprobst right, that makes sense. We can keep it as is for now. A fragment is more flexible because one can include e.g. nested fields. I'm borrowing it from here

@stefanprobst
Contributor

@freiksenet I'm all for allowing fragments to extend the selection set! Nested fields should work with the projection object as well though, e.g. projection: { foo: true, nested: { foo: true } }.
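
Side by side, the two forms being discussed would look roughly like this (the field configs and the fragment key are assumptions, borrowed from the graphql-tools style mentioned above):

const resolveFile = () => null // placeholder resolver for the sketch

// (a) projection object, as currently used - nested fields included:
const withProjection = {
  type: `File`,
  projection: { relativePath: true, childImageSharp: { id: true } },
  resolve: resolveFile,
}

// (b) the same dependency declared as a GraphQL fragment:
const withFragment = {
  type: `File`,
  fragment: `... on File { relativePath childImageSharp { id } }`,
  resolve: resolveFile,
}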

@freiksenet
Contributor

@stefanprobst Hey, why do you have this line in the code?

const updateSchema = async () => {
  const tc = addInferredType(`SitePage`)
  // this line
  delete tc.gqType._gqcInputTypeComposer
  addResolvers(tc)
  addTypeToRootQuery(tc)
  return schemaComposer.buildSchema({ directives })
}

@stefanprobst
Contributor

@freiksenet Sorry, I should have put a comment there. graphql-compose can save a reference to the corresponding InputTypeComposer on this property of the TypeComposer. When updating SitePage we want to create a new InputObjectType, and this seemed to be the simplest way to do that without having to mess with the local cache in getFilterInput. Deleting the property should produce a new InputTypeComposer for SitePage when calling getITC() here.

@freiksenet
Contributor

Shouldn't addInferredType produce a new type composer anyway?

@stefanprobst
Contributor

In type inference, if a TypeComposer already exists, we use it. This is done not primarily because of schema updating, but because a TypeComposer may already have been created when parsing SDL in the previous step. One consequence for schema updating is that we re-use a TypeComposer that still has the previously produced InputTypeComposer on an internal property. There are certainly more elegant ways to invalidate or reuse this input type - but since we want to get rid of schema updating at some point anyway, I didn't put much effort into that.

@freiksenet
Contributor

@stefanprobst got it.

I can't find the date type anywhere in the code - how is it added to the composer?

@freiksenet
Contributor

Right, never mind, it's built into GraphQL Compose.

@stefanprobst
Contributor

@freiksenet Sorry it took a bit longer - there is now a branch with everything squashed, merged with master, and with the original tests added at https://github.com/stefanprobst/gatsby/tree/schema-refactor-v2

@freiksenet
Contributor

@stefanprobst Thank you! That's very helpful.

@freiksenet
Contributor

freiksenet commented Jan 30, 2019

@stefanprobst I don't quite understand this code:

// would provide the projected fields out of the box.
const addProjectedFields = fields =>
  Object.entries(fields).reduce((acc, [fieldName, fieldConfig]) => {
    const { resolve } = fieldConfig
    acc[fieldName] = {
      ...fieldConfig,
      resolve: (source, args, context, info) => {
        // TODO: We don't need the whole selection set,
        // just the `projected` fields on children.
        const { getProjectionFromAST } = require(`graphql-compose`)
        const projection = getProjectionFromAST(info)
        const { selectionSet } = info.fieldNodes[0]
        extendSelectionSet(selectionSet, projection)
        return resolve(source, args, context, info)
      },
    }
    return acc
  }, {})

You are getting projection from AST then merging that same projection back into selection set. Why?

@freiksenet
Contributor

freiksenet commented Jan 30, 2019

A status update, pinging @ChristopherBiscardi, who is blocked on the changes.

  • I'm porting the code by going through it file by file.
  • I'm being conservative and removing changes that would result in minor breaking changes.
  • I'm moving away from a dependency on global schemaComposer state and instead passing schemaComposer to functions.
  • The same goes for "db", which I pass as "nodeStorage", both to schema generation and to the GraphQL context to be used in resolvers.
  • I've made addTypeDefs and addResolvers actions, instead of new API runs.

My one-week prediction is probably overly optimistic, but I hope to get something running by the end of the week. Let's see how it goes.

@freiksenet
Contributor

Also, I think I'll do something similar to what was done with lokijs and put the new schema code under a flag. Without the flag, the two new APIs won't work, but otherwise it should (hopefully) work identically. I feel many of the changes that @stefanprobst made to the APIs/internal logic are very good, but I want to first release something that's fully compatible before possibly adding those for 3.0.

@stefanprobst
Contributor

@freiksenet Thanks for the update! Sounds like a great plan!

You are getting projection from AST then merging that same projection back into selection set. Why?

This is what's happening:
We use the getProjectionFromAST utility from graphql-compose (which is also used internally in its Resolver class) to get all fields in the selection set plus all fields defined in projection properties. This is returned as an object.
We then use this object (which represents all the fields that need to be in the selection set) to extend the selection set accordingly.
As the TODO comment is trying to convey, this is not the most straightforward way: there is no need to use the getProjectionFromAST utility - we should just collect all the projection props ourselves and use those to extend the selection set (see the sketch below). I hope to find some time next weekend to improve on this.
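
A rough sketch of that simplification (helper names are made up; the selection nodes follow the graphql-js AST shape):

// Build selection-set additions straight from the `projection` object
// declared on child fields, instead of going through getProjectionFromAST.
const projectionToSelections = projection =>
  Object.entries(projection).map(([name, value]) => ({
    kind: `Field`,
    name: { kind: `Name`, value: name },
    selectionSet:
      value && typeof value === `object`
        ? { kind: `SelectionSet`, selections: projectionToSelections(value) }
        : undefined,
  }))

const extendSelectionSetFromProjection = (selectionSet, projection) => {
  const existing = new Set(
    selectionSet.selections
      .filter(sel => sel.kind === `Field`)
      .map(sel => sel.name.value)
  )
  projectionToSelections(projection).forEach(sel => {
    if (!existing.has(sel.name.value)) selectionSet.selections.push(sel)
  })
}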

@stefanprobst
Contributor

Also for info: except for a few tests that didn't make sense anymore (I left a comment there), all the old tests are ported and passing - except that (i) sorting null values differs, and (ii) we infer a mix of date objects and strings as string, not as invalid.

@gatsbot

gatsbot bot commented Feb 20, 2019

Hiya!

This issue has gone quiet. Spooky quiet. 👻

We get a lot of issues, so we currently close issues after 30 days of inactivity. It’s been at least 20 days since the last update here.

If we missed this issue or if you want to keep it open, please reply here. You can also add the label "not stale" to keep this issue open!

Thanks for being a part of the Gatsby community! 💪💜

@gatsbot gatsbot bot added the stale? Issue that may be closed soon due to the original author not responding any more. label Feb 20, 2019
@wardpeet wardpeet added not stale and removed stale? Issue that may be closed soon due to the original author not responding any more. labels Feb 20, 2019
@stefanprobst
Contributor

Everyone: We have an alpha version of our new Schema Customization API ready, and we'd love your feedback! For more info check out this blog post.

@freiksenet
Contributor

This is now released in 2.2.x.

@sami616

sami616 commented Mar 22, 2019

I'm running into an issue when using the new schema customisation with gatsby-source-contentful.

Defining a type where one of the fields points to a reference array always returns null.

i.e.:

type ContentfulPage implements Node {
   editorTitle: String
   title: String
   slug: String
   sections: [ContentfulSection] # always resolves to null
}

@stefanprobst
Contributor

@sami616 Could you file a bug report for this, and if possible include a test repo to reproduce the error? That would be very helpful, thank you!

@sami616

sami616 commented Mar 23, 2019

@stefanprobst will try to file a bug report today - thanks!

@sami616

sami616 commented Mar 23, 2019

@stefanprobst #12787 👍🏻
