
[aws-appsync] code-first schema generation #9305

Closed · 10 tasks done · 29 comments
Labels: @aws-cdk/aws-appsync (Related to AWS AppSync), effort/large (Large work item – several weeks of effort), feature-request (A feature should be added or improved), management/tracking (Issues that track a subject or multiple issues), p2

@BryanPan342 (Contributor) commented Jul 28, 2020

Allow definition of the schema to happen within the CDK stack. The generated schema would be inserted directly into the CloudFormation template when the CDK app is synthesized.

Use Case

Currently there are only two ways to define a schema: inline or with a file.

Inline
const inlineSchemaDefinition = `
  ...
`;
const api = new appsync.GraphQLApi(stack, 'api', {
  name: 'api',
  schemaDefinition: inlineSchemaDefinition,
});
File
const api = new appsync.GraphQLApi(stack, 'api', {
  name: 'api',
  schemaDefinitionFile: join(__dirname, 'schema.graphql'),
});

A code-first approach would allow the GraphQL schema to be defined inline, alongside its resolvers.

Proposed Solution

Write the schema definition along with the resolvers inline.

Implementation
const api = new GraphQLApi(stack, 'ExampleApi', {
  name: 'example',
  schemaDefinition: SCHEMA.CODE,
  ...
});

const exampleTable = new db.Table(...);
const exampleDS = api.addDynamoDbDataSource('exampleDataSource', 'Table for Demos', exampleTable);

// NEW IMPLEMENTATION STARTS HERE

// Defining attribute types (i.e. Int! and String!)
const t_int_r = AttributeType.int().required();
const t_string_r = AttributeType.string().required();

// Defining Object Type ( i.e. type Example @aws_iam { id: Int! content: String! } )
const example = api.addType('Example', {
  definition: {
    id: t_int_r,
    content: t_string_r, 
  },
  directives: Directives.iam(),
});

// Defining the attribute type for the Object Type 'Example'
const t_example = AttributeType.object(example);
const t_example_l = AttributeType.object(example).list();

api.addQuery( 'getExamples', {
  type: t_example_l,
  resolve: [{
    dataSource: exampleDS,
    request: MappingTemplate.dynamoDbScanTable(),
    response: MappingTemplate.dynamoDbResultList(),
  }],
});

api.addMutation( 'addExample', {
  type: t_example,
  args: {
    version: t_string_r,
  },
  resolve: [{
    dataSource: exampleDS,
    request: MappingTemplate.dynamoDbPutItem(PrimaryKey.partition('id').auto(), Values.projecting('example')),
    response: MappingTemplate.dynamoDbResultItem(),
  }],
  directives: Directives.iam(),
});

Other

I will be using this issue as a way to track the smaller components of this feature request and as a point of discussion for implementation.

Visit this repository to see how to generate SWAPI in a code-first approach.

Features


This is a 🚀 Feature Request

@asterikx (Contributor)

I'm a bit hesitant about this "code-first" approach. IMO a GraphQL schema file is code and I do not see the necessity to create a new CDK specific DSL for creating GraphQL schemas.
One of the main advantages and selling points of GraphQL is that the schema is the single source of truth. GraphQL tooling is massive and the schema acts as a standardized interface. Interoperability would clearly become an issue.
With this "code-first" approach the schema file is no longer the source of truth, it will be the CDK code. This means you have to run cdk synth (or a similar command) to export the schema so that other tools can use it. Outdated exported schema files will likely become an issue.

In practice, you also want to use the schema to generate clients or model files (e.g. Amplify), add custom directives (see Amplify directives or custom directives in gqlgen), etc.
Do you plan on exporting the schema file so that it can be used by existing tooling? What about custom directives?

Worse, the CDK code will likely make use of files derived from the schema by external tools. E.g., a generator creates model files from the types in the schema. These model files could be used by frontend or backend files, which, in turn, are imported and deployed by the CDK (e.g. through lambda.fromAsset(path)). To generate the schema, you would need to run cdk synth. But cdk synth will fail as the file at path (build output) does not yet exist. To build the file at path, the schema would need to be exported first. Deadlock.

The only advantage I see is "type-safety" when attaching resolvers to queries/mutations as it will become impossible to attach a resolver to a query/mutation that does not exist. But then again, there is still no type-safety within the VTL mapping templates (e.g. a template can still return a data structure that does not match the return type of the query/mutation) - the pre-defined MappingTemplates are limited to the most simple use cases.
I think this might be a bigger issue, and I'm not sure if the planned effort on a "code-first" schema is worth it.

IMO, it feels wrong to create a new language on top of GraphQL, which is already a specialized query language.

That is just my 2 cents though. Either way, there is a lot to consider (external tooling, interoperability, extensibility) for this to be useful in practice.

@BryanPan342 (Contributor, Author)

@asterikx Thanks for the feedback 😊

I definitely agree with you on certain points about GraphQL.


the pre-defined MappingTemplates are limited to the most simple use cases.

I totally agree with this! The current implementation of Mapping Templates is super limiting and at the end of the day, I end up writing a lot more VTL than I would like. @duarten is working on an RFC #175 to provide better infrastructure.


The only advantage I see is "type-safety" when attaching resolvers to queries/mutations as it will become impossible to attach a resolver to a query/mutation that does not exist.

A lot of the motivation behind adding a code-first approach was to simplify GraphQL and the intricacies of resolvers/mapping templates. For seasoned GraphQL users, I can definitely see why this abstraction seems unnecessary. We won't remove the current functionality of using a schema.graphql file to define the AppSync schema.

We drew inspiration from other code-first libraries such as GraphQL Nexus. I think there are pros/cons to both approaches. But a code-first approach offers a developer workflow that a schema-first approach just doesn't:

  • modularity: organizing schema type definitions into different files
  • reusability: often SDL definitions involve boilerplate/repetitive code (see the sketch below)
  • consistency: resolvers and schema definition will always be synced
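
To make the reusability point concrete, here is a minimal sketch using the API shape proposed above. The helper names (requiredString, requiredInt, auditFields) are purely illustrative, and the AttributeType/addType calls follow the proposal, so the exact names may change:

// Hypothetical helpers built on the proposed AttributeType API, cutting down
// the boilerplate of repeating "required string" / "required int" fields.
const requiredString = () => AttributeType.string().required();
const requiredInt = () => AttributeType.int().required();

// A field set that can be reused across several object types.
const auditFields = {
  createdAt: requiredString(),
  updatedAt: requiredString(),
};

const example = api.addType('Example', {
  definition: {
    id: requiredInt(),
    content: requiredString(),
    ...auditFields,
  },
});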

This means you have to run cdk synth (or a similar command) to export the schema so that other tools can use it.

What if we just generated a file in cdk.out (or another directory) that is the schema.graphql file?


Worse, the CDK code will likely use make use of files derived from the schema by external tools. E.g., a generator creates model files from the types in the schema.

I'm not sure I understand this completely. Is the assumption that we would use an external library to generate the schema? We were actually going to generate the schema in memory, so the entirety of the schema generation would be done in-house.

@asterikx (Contributor)

Thanks for the explanation @BryanPan342!

A lot of the motivation behind adding a code-first approach was to simplify GraphQL and the intricacies of resolvers/mapping templates. For seasoned GraphQL users, I can definitely see why this abstraction seems unnecessary. We won't remove the current functionality of using a schema.graphql file to define the AppSync schema.

We drew inspiration from other code-first libraries such as GraphQL Nexus. I think there are pros/cons to both approaches. But a code-first approach offers a developer workflow that a schema-first approach just doesn't:

  • modularity: organizing schema type definitions into different files
  • reusability: often SDL definitions involve boilerplate/repetitive code
  • consistency: resolvers and schema definition will always be synced

I see. That definitely makes sense.

What if we just generated a file in cdk.out (or another directory) that is the schema.graphql file?

Yup, I just assumed this would be done by cdk synth.

I'm not sure I understand this completely. Is the assumption that we would use an external library to generate the schema? We were actually going to generate the schema in memory, so the entirety of the schema generation would be done in-house.

No, I assumed that schema generation would be done in-house.
I'm taking it a step further: files that are generated from the (generated) schema file, using external tools such as GraphQL Code Generator.

Suppose that I want to use the model files (types) generated by GraphQL Code Generator in my frontend codebase. The schema.graphql file needs to exist before I can build my frontend.
In addition, suppose that my CDK app deploys the frontend build outputs using the BucketDeployment construct. In this case, the build outputs of my frontend need to exist before cdk synth can be run (otherwise, it will fail due to missing files).
It's a chicken-egg problem.

@BryanPan342 (Contributor, Author)

@asterikx

Suppose that I want to use the model files (types) generated by GraphQL Code Generator in my frontend codebase. The schema.graphql file needs to exist before I can build my frontend.

Ooo I see now. I think there are still workarounds for this even with the code-first approach. For example, I believe that to start, you could just have an empty schema.graphql file for BucketDeployment. I haven't tested this but it feels like something that could work. You could even make two stacks and have the BucketDeployment stack depend on the AppSync stack (see the sketch below).

Overall, these are really great points that we will keep in mind during implementation but seem out of scope for the use case of a code-first approach.
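
A minimal sketch of the two-stack idea, using the existing Stack.addDependency API; the stack contents are illustrative and would hold the GraphqlApi and the BucketDeployment respectively:

import * as cdk from '@aws-cdk/core';

const app = new cdk.App();

// Illustrative stacks; in practice these would be your own Stack subclasses.
const apiStack = new cdk.Stack(app, 'AppSyncStack');       // holds the code-first GraphqlApi
const frontendStack = new cdk.Stack(app, 'FrontendStack'); // holds the BucketDeployment of the built frontend

// Deploy the API (and therefore the generated schema) before the frontend.
frontendStack.addDependency(apiStack);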

@andrestone (Contributor)

Hey @BryanPan342,

Thanks for the awesome improvements! I wonder if you could provide more detailed examples leveraging these new features.

@asterikx (Contributor) commented Aug 5, 2020

Ooo I see now. I think there are still workarounds for this even with the code-first approach. For example, I believe that to start, you could just have an empty schema.graphql file for BucketDeployment

That would solve the bootstrapping issue.

You could even make two stacks and have the BucketDeployment stack depend on the AppSync stack.

I think what you suggest is interleaving synth and build actions? I.e., first synth the stack that contains the AppSync API (which will output a schema.graphql), then generate the additional files from schema.graphql and build the frontend, and lastly synth the stack that contains the BucketDeployment for the frontend.
Not sure if this is always possible, e.g. when using the new CDK pipeline construct where the stacks are grouped under a single stage.

Overall, these are really great points that we will keep in mind during implementation but seem out of scope for the use case of a code-first approach.

Yeah, it's hard to foresee all scenarios. It is probably best to just try it out and tackle the issues as they arise. I think it's important to keep developer experience in mind here.

@BryanPan342 (Contributor, Author) commented Aug 5, 2020

@asterikx Thanks for all the awesome feedback! It really helped me scope the issue 😊 Developer experience is very near and dear to me, so discussions like these are super valuable.


@andrestone I'm currently working on the object type definition, which is basically the foundation of the code-first schema. I'm thinking about putting finer-grained examples in the issues found in the checklist. Wdyt?


Here is a comment on object types.

@BryanPan342 (Contributor, Author)

UPDATE

Check out this repository to see how to generate SWAPI in a code-first approach.

Note: Most of the CDK code isn't merged in yet, but this is representative of what it looks like to build a large GraphQL API.

@ranguard (Contributor) commented Nov 8, 2020

Apologies if this is not directly relevant here... but I'm trying to get my head around the best structure for AppSync in CDK with multiple stacks (microservices).

  • I want one AppSync GraphQL service in front of all stacks (where auth is also set up)
  • I want the microservice stacks to be responsible for setting up their own part of the GraphQL API (data source, schema, resolvers, etc.)
  • Ideally each stack should be able to have its own schema.graphql file (rather than having to do it all code-first).

I could imagine doing this fully code-first (even though it would be messy, with some dependency 'fun'), but I wondered if there were already best practices or examples of this somewhere?

Thanks for the consideration

@BryanPan342 (Contributor, Author)

@ranguard

Ideally each stack should be able to have its own schema.graphql file (rather than having to do it all code-first).

So are you still asking about how to do this code-first, or are you asking more in terms of a schema-first architecture?

For code-first, you can define the schema outside of CDK!

So if you really wanted to, you can essentially create your object types, enum types, interfaces etc. in separate folders representing each CDK stack, and then merge it together in an index.ts file and use that as your point of reference when creating your schema!

Here is an example: SWAPI.
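
As a rough sketch of that layout, assuming the experimental @aws-cdk/aws-appsync code-first classes; the file names and the Planet type are illustrative, and stack is whatever stack holds the API:

// service-a/types.ts -- plain objects, no CDK scope required
import * as appsync from '@aws-cdk/aws-appsync';

export const planet = new appsync.ObjectType('Planet', {
  definition: {
    name: appsync.GraphqlType.string(),
    diameter: appsync.GraphqlType.int(),
  },
});

// index.ts -- merge everything and hand it to the API
import { planet } from './service-a/types';

const api = new appsync.GraphqlApi(stack, 'Api', {
  name: 'merged-api',
  schema: new appsync.Schema(), // code-first schema, filled in below
});
api.addType(planet);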

@ranguard (Contributor) commented Nov 9, 2020

Thanks for the reply; let me try to state my problem more clearly...

What would be best practices...

When putting a single AppSync GraphqlApi in front of multiple CDK microservices (each in a separate stack), can each service be responsible for its own part of the setup (schema, data sources, resolvers) and configuration of the GraphqlApi? Can each microservice set up its part of the schema using a schema.graphql file, rather than everything having to be code-first?

I was thinking of something like:

stack/appsync
   construct/main.ts        - where the actual `new appsync.GraphqlApi()` lives
stack/micro_service_1
   construct/appsync.ts     - resources and schema objects for service 1
   construct/schema.graphql - schema file for service 1
stack/micro_service_2
   construct/appsync.ts     - resources and schema objects for service 2
   construct/schema.graphql - schema file for service 2

Note: I'm using construct/appsync.ts rather than construct/schema.ts, as the construct may be creating data sources as well as managing the schema.

I really like that CDK can do the legwork of converting the schema.graphql into a Schema object, but here I'd like two source files.

I would like to minimize stack dependencies...

I think I could do something like this

In main.ts:

import * as service_1 from '../../micro_service_1/construct/appsync';
import * as service_2 from '../../micro_service_2/construct/appsync';

const schema = new Schema();
service_1.addToSchema(schema);
service_2.addToSchema(schema);

But if those appsync.ts files are also creating resources (e.g. a DynamoDB data source), then those resources would end up in the wrong stack; and if they didn't, then I'm creating dependencies or passing things around via parameters, which gets messy as well.

I note there is appsync.GraphqlApi.fromGraphqlApiAttributes, so maybe I could reverse this and have each micro_service import the main GraphQL API and then use code-first calls to manipulate the schema, e.g. api.schema.addObjectType()?

Though that doesn't tick the box of the service still being able to have a schema.graphql source, and I'm not sure whether it's a nice design or not.

Your thoughts are most appreciated

@andrestone (Contributor) commented Nov 9, 2020

I guess you should work with CfnOutput to accomplish that. Otherwise, if you import the schema pieces directly, you could deploy the merged schema even if the microservice stack fails to deploy (the schema will deploy successfully even if the resolvers / data sources don't).

You could introduce some checks to prevent that from happening, but I think using outputs, dependencies and conditions would be the "best practice" here.

Take a look at this: https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_core.CfnOutput.html

Edit: To make it even clearer, the idea is to have the schema definition pieces as outputs from each microservice stack and stitch them together in another stack (the one that would update the schema in the api resource).
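
A rough sketch of that output-based wiring; the SDL fragments, export names and construct IDs are illustrative, and the API stack drops down to the L1 CfnGraphQLSchema construct so the imported values stay CloudFormation references resolved at deploy time:

import * as cdk from '@aws-cdk/core';
import * as appsync from '@aws-cdk/aws-appsync';

// In each microservice stack: export that service's slice of the SDL.
new cdk.CfnOutput(serviceStack, 'ServiceASchemaPiece', {
  value: 'type Order { id: ID! total: Int! }', // SDL owned by this service
  exportName: 'ServiceASchemaPiece',
});

// In the API stack: import the pieces and stitch them into the schema resource.
// apiId is the ID of an AppSync API that is not already managing its own schema
// through the higher-level construct.
new appsync.CfnGraphQLSchema(apiStack, 'StitchedSchema', {
  apiId: api.apiId,
  definition: cdk.Fn.join('\n', [
    'type Query { orders: [Order] }',
    cdk.Fn.importValue('ServiceASchemaPiece'),
  ]),
});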

@kfcobrien commented Dec 8, 2020

Picking up on @ranguard's question

I note there is appsync.GraphqlApi.fromGraphqlApiAttributes, so maybe I could reverse this and have each micro_service import the main GraphQL API and then use code-first calls to manipulate the schema, e.g. api.schema.addObjectType()?

I have been trying to implement this approach, but the fromGraphqlApiAttributes function returns an IGraphqlApi interface, which does not cast to a GraphqlApi class as expected. Upon doing so, the following exception is thrown:

(<appsync.GraphqlApi>gql).addQuery('response', new appsync.ResolvableField({
                          ^
TypeError: gql.addQuery is not a function

Is this a bug or intended?

@BryanPan342 (Contributor, Author)

@kfcobrien

This is actually a really good question. The code-first approach essentially creates an appsync.Schema object in memory. The import function fromGraphqlApiAttributes will return a class that is devoid of the schema. The reasoning being: if you are working schema-first, you shouldn't need to change the schema through CDK.

Now, for the code-first approach, the neat thing about the appsync.Schema is that you can declare it outside of CDK because it isn't tied to any CDK scope.

Here is an example of how you can take advantage of these types: example.

If you want, you can also declare this schema outside of the scope of CDK, import it, and add to the Schema as you go. I believe that is another workaround (note that if you do it this way, I would recommend having CDK deploy the AppSync stack last).
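
A minimal sketch of that pattern, assuming the Schema class exposes the same add* methods the GraphqlApi does (as in the experimental module at the time); the file layout and the Film type are illustrative:

// shared/schema.ts -- no CDK scope needed here
import * as appsync from '@aws-cdk/aws-appsync';

export const schema = new appsync.Schema();

export const film = new appsync.ObjectType('Film', {
  definition: { title: appsync.GraphqlType.string() },
});
schema.addType(film);

// appsync-stack.ts -- the only place the API construct is created
import { schema } from './shared/schema';

const api = new appsync.GraphqlApi(stack, 'Api', {
  name: 'swapi',
  schema,
});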

@kfcobrien

@BryanPan342..... Ahhh, that makes more sense to me now. So I suppose the best way to handle this is to store a reference to the Schema, update it as you see fit across multiple stacks, keep the AppSync stack on its own, and rerun a pipeline for it whenever the Schema changes externally. Thanks also for the SWAPI example, I will take a proper look at that shortly 👍

@BryanPan342 (Contributor, Author)

@kfcobrien Yup! That's how I would go about it :)

Feel free to let me know your thoughts on how we can improve the experience. I think the next step is definitely improving the mapping templates so that the resolver can be easily added inline. But would love to hear your thoughts!

@kfcobrien commented Dec 9, 2020

@BryanPan342 I think a construct for the Schema that implements some from* methods would be great as it would allow referencing the schema from external resources (separate projects) quite easily.

Is it possible to replace the entire schema in appsync after it has been deployed from another stack?

If so (or if not too big a feature request), you could pretty much decouple everything.

  1. Create stack with the new schema construct (store ref in param store)
  2. Create the GraphqlApi stack and add the schema (store ref in param store)
  3. Create the microservice stack and pull in both the schema and the IGraphqlApi. Add to the schema as desired, add a data source through the IGraphqlApi, and finally replace the current schema (if possible?) with the update from the current microservice stack.

This way we need to touch the schema and GraphqlApi stacks only once, and then keep the infrastructure required to add to them entirely in their own respective stacks.
Do you think this would be reasonably achievable, or even a good design?
Forgive me if I'm way off here, I don't have much experience with appsync yet 🙂

@ranguard (Contributor)

I've been doing a bit more reading and I think what I'm actually after is schema stitching, or, taking that further, GraphQL federation support in AppSync would be even better, such that each service can be responsible for its own content in a shared schema that is then stitched together by a gateway. There is some mention of it in aws/aws-appsync-community but no timeline.

With graphql-transform-federation it seems to be possible to hack something together now but an officially supported mechanism would go a long way.

As my endpoints aren't public yet, it feels like it might be worth running multiple GraphQL API endpoints for now and waiting for AppSync to catch up.

@chrisadriaensen

@ranguard +1 here; I actually wrote a custom script to stitch the schema together as new microservices are added... I do feel CDK could help here until AppSync catches up (not sure what their timelines are).

@BryanPan342 (Contributor, Author)

@chrisadriaensen just to clarify, this is for schema-first right?

So like having separate schema.graphql files?

@hirenumradia

Hi,

I don't know if this is helpful, but I have solved this in a slightly different way. Each of our microservices is in a separate nested stack. The nested stacks get composed into one stack for the backend.

I create the API in the parent stack and pass down the API as a prop to the nested stacks. Then I can add queries, mutations and types into the common API reference.

CDK builds the final schema during deployment. The problem I am finding is debugging schema issues with this code-first approach. I have to push the schema and watch it fail in CloudFormation; the errors in CloudFormation don't pinpoint where the schema is broken, and it's a pain to track down these schema errors.
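
A rough sketch of the prop-passing setup described above, assuming CDK v1 imports; the service, type and field names are illustrative, and the data source and mapping templates are left out:

import * as cdk from '@aws-cdk/core';
import * as appsync from '@aws-cdk/aws-appsync';

interface ServiceStackProps extends cdk.NestedStackProps {
  api: appsync.GraphqlApi;
}

// One nested stack per microservice; the shared API comes in as a prop.
class OrdersServiceStack extends cdk.NestedStack {
  constructor(scope: cdk.Construct, id: string, props: ServiceStackProps) {
    super(scope, id, props);

    const order = new appsync.ObjectType('Order', {
      definition: { id: appsync.GraphqlType.id({ isRequired: true }) },
    });
    props.api.addType(order);
    props.api.addQuery('getOrder', new appsync.ResolvableField({
      returnType: order.attribute(),
      // dataSource and mapping templates would be wired up here
    }));
  }
}

// In the parent backend stack:
// new OrdersServiceStack(this, 'OrdersService', { api });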

@BryanPan342 (Contributor, Author)

@hirenumradia Hmm, yeah, I remember facing similar issues when I was originally building and testing the framework.

We do very high level testing to make sure each type is defined as per the GraphQL specs. But I'm sure there is a lot of room for improvement in terms of developer experience.

Ideally there would be some testing mechanism that would happen during build time, but I haven't thought too deeply on how we can accomplish it and keep the package tight.

@hirenumradia

@BryanPan342 Got ya. Do you know of any ways I could debug this more easily? It's killing my productivity at the moment :( I'm having to comment out different parts of the code-first schema to see what could be breaking it, then trying a deploy. Does AppSync log the schema in CloudFormation on failure?

@MrArnoldPalmer (Contributor) commented Apr 7, 2021

@hirenumradia The biggest thing that helped me debug my schema/resolver issues was enabling all of the logging in AppSync. Check out the docs here for information on enabling this, and then you will get much clearer failure messages in CloudWatch.

EDIT: On second reading, this may not be helpful for your case, since this is during CFN deployment and most of the issues I have experienced weren't because of an invalid schema but because of mismatched schema/resolver config.
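
For reference, a minimal sketch of turning that logging on when constructing the API (logConfig and FieldLogLevel are from the appsync module; the API name is illustrative):

const api = new appsync.GraphqlApi(stack, 'Api', {
  name: 'demo',
  logConfig: {
    fieldLogLevel: appsync.FieldLogLevel.ALL, // log every resolver invocation
    excludeVerboseContent: false,             // include request/response bodies
  },
});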

@BryanPan342 (Contributor, Author)

@hirenumradia What kind of issues are you running into? Maybe we can write some tests for it if it's more of a GraphQL problem than a deployment one!

@hirenumradia

Thanks @BryanPan342 @MrArnoldPalmer. 👍🏽 So we are at the start of setting up our infrastructure, hence why these types of issues are more likely to come up.

Background

Our setup has one Backend Stack that composes the various microservices. Each microservice is a Nested Stack that has any common infrastructure resources such as the API passed down to them.

The types and the fields are composed within these Nested Stacks, specifically, I am creating these types and fields within a CDK construct so that each use case that we are developing is separated into a "cloud component".

The issue

When composing the schema, I was able to put together a schema that passed TypeScript's static compilation, but the syntactic validation failed when deploying the schema.

I was previously generating a schema file that "stitched together" multiple types and resolvers via a code generation module I wrote. I was able to see the schema before deploys and fix the issues. I moved to this code-first approach for maintainability.

These were the CloudFormation errors I was getting:

Schema Creation Status is FAILED with details: Internal Failure while saving the schema.

Schema Creation Status is FAILED with details: Failed to parse schema document - ensure it's a valid SDL-formatted document.

Immediate Thoughts / Suggestions

  • We could have some way of testing the schema before deployment, where any validation errors in the schema are caught and surfaced at synth/test time rather than during deployment (see the sketch below)
  • Provide a class that helps build the schema and contains some utility functions for validation
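
A rough sketch of the first suggestion: validate the synthesized schema with the graphql package's buildSchema before ever deploying. The BackendStack name is hypothetical; the SDL is read out of the synthesized template, where it ends up as the Definition of the AWS::AppSync::GraphQLSchema resource:

import { buildSchema } from 'graphql';
import * as cdk from '@aws-cdk/core';

// Synthesize the app and pull the generated SDL out of the template.
const app = new cdk.App();
new BackendStack(app, 'Backend'); // hypothetical stack under test
const template = app.synth().getStackByName('Backend').template;

const schemaResource: any = Object.values(template.Resources).find(
  (r: any) => r.Type === 'AWS::AppSync::GraphQLSchema',
);

// Throws with a precise parse error if the generated SDL is invalid,
// instead of failing later inside CloudFormation.
// Note: AWS-specific directives such as @aws_iam may need to be declared
// first, or skipped with buildSchema(sdl, { assumeValidSDL: true }).
buildSchema(schemaResource.Properties.Definition);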

If you would like to chat in more depth, I'm happy to jump on a quick call and I can show you.

@MrArnoldPalmer (Contributor)

@BryanPan342 I feel like we are ready to close this one out and track bugs and features in separate smaller issues since we have an initial implementation released. Tell me if you'd prefer to hold it open though.

Schema Validation here: #14022

@BryanPan342 (Contributor, Author)

@MrArnoldPalmer Good idea! Though I do quite enjoy having a place for discussions about improvements to the code-first approach, I think we should probably keep that to the AppSync tracking issue.

