[RFC] Add Appendix A: Persisted Documents #264
Should we give a formal BNF syntax? Maybe restrict the identifiers to alphanumeric? GraphQL Names, maybe?
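For illustration only, a hedged sketch of what such a restriction could look like (TypeScript; the `prefix:payload` shape is an assumption based on the `sha256:` and `x-` prefixes discussed below, not normative text):

```ts
// Hypothetical sketch only: assumes a documentId of the form `prefix:payload`,
// with an alphanumeric prefix (e.g. "sha256", or a custom "x-" prefix) and a
// payload limited to URL-safe characters.
const DOCUMENT_ID = /^[A-Za-z0-9][A-Za-z0-9_-]*:[A-Za-z0-9._-]+$/;

function isValidDocumentId(documentId: string): boolean {
  return DOCUMENT_ID.test(documentId);
}

// isValidDocumentId("sha256:0123abcd")  -> true
// isValidDocumentId("not a valid id!")  -> false
```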
Would you like to submit a PR to my PR to add this?
Sounds like a plan. This week's quite busy but I'll aim for next week! (famous last words 😅 )
That was the longest week ever, but my attempt at a formal syntax is here.
I wonder if actual hashing is needed? What's the benefit of having `sha256` over any other opaque string? 🤔 We could let users decide how to encode/map their persisted documents, and then do matching based on that value?
The aim of a specification like this is interoperability; so a client that supports persisted operations should be able to use a server that supports persisted operations without too much additional setup. Sharing details of the document identification method used out-of-band is supported (explicitly by the `x-` prefix, or by custom identifiers); but for maximum compatibility there should be a shared baseline in my opinion.
I expect that the benefit is mainly for APQ support, where a client can enable APQ without knowledge of the server implementation, and so a consistent implementation is essential. SHA256 can be computed natively by modern browsers with a very low collision rate, making it a good choice in this scenario.
However, in a scenario where the identifiers are only known to the server, and must be registered with the server in order for the client to operate (as so far is documented in this RFC), they might as well be any opaque key returned by the query storage database. If the database stores the queries with an auto-incrementing integer as an identifier, that would work just as well.
Even so, there are still some benefits to using a hash:
I'm fine with the current suggestion of `sha256:` but allowing for `x-id:` and similar.
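As a hedged illustration of the point above about browser support: a minimal sketch computing a `sha256:`-prefixed document identifier natively in the browser via the Web Crypto API (the helper name is illustrative, not from the RFC):

```ts
// Minimal sketch: compute a sha256-based document identifier natively in the
// browser using the Web Crypto API. The "sha256:" prefix follows the
// convention discussed in this thread; the function name is illustrative.
async function documentIdFor(documentSource: string): Promise<string> {
  const bytes = new TextEncoder().encode(documentSource);
  const digest = await crypto.subtle.digest('SHA-256', bytes);
  const hex = Array.from(new Uint8Array(digest))
    .map((byte) => byte.toString(16).padStart(2, '0'))
    .join('');
  return `sha256:${hex}`;
}

// documentIdFor('{ __typename }').then(console.log);
// -> "sha256:" followed by 64 hex characters
```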
I see, thank you @Shane32 @benjie.
My point is not about the prefix or the method, but about the need for hashing at all? A user can decide to use `operation-1` as the key (instead of an actual computed hash), and the result will be the same 🤔 🤔 🤔
@dotansimha Indeed, that's already allowed under this spec. The issue is that it requires coordination between server and client (they need to agree on how `operation-1` is derived). For maximum compatibility, if both server and client already know how to identify operations (e.g. a standardized SHA256 hash) then all that's left is to transfer the documents from client to server, which can happen after the client has been built, rather than before or during. No configuration is required on server or client, easing adoption for everyone and encouraging more people to use an operation allowlist.
Oh, right, so you're saying the queries still need to be transferred to the server. Yes, that's true, but it can be done after the client is built (but before it's deployed). That's different from having to do it during the build process; it means that clients can be built and persisted documents written even before the server exists. It also allows for arbitrary transfer of documents to the server (you can send them one-way on a pen drive through the mail if you want!).
So the client and the server must be coordinated, and we recommend using SHA256, right?
"Recommend" and "should" are the same according to RFC2119, so I think that is saying what's already there. If the server follows this recommendation, the client doesn't need any configuration. If the server doesn't do this; then you need to coordinate between client and server. For optimal interoperability, no coordination should be necessary.
To be pedantic there, I'd argue that sharing the "sha256" method is still coordination between the client and server. Plus the documents need to be actually transferred (on a pen drive or avian carrier!).
So coordination is always required but sha256 is a convenient and widespread default which we recommend (hence the should)?
Indeed, it is a very light-touch asynchronous form of coordination. Essentially the coordination boils down to two things: 1. does the server support SHA256 hashes (you don't necessarily need to ask the server this; it's a fact that should be established within the development team); 2. we need a way to send the operations and hashes to the server and to know when they have been persisted.

1. Client is informed that the server supports SHA256 hashes.
2. Client: performs its build, including generation of hashes.
3. Client (at some point later): ships hashes and operations to the server somehow.
4. Server (upon receipt): stores the queries and hashes somehow.
5. Client (after the server has stored them): is deployed.
Since the client can't be deployed until the server has stored the queries/hashes, there is indeed coordination. The coordination is incredibly lightweight compared to alternatives where the server must generate hashes during the client build process.
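A hedged sketch of the client build step in that flow, assuming Node.js; the manifest file name and JSON shape are illustrative, not part of the RFC:

```ts
// Sketch of the client build step: hash each operation and emit a manifest
// mapping documentId -> document source, to be shipped to the server later.
// The file name and JSON shape here are assumptions, not spec.
import { createHash } from 'node:crypto';
import { writeFileSync } from 'node:fs';

function buildManifest(documents: string[]): Record<string, string> {
  const manifest: Record<string, string> = {};
  for (const documentSource of documents) {
    const hash = createHash('sha256').update(documentSource).digest('hex');
    manifest[`sha256:${hash}`] = documentSource;
  }
  return manifest;
}

const manifest = buildManifest([
  'query GetUser($id: ID!) { user(id: $id) { name } }',
]);

// Ship this file to the server somehow (deploy pipeline, pen drive, ...)
// and wait for confirmation before deploying the client.
writeFileSync('persisted-documents.json', JSON.stringify(manifest, null, 2));
```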
At Airbnb we append the operation name to the URL for GraphQL requests for easier debugging and tracing. I wonder if this spec could include an example or implementation of such solutions:
I think it would be unwise to recommend that people do this based solely on the operation name (the risk of clashes as they iterate their queries is too high); however I would support the name being factored into the document identifier along with some hashing; e.g. something like `createHash('sha256').update(documentSource).digest('hex').substring(0, 12) + '_' + operationName`. The `operationName` query parameter is already specified; perhaps we should add a non-normative note recommending that clients include it to aid in debugging?
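A hedged sketch of that suggestion: a short hash combined with the operation name, with `operationName` also passed as a query parameter for debugging (the endpoint and parameter layout are illustrative, not normative):

```ts
// Hedged sketch: a document identifier combining a short hash with the
// operation name, plus operationName in the query string for debugging.
import { createHash } from 'node:crypto';

function debuggableDocumentId(documentSource: string, operationName: string): string {
  const shortHash = createHash('sha256')
    .update(documentSource)
    .digest('hex')
    .substring(0, 12);
  return `${shortHash}_${operationName}`;
}

const documentSource = 'query GetUser($id: ID!) { user(id: $id) { name } }';
const documentId = debuggableDocumentId(documentSource, 'GetUser');

const url =
  `/graphql?documentId=${encodeURIComponent(documentId)}` +
  `&operationName=GetUser` +
  `&variables=${encodeURIComponent(JSON.stringify({ id: '1' }))}`;
// -> "/graphql?documentId=<12 hex chars>_GetUser&operationName=GetUser&variables=..."
```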
`query` is omitted here and throughout. Makes sense, as it is not necessary for the operation and should not be included. But I think APQ-type behavior should be considered and allowed for within the spec. Preferably it would be an optional feature as part of this spec, or there would otherwise be a note to the effect that `query` may be allowed in certain use cases, etc.
In an unofficial APQ request based on this specification, the query should go into `extensions`, and the error code used to detect the missing query should also go into the error's `extensions`. We may specify `query` in future if we officially specify something like APQ; that's definitely feasible over what we already have.
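A hedged sketch of what such an unofficial APQ-style flow might look like; the extension field name (`persistedDocument`) and the error code (`PERSISTED_DOCUMENT_NOT_FOUND`) are assumptions for illustration, not part of the RFC:

```ts
// Hedged sketch: send only the documentId first; if the server reports (via
// an error's `extensions`) that it doesn't know the document, retry with the
// document source carried in the request's `extensions`.
async function executePersisted(
  documentId: string,
  documentSource: string,
  variables: Record<string, unknown>,
) {
  const post = (body: unknown) =>
    fetch('/graphql', {
      method: 'POST',
      headers: { 'content-type': 'application/json' },
      body: JSON.stringify(body),
    }).then((response) => response.json());

  let result = await post({ documentId, variables });
  const unknownDocument = result.errors?.some(
    (error: { extensions?: { code?: string } }) =>
      error.extensions?.code === 'PERSISTED_DOCUMENT_NOT_FOUND',
  );
  if (unknownDocument) {
    // Retry with the query carried in `extensions` rather than in `query`.
    result = await post({
      documentId,
      variables,
      extensions: { persistedDocument: { body: documentSource } },
    });
  }
  return result;
}
```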
What are your opinions on providing the `documentId` via the URL `/graphql/<documentId>`? One of the benefits would be easier debugging and visibility in dev tooling.
Edit: already covered via https://github.com/graphql/graphql-over-http/pull/264/files#diff-9be5577e05ae2112d2b8f95584b162d0dec01453bf6c85df58bf5db4f2c9727aR166-R168
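For illustration, a hedged sketch of a client using that URL form (the exact layout is an assumption, not normative):

```ts
// Hedged sketch of the URL form suggested above: the documentId lives in the
// path, which makes requests easy to read in dev tooling and plays nicely
// with HTTP caches.
async function getPersisted(documentId: string, variables: Record<string, unknown>) {
  const search = new URLSearchParams({ variables: JSON.stringify(variables) });
  const response = await fetch(`/graphql/${encodeURIComponent(documentId)}?${search}`, {
    headers: { accept: 'application/graphql-response+json' },
  });
  return response.json();
}

// getPersisted('sha256:<64 hex characters>', { id: '1' });
```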
Yes I think this should be encouraged more; it's great for caching. I'd welcome your edits to address this, if you were so inclined.
Do we have to define expectations in case this exceeds maximum URL size?
If we do, we should do it in the main spec: https://graphql.github.io/graphql-over-http/draft/#sec-GET
The Appendix tries not to redundantly repeat statements from the main spec if it can avoid it.
Which reminds me; I heard that some people are using headers to specify variables when using GraphQL-over-GET... Apparently that works around the length limit 🤨
Yes, a lot of folks use headers, and then the server adds them to the `Vary` response header so browsers/... can know that they are part of the cache key.
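A hedged sketch of that pattern, assuming a custom header name purely for illustration (the header name and response shape are not part of any spec):

```ts
// Hedged sketch: variables arrive via a custom request header (the name
// "x-graphql-variables" is an assumption). The server lists that header in
// `Vary` so browsers/CDNs include it in the cache key for GET responses.
import { createServer } from 'node:http';

createServer((req, res) => {
  const variables = JSON.parse(String(req.headers['x-graphql-variables'] ?? '{}'));
  res.setHeader('Vary', 'Accept, x-graphql-variables');
  res.setHeader('Content-Type', 'application/graphql-response+json');
  // ...execute the persisted operation with `variables` here...
  res.end(JSON.stringify({ data: null }));
}).listen(4000);
```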
Can we add a section on, e.g., returning 404 for persisted documents we can't find, and maybe even 400 if they don't use an allowed prefix?
I think for the URL format 404 should be encouraged. I've been writing up a change to the appendix which encourages the URL format, but haven't had time to finish it yet; I've just raised a PR for my WIP so we have something to easily reference: #305
IMO for the non-URL version (traditional), 404 should not be used - it suggests that the `/graphql` endpoint is not found, which would be confusing.