Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collapsing multiple "node" requests into a single "nodes" request in the Network Layer #1365

Closed
ruslantalpa opened this issue Aug 31, 2016 · 5 comments

Comments

@ruslantalpa
Copy link

ruslantalpa commented Aug 31, 2016

After reading the discussions/solutions in the following issues regarding batching
Collapse and Batch queries #520
Custom Network Layer -- Modifying the RelayQueryRequest #654
I still feel the proposed solution (wrapping the queries client side and unwrapping on the server) only partially solves the problem of multiple paralel requests to the node endpoint.

Consider the task of displaying a list of items (that implement the node interface) and their properties (a,b,c).
If after the initial fetch of, let's say 100 items, you decide to display also the 'd' propertie, then relay will issue 100 requests to the node endpoint.
All of those requests (most of the time) will have the exact same query, only the variable value (id) will be different.
If you go with the proposed route, after the requests will be unwrapped on the backend server, they will still look like 100 concurrent requests hitting the server. If those items are fetched from the db, it means either 100 opened db connections or sequencing them over x db connections.

A much more optimal solution would be if a multiple queries like this
query ($id_0:ID!) { node(id:$id_0) { ... } }
could be rewritten on the client side (where there is already an AST present) into
query ($ids_0:[ID]!) { nodes(ids:$ids_0) { ... } }
provided the graphql server implements the nodes endpoint.

The difference between the two methods basically is the same as running 100 queries like select * from items where id = .... compared to select * from items where id in (....)

I guess my question boils down to:
Can someone give an example of generating (not modifying the original request) a RelayQueryRoot to the nodes endpoint starting from a request to node interface.
The plan is run this single request, then walk the result and distribute the fetched nodes to the original requests.

Thank you

@ruslantalpa ruslantalpa changed the title Collapsing multiple "node" requests into a single "nodes" request Collapsing multiple "node" requests into a single "nodes" request in the Network Layer Aug 31, 2016
@josephsavona
Copy link
Contributor

Thanks for asking about this.

There are a few approaches here:

  • use forceFetch so that Relay doesnt diff the input query. This means the app will send one query instead of n.
  • use relay-network-layer on client and server in order to batch the eg 100 requests into one.

Longer term, we are moving away from dynamic queries and query diffing. In the next major version - Relay 2 (#1369) - queries are static, which will prevent this issue from occurring.

Do either of the first two approaches help?

@ruslantalpa
Copy link
Author

option 1 works in a way but negates all the advantages of Relay
option 2, that's exactly what i am asking for, i want to batch all those queries in a single request, but not like a simple envelop that is unpacked server side, but collect all the requests that are the same and create a single dynamic request that would fetch all the needed nodes then redistribute the result to the original request.
(reading about Relay 2 now)

@josephsavona
Copy link
Contributor

josephsavona commented Sep 5, 2016

option 1 works in a way but negates all the advantages of Relay

I have to respectfully disagree with you on this. Query "diffing" is an extra layer of optimization that can be beneficial in some cases. But it is a tradeoff: diffing itself is non-free, and it requires printing a query string at runtime (also not free). Merging n requests into one is also not free, and can't be done universally (each of those n requests could end up fetching different fields, depending on what was fetched before).

Overall, we've found that diffing isn't the right tradeoff. In Relay 2 we will only support binary diffing (either skip the query if all data is fetched, or refetch in full if anything is missing), along with first-class APIs for common cases such as pagination.

Another short-term option is to "prefetch" the data - if you're seeing lots of small requests when transitioning from view A to view B, you might explicitly include view B's fragment in query A. That way the data is already cached for the transition.

@josephsavona
Copy link
Contributor

(Didn't mean to close, the GitHub "close issue" button is terribly positioned on mobile.)

@ruslantalpa
Copy link
Author

ruslantalpa commented Sep 5, 2016

I understand your challenges in making Relay universally applicable and as fast as possible (hense no more diffing) and you've probably seen how it works in multiple scenarios it's just that in the scenario i'm thinking of (which is not that uncommon) there are a few problems with what you said above:

  • query diffing and all that, while not free, is much more cheeper than fetching unneeded data over the wire (that was the main promise of GraphQL), not to mention the extra load on the db for fetching that data (i think ppl will want to offload most of the computation to the client instead of the backend db). In all relay examples it seems you always fetch 10-15 items (mobile app) but i am thinking 100 items each with 10-20 child items (desktop app)
  • For the following query in relay 2
{
   projects {
       id name
       tasks @include(if: $showTasks) {
          id name
       }
   }
}

if at first $showTasks is false, only the projects will be fetched (think 100), then if it changes to true, the same data 100 projects will be fetched but this time with the additional tasks info.
Relay 1 on the other hand would fetch only the tasks info for all the project nodes (which i think is good in this case) and all that's needed is to use the nodes endpoint and make 1 request instead of using node endpoint.

Maybe i did not make it quite clear, in this particular case, i was happy with how relay chose to fetch the info, i was just looking for how one could "on the fly" create a request based on the AST in the original request. Since this behaviour will change completely in Relay2 i guess it's relevant but from what you mentioned it seems there would be many situations with a lot of over-fetching, it's particularly scary when the query will be several layers deep (company>clients>projects>tasks>comments) and and one component decides to fetch an additional field.

Maybe i misunderstood your post so i'll go and watch the info in the issue you linked.
Feel free to close this.
Thanks again

Edit: I guess i am looking at this as running in a browser on a powerful CPU and you are concentrating on making things work fast in a mobile setting.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants