Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Path Deduping only works on subsets, not intersecting sets of paths #779

Closed
jhusain opened this issue Apr 22, 2016 · 4 comments · Fixed by #897
Closed

Path Deduping only works on subsets, not intersecting sets of paths #779

jhusain opened this issue Apr 22, 2016 · 4 comments · Fixed by #897

Comments

@jhusain
Copy link
Contributor

jhusain commented Apr 22, 2016

Currently path deduping only works if a set of requested paths is a subset of the paths in an outstanding request. However we'd like deduping to work in the event that a set of requested paths intersects with those in an outstanding request, with only the difference being requested from the server.

For example, a request for ["genrelists", {from:0,to:5}, "name"] will be deduped against an outstanding request for ["genrelists", {from:0,to:10}, "name"]. However the request for ["genrelists", {from:0,to:5}, "name"] will not be partially deduped against an outstanding request for ["genrelists", {from:1,to:4}, "name"], with only the difference being requested from the server (["genrelists", [0,5], "name"]).

When a batch of outgoing requests is being accumulated, it is modeled as an Array of Array of Paths. In other words two separate requests for ["genreLists", 0, "name"] and ["genreLists", 1, "name"] will be modeled like this in a batch:

var outgoingBatch = 
  [
    [ ["genreLists", 0, "name"] ], // first request
    [ ["genreLists", 1, "name"] ], // second request
  ];

When a batch is sent out, it is converted into a path map, making it efficient to check for intersections with other sets of paths:

var outstandingRequestPathMap =
  {
    genreLists: {
      0: { name: true }
      1: { name: true }
    }
  }

Each outstanding request has a count indicating how many outstanding observers are awaiting the response. If all observer's unsubscribe, the count drops to 0 and the request is cancelled.

Currently the paths in each new request is matched against each outstanding request's PathMap. If the set of requested paths is a subset of an outstanding batch's set of paths, it is deduped and the number of observers listening for the batch is incremented. However if a set of paths merely intersects with those of the outstanding request, the algorithm gives up and creates a new batch for the paths.

The fix is to alter the algorithm to create a new batch for just the difference between the requested paths and the sets of paths that have already been sent to the server. Then it will be necessary to ensure that the request is only processed after all outstanding request dependencies (ex. ["genreLists", {from: 1, to: 4}, "name"] and ["genreLists", [0,5], "name"]) hey have been received from the server and merged into the cache. @trxcllnt Do you have anything to add to this?

@ktrott @steveorsomethin @sdesai @lrowe

@steveorsomethin
Copy link
Contributor

Possible relation to #819

@trxcllnt
Copy link
Contributor

trxcllnt commented Aug 17, 2017

@steveorsomethin so I remember this was actually pretty involved, but we did get it done in @graphistry/falcor in order to dedupe new requests against inflight partially-fulfilled streaming responses. IIRC we did it in this commit, though I think it also relied on some earlier work on the request queue to distribute paths across in-flight requests.

From a high level, I think the process went like this:

  1. Queue maintains a list of Requests, either pending or in-flight:
  • The last request in the list will always be the pending request. Pending requests have been scheduled, but not yet sent, so can be a target for request batching.
  • All the requests except the last one are in-flight. In-flight requests can't be modified once they're sent, but can be checked to determine if paths from the current get are already in-flight. If so, those paths can be stripped from the new request (path de-duping).
  1. Distribute the list of requested paths across the list of in-flight or pending requests, and build a list of requests that accepted at least one path. If there was a pending Request at the end, it will accept all the leftover paths. If not, either we had no Requests in the list or they were all in-flight, so create and append a new pending Request to the queue.
  2. Aggregate all Requests that accepted a path into a single Subscription, so if the get is canceled, we can decrement the refCount and/or cancel its pending request

Individually, a Request will keep track of the paths it's supposed to get.

In the case of a pending Request (which batches multiple gets together), we can have a list of lists of optimized paths ([[...optimized paths for get 1], [...optimized paths for get 2], ...]). If one of the gets is canceled before the Request is sent, we can remove its optimized paths from the batch wholesale (thanks to @michaelbpaulson for realizing this optimization a few years ago!)

When a pending Request becomes an in-flight Request, materialize the list of lists of optimized paths into a list of PathMap trees, which we can use to compute the path complements for de-duping.

Note: Our implementation has some logic specific to other changes we made for reusing the JSON output across get requests (essentially instead of get returning a list of requested* Paths, the recycleJSON version represents the requested* missing paths internally as a PathMap IIRC), but kept the existing behavior for the list of optimized missing paths.

That's most of what I remember, let me know if there's any more I can do to help!

* correction edit - I meant requested missing paths, not optimized

@steveorsomethin
Copy link
Contributor

Thanks a lot for the detailed write up @trxcllnt. I previously had a look at that commit and it looked pretty intense, so this is very helpful. We'll probably tackle this one in the next couple of weeks.

@asyncanup
Copy link
Contributor

#897 takes a stab at this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants