Fix __typename prevent subscription deduplication #652

darrenclark · 2018-12-14T18:35:40Z

This fixes an issue where identical subscription GraphQL queries would not be deduplicated, i.e. would end up with different doc IDs here:

absinthe/lib/absinthe/phase/subscription/subscribe_self.ex

Lines 22 to 23 in d34d5ab

    
           hash = :crypto.hash(:sha256, :erlang.term_to_binary(blueprint)) |> Base.encode16() 
        
           doc_id = "__absinthe__:doc:#{hash}"

The anonymous function used when resolving __typename ends being "different" when run on separate processes, preventing identical subscription queries that use __typename from being deduplicated (when run from separate processes)

The root cause of the issue:

defmodule Example do
  def print(msg), do: IO.puts msg

  def make_print_func(), do: &Example.print/1

  def make_anon_print_func(), do: &print/1
end

a = Example.make_print_func()
b = Task.await(Task.async(&Example.make_print_func/0))

IO.puts "a == b?  #{a == b}"
IO.puts ":erlang.term_to_binary(a) == :erlang.term_to_binary(b)?  #{:erlang.term_to_binary(a) == :erlang.term_to_binary(b)}"

anon_a = Example.make_anon_print_func()
anon_b = Task.await(Task.async(&Example.make_anon_print_func/0))

IO.puts "anon_a == anon_b?  #{anon_a == anon_b}"
IO.puts ":erlang.term_to_binary(anon_a) == :erlang.term_to_binary(anon_b)?  #{:erlang.term_to_binary(anon_a) == :erlang.term_to_binary(anon_b)}"

ends up producing:

a == b?  true
:erlang.term_to_binary(a) == :erlang.term_to_binary(b)?  true
anon_a == anon_b?  true
:erlang.term_to_binary(anon_a) == :erlang.term_to_binary(anon_b)?  false

I noticed this issue also happens with Dataloader / Dataloader.Ecto because of these anonymous functions:

https://github.com/absinthe-graphql/dataloader/blob/a61f550a28820d2befcd94411f7b03163f30db30/lib/dataloader/ecto.ex#L190-L191

(was able to workaround that on my end though, as I just provided non-anonymous functions for query and run_batch in my code)

@benwilson512 How do you feel about:

Replacing :crypto.hash(:sha256, :erlang.term_to_binary(blueprint)) |> Base.encode16()? (maybe to ignore anonymous functions, or simply use the query's string representation, not sure)

and/or

Allowing projects to generate their own doc IDs? I believe the project I work on could do some optimizations specific to our schema

I'd be happy to take on both of those tasks, just wanted to get your thoughts before I started on them

The anonymous function ends being "different" when run on separate processes, preventing identical subscription queries that use __typename from being deduplicated (when run from separate processes)

benwilson512 · 2018-12-14T18:44:32Z

Hey @darrenclark thanks for taking a look at this!

In Absinthe 1.5 what I want to do is focus on letting people generate their own doc ids. The current system has a couple of flaws, one of which you've found.

My thought is that we'd probably eliminate "by default" deduplication. If you want to have things deduplicated then you need to provide some kind of identifier. Identifiers really have two parts:

The document itself. A sha256 of the original query string is probably OK here and I'm not really sure it needs to be configurable. I want to avoid giving people obvious foot guns.
The context. This is the part we would require from people. Right now I do the term to binary and sha but depending on the circumstance this is often too narrow. It's too narrow in the sense that term equality can fail at times when there's still a logical equality (two different loads of the user out of the database may not have term equality even though they're the same user). We also want to allow people to have control over deduplication. IE perhaps the context id can just be "logged-in" for any valid user instead of always using the current user id. This provides substantial performance improvements.

A PR that works on these two things would be incredible.

darrenclark · 2018-12-14T19:08:30Z

@benwilson512 Perfect, we can definitely look into this. It should open up some significant optimization opportunities for us.

Any thoughts on the interface into this? Something similar to the document adapter interface (https://hexdocs.pm/absinthe/adapters.html)? Absinthe.Subscription.DocumentIdentification or something?

With that in mind, should I close this PR? Given the plans for 1.5, I'm not sure it adds anything of value.

benwilson512 · 2018-12-14T19:14:05Z

@darrenclark yeah I think we should drop this particular PR in favor of the above approach.

for the current user style id I was thinking it could be as simple as adding a key to the return value of the subscription config function IE:

config args, %{context: %{current_user: %{id: id}} ->
  {:ok, topic: args.whatever, id: id}
end

Or for a version that treated any logged in user the same

config args, %{context: %{current_user: _}} ->
  {:ok, topic: args.whatever, id: "authenticated"}
end

Then the full subscription id would be "authenticated:#{sha256(variables_json <> query_string)}". Do we need to make the document id be configurable? Are there situations where two documents that don't have the same query string sha should be treated as identical?

darrenclark · 2018-12-14T20:12:34Z

I have a contrived example of what I was hoping to support:

subscription PriceChange(productIds: [ID!]!) {
  priceDidChange(productIds: $ids) {
    product {
      __typename
      id
      name
      price
    }
  }
}

producing a stream of JSON objects something like:

{
  "event": {
     "__typename": "Product",
     "id": "Product:3242",
     "price": 10.5
  }
}

{
  "event": {
     "__typename": "Product",
     "id": "Product:12",
     "price": 11.25
  }
}

{
  "event": {
     "__typename": "Product",
     "id": "Product:924",
     "price": 5.05
  }
}

I was hoping that we'd be able to give all these documents the same ID:

# only whitespace differs
subscription PriceChange(productIds: [ID!]!) {
  priceDidChange(productIds: $productIds) { product { __typename id name price } }
}

# same fields, different order
subscription PriceChange(productIds: [ID!]!) {
  priceDidChange(productIds: $productIds) { product { id __typename price name } }
}

# different operation name / variables
subscription PriceDidChangeSubscription(ids: [ID!]!) {
  priceDidChange(productIds: $ids) { product { __typename id name price } }
}

# fragments
subscription PriceChange(productIds: [ID!]!) {
  priceDidChange(productIds: $ids) { product { ...ProductInfo } }
}
fragment ProductInfo on Product { id __typename price name }

It'd also be nice for us to be able to ignore the arguments for the subscription priceDidChange field (though maybe not all subscription fields in our schema), and rely on the provided topic:s to ensure each client got the data updates they were interested in.

Any thoughts on:

no id: provided, uses existing functionality
id: "authenticated" provided, does "authenticated:#{sha256(variables_json <> query_string)}"
when value provided to id: is a function, simply calls it to get the document ID

darrenclark · 2018-12-14T20:16:54Z

Or rather, it would be nice to have the option to have full control over the IDs being generated, so that we could account for those situations.

I can only really see us accounting for those situations if we see a lot of duplication due to minor changes like those.

benwilson512 · 2018-12-14T20:25:09Z

This gets a little tricky. Notably, these all have completely different blueprint representations, so the current functionality today treats these as all different. And if you have a serializer that respects the order of the fields as queried then they really ARE different. The results of one are not substitutable as the results of the other.

However, Absinthe can sort of punt on these questions I suppose if we also allow a document_id: option that lets the user do whatever they wish:

config args, %{context: %{current_user: _}, document: document} ->
  {:ok, topic: args.whatever, id: "authenticated", document_id: compute_id(document)}
end

Notably, document: isn't actually a key in the resolution struct yet but it pretty easily could be. You'd just pass in the root blueprint struct, nice and simple.

darrenclark · 2018-12-14T20:47:19Z

And if you have a serializer that respects the order of the fields as queried then they really ARE different

Ah yes, good to know (I had no idea they mentioned it in the spec hah)

However, Absinthe can sort of punt on these questions I suppose if we also allow a document_id: option that lets the user do whatever they wish:

The final ID would end up being "#{id}:#{document_id}" then? Users would provide id: and document_id: options?

This would work well for our use case

benwilson512 · 2018-12-14T20:51:04Z

The final ID would end up being "#{id}:#{document_id}" then? Users would provide id: and document_id: options?

Yes, which we'll refer to as the subscription_id. It may make sense to think of the subscription id as the composite of a context_id and a document_id. Having id simpliciter may be confusing.

benwilson512 · 2018-12-14T20:52:27Z

If no document_id is provided Absinthe will just hash the query string + variables. If no context_id is provided we could probably get away with :erlang.unique_integer.

darrenclark · 2018-12-18T15:28:00Z

Cool, sounds good!

darrenclark · 2018-12-18T15:29:01Z

Closing as per #652 (comment)

Fix __typename prevent subscription deduplication

317b5f4

The anonymous function ends being "different" when run on separate processes, preventing identical subscription queries that use __typename from being deduplicated (when run from separate processes)

darrenclark closed this Dec 18, 2018

darrenclark mentioned this pull request Dec 18, 2018

Adjust how subscription IDs are generated #654

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix __typename prevent subscription deduplication #652

Fix __typename prevent subscription deduplication #652

darrenclark commented Dec 14, 2018

benwilson512 commented Dec 14, 2018

darrenclark commented Dec 14, 2018

benwilson512 commented Dec 14, 2018 •

edited

Loading

darrenclark commented Dec 14, 2018 •

edited

Loading

darrenclark commented Dec 14, 2018

benwilson512 commented Dec 14, 2018

darrenclark commented Dec 14, 2018

benwilson512 commented Dec 14, 2018 •

edited

Loading

benwilson512 commented Dec 14, 2018

darrenclark commented Dec 18, 2018

darrenclark commented Dec 18, 2018

	hash = :crypto.hash(:sha256, :erlang.term_to_binary(blueprint)) \|> Base.encode16()
	doc_id = "__absinthe__:doc:#{hash}"

Fix __typename prevent subscription deduplication #652

Fix __typename prevent subscription deduplication #652

Conversation

darrenclark commented Dec 14, 2018

benwilson512 commented Dec 14, 2018

darrenclark commented Dec 14, 2018

benwilson512 commented Dec 14, 2018 • edited Loading

darrenclark commented Dec 14, 2018 • edited Loading

darrenclark commented Dec 14, 2018

benwilson512 commented Dec 14, 2018

darrenclark commented Dec 14, 2018

benwilson512 commented Dec 14, 2018 • edited Loading

benwilson512 commented Dec 14, 2018

darrenclark commented Dec 18, 2018

darrenclark commented Dec 18, 2018

benwilson512 commented Dec 14, 2018 •

edited

Loading

darrenclark commented Dec 14, 2018 •

edited

Loading

benwilson512 commented Dec 14, 2018 •

edited

Loading