Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Plugin telemetry events #894

Merged
merged 10 commits into from
Apr 6, 2020
Merged

Conversation

akoutmos
Copy link
Contributor

This PR addresses some of my thoughts that I raised in #892 and emits start+stop events for dataloader events where there are pending_batches to process. Those two additional Telemetry events are:
[:absinthe, :dataloader, :resolve, :start] when a dataloader with pending batches starts
[:absinthe, :dataloader, :resolve, :stop] when a dataloader with pending batches finishes

This is just an initial stab at the problem, so feel free to comment. Thanks :)!

@benwilson512
Copy link
Contributor

Hey @akoutmos thanks for taking a stab at this!

Here's the challenge we need to solve: Dataloader is an optional dependency within Absinthe. I don't think it's appropriate for Absinthe to emit dataloader specific events. I do think that the resolution phase should emit events for when a plugin callback has started, and when a plugin callback has ended though, and the plugin module would be part of the event emitted.

For Dataloader specific events, I think we need to have those fire within Dataloader itself.


emit_start_event(start_time, exec, loader)
updated_loader = Dataloader.run(loader)
emit_stop_event(start_time, exec, loader)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, I think we should move these start and end events into https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/phase/document/execution/resolution.ex#L78 and it would be something like:

  defp run_callbacks(plugins, callback, acc, true) do
    Enum.reduce(plugins, acc, fn plugin, acc ->
      start_time = System.monotonic_time()
              :telemetry.execute(
          [:absinthe, :plugin, callback, :start]
          %{start_time: start_time},
          %{plugin: plugin, acc: acc}
        )
      acc = apply(plugin, callback, [acc])
      # stop event
     acc
    end)
  end

@akoutmos
Copy link
Contributor Author

@benwilson512 I think my latest commits address the comments that you brought up. I really like this approach compared to my original solution as it is more general purpose now and covers all the plugins. On that same note, is a dataloader specific event even needed? You can match on

{
  :telemetry_event, 
  [:absinthe, :plugin, :callback, :start], 
  %{start_time: _}, 
  %{loader:Absinthe.Middleware.Dataloader}
}

to get your dataloader (or any plugin for that matter) specific events. Thoughts?

Copy link
Contributor

@benwilson512 benwilson512 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvements!

Enum.reduce(plugins, acc, fn plugin, acc ->
start_time = System.monotonic_time()

emit_start_event(start_time, acc, plugin)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very close! So, callback here is either the atom :before_resolution or :after_resolution, and I was thinking we could use that to dispatch more finely tuned events. Specifically this would get us:

[:absinthe, :plugin, :before_resolution, :start]
[:absinthe, :plugin, :before_resolution, :stop]
[:absinthe, :plugin, :after_resolution, :start]
[:absinthe, :plugin, :after_resolution, :stop]

:telemetry.execute(
@callback_stop_event,
%{duration: System.monotonic_time() - start_time},
%{acc: acc, loader: loader}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The loader variable here is I think the plugin right? We probably want to call it plugin: plugin

@akoutmos
Copy link
Contributor Author

My bad. I read your callback line as an atom and not the function variable. Third time is a charm I think I addressed all your feedback :D. Thoughts on the separate dataloader telemetry events? Thanks again!!

Copy link
Contributor

@benwilson512 benwilson512 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good to me!

Thoughts on the separate dataloader telemetry events?

Let's do these within the Dataloader project itself: https://github.com/absinthe-graphql/dataloader. PR welcome. We probably want to emit events for each run_batch call basically.

@binaryseed you did the original telemetry stuff, this look good to you?

@benwilson512 benwilson512 requested a review from binaryseed March 29, 2020 21:26
Copy link
Contributor

@binaryseed binaryseed left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My only concern would be how frequently these will happen. With field-level events, we made sure that the telemetry only triggered when a resolver function was explicitly called. We need to make sure that this instrumentation won't cause there to be a huge increase in calls to execute, since they are not free (an ETS lookup so cheap, but not free)

lib/absinthe/phase/document/execution/resolution.ex Outdated Show resolved Hide resolved
# This test emits a total of 24 events, but there we are only validating the
# dataloader events in this test
results =
Enum.reduce(1..24, %{dataloader_starts: 0, dataloader_stops: 0}, fn _, acc ->
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unclear why the reduce is needed, the other telemetry tests managed to work just fine with assert_receive...

assert_receive {[:absinthe, :execute, :operation, :start], _, %{id: id}, _config}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the reduce is so that I can count only the specific events that I want to fetch. Since the telemetry events emit on all plugins, I was also getting Async and Batch events. But only wanted to assert on the Dataloader specific events.

test/absinthe/middleware/dataloader_test.exs Outdated Show resolved Hide resolved
@binaryseed
Copy link
Contributor

Also, we have a pattern of sticking an id on each event to correlate the start & stop...

id = :erlang.unique_integer()

@benwilson512 benwilson512 changed the title Dataloader telemetry events Plugin telemetry events Mar 30, 2020
@benwilson512
Copy link
Contributor

Is this ready for re-review?

@akoutmos
Copy link
Contributor Author

I think I addressed everything on my end. The only open question from what I gather is the number of events being generated...but I defer to you guys on that one.

@benwilson512 benwilson512 requested a review from binaryseed March 31, 2020 16:38
@binaryseed
Copy link
Contributor

So, I looked into this locally on my API, and the problem is real.

We've got a few plugins installed for standard stuff like auth & monitoring, and this addition calls execute 2 times for every plugin for every phase of resolution.

That means when using stuff like Async / Batch, every new phase added by any plugin re-triggers a whole new set of telemetry events for every plugin.

For example, I tried it out with a really simple query, and there were 6 resolution phases. Each phase added 14 calls to execute (we have 7 plugins). That means a total of 84 additional calls to telemetry.execute - 84 more ETS lookups. And that's as a no-op, if there's a handler attached, then it's going to get called 84 times.

I'm a little worried about this... It's pretty unlikely that the raw data here is going to be leveraged, and that's a good bit of work to do for something that's rarely going to be used.

I"m wondering if directly instrumenting Dataloader would be a better way to go about this vs instrumenting every single plugin interaction?

@benwilson512
Copy link
Contributor

@binaryseed This is a good point, I didn't think about the fact that this will run N*M times, where N is the number of plugins, and M is the number of resolution phases that happen (roughly proportional to the depth of the GraphQL query). :ets is pretty fast, but I agree that there isn't a lot of point to firing events about plugins that don't execute.

Perhaps an alternative strategy would be to emit events only within the functions within the batch and dataloader plugins that actually execute user code so that you can track how long a given batch or run_batch call takes if any are queued up without firing events at all if none are present. I realize this is closer to what you had originally @akoutmos and I apologize for the back and forth here. In this proposal to be clear, we would move the event hooks to here: https://github.com/absinthe-graphql/absinthe/blob/master/lib/absinthe/middleware/batch.ex#L156 and within Dataloader itself. @binaryseed does that seem more reasonable?

@akoutmos
Copy link
Contributor Author

akoutmos commented Apr 2, 2020

No worries at all @benwilson512! That sounds good to me in regards to moving the telemetry events to the plugins directly. If that sounds good to you as well @binaryseed then I can get going on this approach and update this PR + open up a new one for the dataloader repo.

@binaryseed
Copy link
Contributor

Instrumenting call_batch_fun sounds pretty good!

@benwilson512
Copy link
Contributor

@akoutmos do note the naming conventions mentioned here: #901 (review)

@akoutmos
Copy link
Contributor Author

akoutmos commented Apr 4, 2020

Working on cleaning up this PR to reflect the conversation that we had. But managed to wrap up the Dataloader changes: absinthe-graphql/dataloader#89

Will update this PR over the weekend. Thanks!

@akoutmos
Copy link
Contributor Author

akoutmos commented Apr 4, 2020

I think I addressed everything that we discussed and also updated the documentation.

Question for you guys. Should I remove the telemetry tests in the test/absinthe/middleware/dataloader_test.exs file? The emitting of telemetry events is covered in the Dataloader PR that I made...but it would also be nice to ensure that those events are captured in an integration style test within Absinthe. On the flip side, the tests in Absinthe will fail until a newer version of Dataloader is published and the deps are updated in Absinthe. Thoughts?

@binaryseed
Copy link
Contributor

Yeah, I'd much rather we keep the tests for dataloader telemetry events inside the library itself...

@akoutmos
Copy link
Contributor Author

akoutmos commented Apr 5, 2020

I went ahead and removed the test @binaryseed

@binaryseed
Copy link
Contributor

Thanks! I'm going to merge this and followup with a small commit to move the telemetry.execute call in the start case to execute after the task is spawned so that handlers don't impact the timing!

@binaryseed binaryseed merged commit 37311f9 into absinthe-graphql:master Apr 6, 2020
@binaryseed
Copy link
Contributor

Here it is: 38016db

@akoutmos
Copy link
Contributor Author

akoutmos commented Apr 7, 2020

Thanks @binaryseed! Let me know what you think of the dataloader changes and if you require any changes there absinthe-graphql/dataloader#89.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants