Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Correlate EventPipe events with trace spans #3313

Open
kolesnikovae opened this issue Aug 24, 2022 · 8 comments
Open

Correlate EventPipe events with trace spans #3313

kolesnikovae opened this issue Aug 24, 2022 · 8 comments
Labels
enhancement New feature or request
Milestone

Comments

@kolesnikovae
Copy link

Hi everyone! Thank you for the awesome ecosystem and tooling you are developing.

We are building Pyroscope, an open source continuous profiling platform. Our .NET agent handles the EventPipe stream from the profiled application through the diagnostic IPC in order to collect a CPU profile, and this works well in a simple scenario where the profile is collected for the entire application/process. We are currently working on an integration with distributed tracing, which requires us to distinguish various execution scopes more granularly, for example scopes of HTTP requests. Go and Java applications are already supported and we'd love to add support for .NET applications – this thread may give more context.

I'm wondering what is the best way to reliably correlate Microsoft-DotNETCore-SampleProfiler events with OpenTelemetry trace spans. For example, in Go there are pprof tags, in Java JFR one can use execution context ID (ECID), etc. Basically, we're looking for a way to annotate EventPipe events with an arbitrary set of tags/labels.

Any questions and suggestions are greatly appreciated!

/cc @Rperry2174 @petethepig

@cijothomas
Copy link

@noahfalk maybe something you can help or redirect...?

@noahfalk
Copy link
Member

cc @davmason

Hi there Pyroscope crew! I took a look and I think we've got some options but fair warning I don't see one that is simultaneously easy to implement, high performance, and available to ship soon. I'd love for .NET to support this but we'll probably have to pick our poison on how we get there.

  1. In an ideal world I think the runtime would have something built-in where you could set some keywords or filter arguments on the sampling profiler and then all the sampling events would be generated with the current span id as part of the event payload. We could collaborate to build something like this for a future release of .NET, but unfortunately its too late to get new features into .NET 7. I'm assuming your goal is build something earlier than .NET 8 so I'll leave this aside for the moment.

  2. Another option that uses the existing EventPipe sampling events would require the app to emit sufficient other events that you can track what time intervals each thread was processing a given span. For apps where threads switch between spans infrequently (say a few thousand switches/sec) this might work OK, but as the number of switches/sec increases the performance overhead of this approach could get quite high. In terms of how to generate events that would let you track those intervals two possibilities I see:

  • .NET 7 added a managed callback whenever Activity.Current changes and you could log an event there.
  • .NET 5 added Activity start/stop EventPipe events that you could enable via DiagnosticSourceEventSource. Then you would also need to enable Tpl events to observe when work for handling a given span is switching between threads.
    The CurrentChanged callback is conceptually simpler and has fewer events in total, but it would only work starting in .NET 7 and requires you to have at least a little managed code running in process to be able to hook that event callback.
  1. If you wanted something that works in .NET 7 and has higher performance at scale we'd probably have to move into ICorProfiler territory. If you haven't worked with that before you are writing a C++ dll that loads inside the runtime and you get access to some low level interfaces. Once you've got your dll in there I think you would need to do two things:
  • Create an alternate implementation of the cpu sampling that will log additional state with each sample. I don't think this is hard as it may sound at first because the profiling API has methods to suspend/resume the runtime, enumerate all the threads, walk a thread's stack, and emit EventPipe events. You would be writing the glue that stitches those capabilities together.
  • Capture the span id for each thread in memory whenever it changes so that you have it available to include in the cpu sampling events. We could do this by using the Activity.CurrentChanged callback in .NET 7 to record any information you like about the current span into some memory where the profiler will be able to retrieve it later.

This is just an overview and you probably have plenty of questions. We're happy to chat and dig into some of these deeper. Sorry there isn't an easy button on this one right now, but I hope there can be in the future once we blaze the trail.

@kolesnikovae
Copy link
Author

Thank you for the exhaustive answer @noahfalk. I think we could start with Activity start/stop + TPL EventPipe events – thank you for pointing out this approach. However, as you said, implementing our own low level profiler (ICorProfilerCallback) might be a better move from a longer term perspective. We will carefully consider the options you listed and find out which one suits us best. Thank you for your help, once again!

@davmason
Copy link
Member

It sounds like this is resolved, closing this issue. If there is outstanding work I missed please reopen it.

@noahfalk
Copy link
Member

@davmason - I think most of the alternatives I suggested above are complex for profilers to implement, don't scale well, or both. I'm thinking we should have some issue (although it doesn't have to be this one) tracking a feature request for the runtime implementation of option (1) in the list above.

@davmason
Copy link
Member

I see I didn't actually manage to close it, we can leave this issue open to track option 1

@hoyosjs hoyosjs added the enhancement New feature or request label Oct 24, 2022
@tommcdon tommcdon added this to the 8.0.0 milestone Oct 25, 2022
@martinjt
Copy link

martinjt commented Feb 29, 2024

This has gone a bit quiet, so I'm wondering if there's been any work (thoughts or code) about getting this in?

There isn't an SDK spec coming out of OTel yet, but the data model is progressing, so I'm looking to understand whether .NET is going to work to that rather than creating its own implementation?

/cc @noahfalk

@noahfalk
Copy link
Member

noahfalk commented Mar 5, 2024

I'm still interested in improvements in this general area, but nothing has pushed it high on the priority list. If Honeycomb, Pyroscope, or anybody else has profiling features they would implement soon but can't do it because of lacking runtime support that would be valuable feedback.

@martinjt, I assume you are talking about the data model for profiling? At the moment the runtime team hasn't made any direct plans around it and it appeared that OpenTelemetry folks were well underway creating a profiler that emitted the data based on the ICorProfiler APIs the runtime already supports. Does that feel like a satisfying path or you are hoping to see something different?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

7 participants