Propose an "Auto-Instrumentation SIG" #87
I know that Ted and Sergey are interested in this. Happy to seed the group with other folks as well.
I was going to propose a |
Meta-note: I didn't create the gitter room yet (wanted to wait for approval on the PR and naming). As for having cross-language stuff happen in the TC: I guess it just comes down to whether it's the same group of people and should be the same set of docs/meetings. I was imagining this being slightly less "central" in that we wouldn't be meddling with APIs or OpenTel data formats in this SIG and thus it can be decoupled. Happy to have that debate, though. |
Trying to move this forward... @open-telemetry/technical-committee: I thought it worthwhile to have a cross-language SIG for the auto-instrumentation agents because there are a lot of different ways to think about them "philosophically," and we will drive ourselves a bit crazy if the different languages adopt different approaches entirely. E.g., should these agents mainly be "installers" that take existing OpenTelemetry instrumentation and bind it to the running process? Or should they do black-box instrumentation of libraries using other techniques? Should they be configurable "from the outside" to grab function parameters/etc? How pluggable should they be, and what sort of plugin architecture makes sense given the rest of OpenTelemetry? Should they aim to make portable OpenTel API calls, or should they aim to simply emit OpenTel wire formats? Or both? Also, agents are tricky to get right and we might want different folks participating in the meetings than the regular TC members who may or may not have the relevant expertise. Separately, I'd love to make a decision about how we're moving forward in the next 48h or so... the more I think about this project, the more I think these auto-instrumentation agents are important for the end-to-end value prop of OpenTelemetry as a "product". Thanks! |
@bhs This is a good observation. I believe agents are very important for out-of-the box product experience. Even if that does not necessarily mean auto-instrumentation of user's apps, once you deploy the agents you can expect them to start collecting host metrics which is already valuable.
I'd be happy to participate (I've been the tech lead for LogInsight Agent in the past - not a trace/metric collector, but still hopefully useful experience). |
Are you thinking about a monkey-patching technique that doesn't touch the application code, e.g., assuming the app uses a dynamic library, swapping that library for an equivalent but instrumented version? I'm not sure how practical this approach would be.
For LI Agent we supported remote configuration, i.e. you would define the config on the server/backend, and the agent would connect to the server and pull the config (for security reasons this was controllable from the agent/host side and could be enabled/disabled). The downside is that it increases the security surface significantly and may not be necessary for infrastructures where bulk configuration deployment is naturally supported via other means.
I was thinking about compile-time plugging. You would have a "core" that would have the base functionality and include certain receivers and exporters (e.g. OpenTel/OpenCensus formats, probably some others). If you (as an end user) wanted to add your own receiver you can easily build on top of the "core" by creating your own "agent" that simply imports that "core" agent and registers your own receiver/exporter factories before starting the core. You would then build and deploy your custom agent similarly to the standard one. (see some related thoughts here open-telemetry/opentelemetry-collector#12 and general extensibility vision here https://github.com/open-telemetry/opentelemetry-service/blob/master/docs/VISION.md)
I think they should emit the wire format specified by the config file (OpenTel being the default), which is what OpenCensus agent is doing now. |
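The compile-time plugging idea above (a "core" that a custom agent imports, registering its own factories before starting it) can be sketched roughly as follows. This is only an illustration under assumptions: `Core`, `register_exporter`, `build_custom_agent`, and the exporter names are hypothetical and are not taken from the OpenCensus agent or any OpenTelemetry codebase.

```python
from typing import Callable, Dict, List

# Hypothetical types and names, for illustration only.
Exporter = Callable[[dict], None]
ExporterFactory = Callable[[], Exporter]


class Core:
    """Base agent: holds exporter factories and starts the enabled ones."""

    def __init__(self) -> None:
        self._factories: Dict[str, ExporterFactory] = {}
        self._exporters: List[Exporter] = []
        self.started = False

    def register_exporter(self, name: str, factory: ExporterFactory) -> None:
        # Factories must be registered before the core starts,
        # mirroring the "register, then start the core" flow.
        if self.started:
            raise RuntimeError("register factories before starting the core")
        self._factories[name] = factory

    def start(self, enabled: List[str]) -> None:
        # Instantiate only the exporters named in the configuration.
        self._exporters = [self._factories[name]() for name in enabled]
        self.started = True

    def emit(self, record: dict) -> None:
        # Fan the record out to every running exporter.
        for export in self._exporters:
            export(record)


def build_custom_agent(sink: List[dict]) -> Core:
    """A 'custom agent' is the core plus the user's own registrations."""
    core = Core()
    # Stand-in for an exporter that ships with the core.
    core.register_exporter("stdout", lambda: (lambda record: print(record)))
    # User-supplied exporter that appends records to an in-memory list.
    core.register_exporter("memory", lambda: sink.append)
    return core
```

A user would build and deploy the result of `build_custom_agent` in place of the standard agent; which exporters actually run is still decided by configuration at start time.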
"Auto-Instrumentation" refers to efforts to install OpenTelemetry instrumentation and otherwise extract OpenTelemetry-compatible data from processes without direct code modification. The Auto-Instrumentation SIG will meet weekly at a time TBD.
You can also join us on [the auto-instrumentation channel](https://gitter.im/open-telemetry/auto-instrumentation) in OpenTelemetry gitter.
The channel seems to be private, doesn't load for me. Is it invite-only?
I mentioned this above. I haven't created the channel yet... I will do so if this PR is approved.
@tigrannajaryan this is exactly the sort of discussion I would like to have in the actual SIG, but not in the PR talking about whether we create the SIG. :) The larger point I'm making is that many/most of these decisions are not language-specific, and so we should have a central place where we determine the spec and go forward from there. Otherwise we will have the same debate over and over again in the N languages, and/or we will end up with divergent models across the N languages. The decision we're trying to make right now: should we create a single SIG to determine the spec / "ground rules" for the various auto-instrumentation efforts across the N languages? If so, we'll approve this PR and get the right people involved to formalize that spec. |
@bhs Makes sense. |
@bhs can you please clarify what would be the relation of this new SIG with the already existing SIG for Agent/Collector that is listed here? https://github.com/open-telemetry/community#agentcollector Is this the same SIG or a new one? |
I am conflicted about this. (Blackbox) auto-instrumentation feels like an area even less explored than whitebox instrumentation. That is to say, even though commercial vendors have been doing it for years, I haven't seen a lot of information published publicly about how they are doing it. It's possible they have found common cross-language patterns (another question whether they are willing to share them). But it's also possible that there are many different ways of doing that. So my concern is with the goal of this SIG to "create a specification". If, on the other hand, the goal is to discuss these cross-language patterns and concerns, and maybe produce a white paper / recommendation, then that would be great. If a formally formed SIG helps people to do that, I am all in favor. |
@tigrannajaryan agent/collector are backend components, they receive but don't produce telemetry. |
@yurishkuro that's correct, but nothing prevents the agent from producing host-level telemetry (metrics). I believe it will be very useful. If we eventually add support for logs, the agent can also monitor syslog, journald and /var/log and collect/send the logs to the backend with no instrumentation needed. This way the agent becomes a producer of very valuable telemetry. |
But not the telemetry from a given application. I believe this SIG is explicitly focused on blackbox instrumentation of applications, which has unique challenges compared to simply collecting host-level telemetry from sources that already produce it. But I'll let @bhs respond. |
Not exactly, actually... the model I am personally the most excited about tends more towards something like https://github.com/opentracing-contrib/java-specialagent . I.e., it's possible to be agent-like in that there are zero source-code modifications, but still rely heavily on whitebox instrumentation where it's available. Of course it's possible to mix and match these two approaches, at least to a degree. In the Anyway, these are yet more of the things we would discuss in the SIG. :) I just want to be clear that "auto-instrumentation" and "whitebox instrumentation" are not mutually exclusive. |
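One way to picture the "agent-like but still whitebox" model described above is an installer that binds already-written instrumentation to whichever libraries are present in the process, with no changes to application source. The sketch below is a loose Python analogy, not how java-specialagent works; `install_available` and the plugin table are hypothetical names.

```python
import importlib.util
from typing import Callable, Dict, List


def install_available(plugins: Dict[str, Callable[[], None]]) -> List[str]:
    """Apply each plugin whose target library is importable in this process.

    `plugins` maps a top-level module name to a function that applies
    existing (whitebox) instrumentation for that module. Both the mapping
    and this function are illustrative assumptions.
    """
    installed = []
    for module_name, apply_instrumentation in plugins.items():
        # find_spec returns None when a top-level module cannot be found,
        # so instrumentation is bound only to libraries that exist here.
        if importlib.util.find_spec(module_name) is not None:
            apply_instrumentation()
            installed.append(module_name)
    return installed
```

The application code never references the plugins; the installer decides at startup what to bind, which is the sense in which "auto-instrumentation" and "whitebox instrumentation" can coexist.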
Does this need to be a separate SIG? The auto-instrumentation will differ from language to language and most likely it will be maintained by people working on a specific language. |
We would be willing to contribute to a SIG. Obviously, we have been using agents forever and think they make a lot of sense. If the SIG is to be successful, we need language/runtime providers in there as well. Usually, a lot of functionality requires specific features from the runtime, like agent loading and code/binary loading interception hooks. For the SIG I would propose to define what we want to work on. We are also interested in a well-defined coexistence scenario between special agents and auto-instrumentation with code-based instrumentation. |
Hello,
Based on OpenCensus, we're currently building a Java agent for the purpose of automatically injecting instrumentation into a blackbox system (inspectIT Ocelot). Besides this, we have been developing Java agents for some years now, so we have quite a lot of experience in this topic and are interested in contributing it. We also see some points here that we have discussed in our team and tackled in Ocelot, like the point mentioned before of using an "agent approach" in combination with "whitebox instrumentation". |
I realize we want all languages to have some way of code injection. I'd suggest, however, to start with Java, just to scope it down. Later we can generalize it to the level of a cross-language discussion. Is anybody on this thread interested in code injection and not interested in Java? If this is fine, I think it's a good idea to kick this SIG off. @bhs any reason to close it in 48 hours as you mentioned? It is really important to start API SIGs now and there are clearly people who will participate in both. Will next week be a good time for a next meeting, or are there some pressing factors? |
Hi! I'd love to participate in this, have been instrumenting Java for a long time (https://github.com/glowroot/glowroot), and recently started (also) working on Microsoft's Java agent. |
I think that it makes a lot of sense to have a SIG for automatic instrumentation, starting with Java. While I'd argue that this functionality should be packaged with the existing sidecar functionality that we're porting over from OpenCensus (so that users download a single binary), the two sets of functionality should be developed in separate workstreams. Thus we'd have a SIG for auto instrumentation (starting with Java) and a SIG for sidecars (existing OC agent and collector). Thoughts? |
@SergeyKanzhelev re closing this PR, I wasn't clear on what you're suggesting... that we close this PR (without merging) and proceed to have the auto-instrumentation discussion in the context of the existing Java SIG? Or that we merge this PR, create the SIG, then start with Java? Sorry to be unclear.
I primarily want the SIG to write down a set of constraints/goals for "official" OpenTelemetry auto-instrumentation efforts (I'm trying to avoid the word "agent" since OpenCensus has an agent that's a completely different thing – more like a sidecar), then to prioritize and help organize the various per-language efforts. I agree with numerous people here that we will need language+runtime expertise on a per-language basis before actually writing code. The cross-language SIG would not be involved in this conversation unless cross-language patterns emerge. @mtwo re your comment:
Fine with me, sure. I don't really understand the "packaging" comment, though... they really have different purposes. I see the layering as |
@bhs I was suggesting to have a Java auto-instrumentation SIG and generalize it later into cross-language auto-instrumentation, unless we have a large group of people interested in different languages now. I created a poll for the kick-off meeting next week: https://doodle.com/poll/f9egdg3n2tfy24kg I didn't create any late evening options. Please advise if they are needed. |
Nevermind, I was thinking that we could distribute the auto-instrumentation functionality as a part of the OC agent / sidecar. However this isn't feasible if it's being passed as a javaagent param to the JVM. Ignore that part of my comment :) |
@SergeyKanzhelev it may be risky to start by immediately digging into Java, as there are many parties (even just on this thread) who already have some sort of Java agent which they are inevitably – and understandably – somewhat attached to... that, in turn, can lead to a scenario where participants end up creating rationalizations for their own approach rather than thinking about what really makes the most sense for OpenTelemetry as a project. My hope in starting with a cross-language spec is that we could establish what some of those more strategic goals are before digging into a bake-off of N different existing OSS Java agent projects. Yet another approach: I can write up that high-level spec about the goals as just a plain-old document in one of the OpenTel repos, and we can debate this stuff on that PR. Once we have alignment around the goals, we could dig in to the Java stuff with a clearer sense of our agreed-upon objectives. But there would be no cross-language "agent" or "auto-instrumentation" SIG, just the spec doc. |
I'm sure the initial meeting will be Java-centric, given the shape of the community, but I would prefer that we discuss this topic at a higher level, and start from a cross-language perspective. I believe it's possible to factor out the issue of auto-instrumentation into individual problems, and discuss what they would mean for a project like OpenTelemetry. "Agent" is simultaneously an overly broad and overly specified term, so it would be helpful to understand our goals before launching into implementations. More so than the APIs, where we were trying to merge two existing projects, the agent issue could use a proper design process, starting with a gathering of requirements. :) |
I agree @tedsuo, the higher level discussions would let us define a clear scope for "auto-instrumentation" on different platforms. The way "auto-instrumentation" is done in Java is very specific to Java, and would not necessarily translate to other platforms (even from an architectural level). I'll join the kick-off meeting next week. |
Alright... so, in re this PR, I think we should leave it open until we've had the initial call that Sergey proposed. The self-scheduling link that Sergey created is here: https://doodle.com/poll/f9egdg3n2tfy24kg I also created a basic agenda doc that we can use for the first meeting we're scheduling above ^^. Please add or suggest edits as anyone sees fit, keeping in mind that we shouldn't (IMO) dig deeply into technical minutiae on the first call: |
Just to chime in here... I think it would be valuable to have a preliminary cross-language group to establish the goals and architecture before diving in to the separate language implementations. Some things that might make sense for the cross-language discussion and for each language group to consider:
|
Out of all voted - tomorrow 1PM-2PM pacific works for everybody. Scheduled: |
@SergeyKanzhelev Should we standardize on a video conferencing tool? I feel it makes meetings easier when a different tool is not used every time since you can focus on the meeting and not on learning the tool. Either Zoom or Hangouts would be the most commonly used in general. |
@rochdev I don't have access to create either Zoom or Hangout meetings =). I was planning to follow up with CNCF on using their Zoom subscription. I'm OK with Zoom. |
Would dynamic instrumentation fall into this category? As in, on a running node, or set of nodes, define points for new spans to be created and finished. Because this is a common practice with Erlang's various tracers, even in production, for real-time investigation, I figured that being able to also say "include these in any OpenTelemetry traces that come through these code paths as well" could be beneficial. |
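To make the question above concrete, here is a minimal Python sketch of defining a span point on a running process and removing it again. It is an assumption-laden illustration, not Erlang tracing and not an OpenTelemetry API: `attach_span`, `FINISHED_SPANS`, and the (name, duration) record are all hypothetical.

```python
import time
from typing import Callable, List, Tuple

# Stand-in for a span exporter: (span name, duration in seconds).
FINISHED_SPANS: List[Tuple[str, float]] = []


def attach_span(owner: object, attr: str, span_name: str) -> Callable[[], None]:
    """Wrap owner.attr so each call records a span; return a detach function."""
    original = getattr(owner, attr)

    def traced(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # The span is finished whether the call returns or raises.
            FINISHED_SPANS.append((span_name, time.perf_counter() - start))

    # Swap the wrapper in at runtime; no application source is edited.
    setattr(owner, attr, traced)

    def detach() -> None:
        # Restore the original callable, ending the investigation.
        setattr(owner, attr, original)

    return detach
```

For example, `detach = attach_span(SomeClass, "handle", "SomeClass.handle")` would start recording spans for that method on a live process, and calling `detach()` would stop it (`SomeClass` being any class already loaded in the application).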
I'm adding this meeting (copying Sergey's link) to the public calendar |
Earlier this week, many folks on this thread had our discussion about “auto-instrumentation” / “agents” / “zero-source-code-modification instrumentation” (these are all the same thing, just with different words). Those on the call thought it would be helpful to try to document a list of requirements we could use to help make our efforts as consistent as possible across languages… eventually we’d like this to be a PR, but for now the consensus is that a google doc will be easier to iterate on. Anyway, here’s a first stab at it: https://docs.google.com/document/d/1sovSQIGdxXtsauxUNp4qUMEIJZzObdukzPT52eyPCHM/edit#heading=h.obofcqujudb8 To be clear, this is a work-in-progress proposal/draft and I’m 100% open to feedback about any of it. Thanks in advance! |
Just a final ping on this thread to see if anyone wants to weigh in on https://docs.google.com/document/d/1sovSQIGdxXtsauxUNp4qUMEIJZzObdukzPT52eyPCHM/edit#heading=h.obofcqujudb8 before I turn it into a PR... def easier to resolve comments in a google doc than a GitHub PR, so please do make any suggestions/etc sooner rather than later. Thanks. |
Missed the message from five days ago - taking a look now! |
LGTM |
My concern specifically is about overloading the term "automatic instrumentation" - there's a difference between "full manual creation of trace spans", "link in this library as a dependency and everything will automatically work", and "no code change needed and things will automatically work". The latter two I'd say are both kinds of "automatic" instrumentation. Can we be clear that this SIG pertains specifically to the bytecode approach rather than encompassing all "automatic instrumentation"? (this discussion now ongoing in both open-telemetry/oteps#5 and open-telemetry/oteps#7) |
I agree, that's why I think they belong to the same RFC. In fact, open-telemetry/oteps#5 is called "zero-touch" in quotes. I think its wording should be relaxed slightly -- comment. |
Given that https://github.com/open-telemetry/rfcs/blob/master/0002-telemetry-without-manual-instrumentation.md is merged, I'm inclined to close this issue... the main reason I initially wanted a SIG was to create some cross-language requirements, and that's done. Are there people who still want a cross-language auto-instrumentation SIG at this point? If not, I will close this by the end of the week. |
🏏 🏏 🏏 🏏 (closing the PR) |
I know that @tedsuo and @SergeyKanzhelev are interested in this. Happy to seed the group with other folks as well.
I'm trying to avoid the word "Agent" since I know it has different meanings to different people. For what it's worth, the first order of business for the SIG would be to define the terms more crisply, but I'm basically imagining software that's linked in post-compilation and provides portable OpenTelemetry-compatible instrumentation; ideally in a clean, pluggable, well-factored manner, though all in due time. :)
Also, I'm happy to find a name other than "auto-instrumentation" as long as it's clear / unambiguous.