-
Notifications
You must be signed in to change notification settings - Fork 896
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Define semantic conventions and instrumentation stability #2180
Define semantic conventions and instrumentation stability #2180
Conversation
db79cb2
to
9fb4cf3
Compare
16d1b3b
to
e69ac8d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ADDITIVE changes to metrics can break user alerts by creating new timeseries and adjusting thresholds. I think we need to be a lot more careful around changes, particularly around metrics here.
@jsuereth I can change the PR to explicitly prohibit this but I think what you suggest contradicts with an existing understanding that attributes are optional, see this discussion #2077 I don't know which one is correct, I'll let metric expert to decide and then I am happy to adjust this PR according to that. |
@jsuereth I modified the PR to disallow addition of new attributes to existing metrics. PTAL. |
a1545b5
to
ba788d6
Compare
Changed the PR to avoid the blocking issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for resolving my blocking comment! I still have some concerns, but not as strongly blocking :)
For clarity - I agree with the sentiment and goal of this PR and approve, just nits on details/scope of what's stated here, including implications on instrumentation development.
Can you clarify how this applies to auto-instrumentation distributions? Should we hide all telemetry producing auto-instrumentation in a distribution behind an experimental flag, and only enable them by default once their respective semantic conventions have reached stability and we can commit to telemetry stability? |
No, I believe it is fine to produce telemetry from auto-instrumentation. This PR only says that after you declare your auto-instrumentation "Stable" you cannot change the produced telemetry until certain conditions are met (SchemaURL in the output and wait until certain date). If the text in the PR is unclear I will fix it. Let me know which part you find unclear. |
For individual instrumentations, we use the version number to "clearly mark" them as unstable, e.g. For distributions (e.g. the opentelemetry javaagent), should we be marking them as "unstable" (e.g. |
923dfd9
to
7bf6381
Compare
The "unstable" vs "stable" labeling of a library or of a package (or individual instrumentations if that's the granularity of labeling you choose to maintain) is done based on the fact whether you are able to take the responsibility to fulfill the requirements of the "stable" labeling. What "stable" label means is described in this document. Particularly with regards to telemetry emitted by the instrumentation this document (after accepting this PR) says that if you label the instrumentation "stable" then you must fulfil the following obligation:
If you are not labeling the instrumentation as "stable" then you free to do whatever you want. Any changes are fine. |
Thanks @tigrannajaryan. Given that we don't want to mark the entire javaagent distribution as "unstable", it sounds like we will need to disable all "unstable" auto-instrumentation in the distribution by default, and only enable them (by default) once their respective semantic conventions have reached stability and we can commit to the telemetry stability described in this document. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are a number of questions about metrics here. The Views support in each metrics SDK should show it's possible to perform attribute removal safely (correctly, and relatively easily compared with doing so after-the-fact). This topic came up in #2208. I expect we will eventually be able to support metrics schema translation via Views and relax these restrictions, but it's good to be conservative for now.
@Oberon00 please take another look. |
@open-telemetry/python-maintainers can you please take a look? This affects instrumentations in all languages. It would be good to have your approval if you agree with this approach. |
@dyladan if the PR looks good to you please approve on behalf of JS SIG. I want to have an approval from each (or at least most) language SIG. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this at the JS meeting last wednesday and the other @open-telemetry/javascript-maintainers brought up the same points as have been brought up by myself, @alanwest, and others around instrumentation which is not in the semantic conventions. There will always be some instrumentations which for whatever reason are not covered by the semantic conventions whether they have too small of a userbase, are too specific, or provide some custom functionality. This PR seems to prevent those instrumentations from ever being marked as "stable" which will limit their adoption.
From a maintenance standpoint it is not too big of a burden for maintainers to gate stable on the guarantees outlined here, but I believe the restrictions will prevent people from upstreaming contributions to the contrib repo unless we have some way to provide custom functionality that isn't covered by the semantic conventions.
As mentioned earlier, this is still marked experimental so isn't blocking, but I want to make sure the point isn't forgotten before this is finalized and made stable.
Make sense. I will think about we can make this explicitly allowed. To be clear, even in this current form we do not completely prevent Stable implementations which break the rules, since I used the "SHOULD" clause, so there is always a possibility to ignore it if there is a good reason. Anyway, I will file a github issue to take care of this, if this PR gets merged (I am not yet sure, so I don't want to open an issue about a document that doesn't exist in the spec yet). |
I believe so far we have approvals from JS, Go, Java, .Net maintainers. |
@tigrannajaryan I thought there were two options for instrumentation that are not covered by semantic conventions to go stable:
|
@trask that is correct. And for both options they should also follow this rule:
I think @dyladan's concern is that this is too much limitation and the rules are too rigid for instrumentations contributed by non-core maintainers. |
oh, I missed this, thx! I agree this is too limiting for the reason @dyladan mentions:
|
Summary of changes: - Explain that schemas define what changes are allowed. - Clarify additive changes are an exception and are always allowed. - Prohibit instrumentation changes until July 1, 2022 to allow recipients (backends/vendors) to start understanding Schema URLs/Schema Files.
7a7a37b
to
89ead2d
Compare
In addition we now have approvals from Ruby and Python maintainers. I believe this is good enough and we can move forward with this approach. Merging the PR. |
…etry#2180) This PR defines the rules for SDKs in all languages with regards to marking instrumentations as "Stable" and "Unstable" and how such instrumentations are allowed to evolve over time. Summary of changes: - Explain that schemas define what changes are allowed. - Clarify that additive changes are an exception and are always allowed. - Prohibit instrumentation changes until April 1, 2023 to allow recipients (backends/vendors) to start understanding Schema URLs/Schema Files. This gives one year advance notice, which I think should be enough for all parties to prepare.
This PR defines the rules for SDKs in all languages with regards to marking
instrumentations as "Stable" and "Unstable" and how such instrumentations
are allowed to evolve over time.
Summary of changes:
(backends/vendors) to start understanding Schema URLs/Schema Files.
This gives one year advance notice, which I think should be enough for
all parties to prepare.