Document limitations on span recording #3152
Conversation
It's unclear what "real-time tracking" means.
The wording "impossible" seems inaccurate to me. When a span starts, the span processor is notified; it is just the current exporter and protocol that don't support the concept of having separate data representing start and stop events (or the SpanEvent that is being added). Maybe something like "it is not currently supported".
> ### Span Tracking
>
> It is impossible to send incomplete spans, so if the span failed to complete for
Why is it impossible?
I just asked the same question 😆 #3152 (review)
> It's unclear what "real-time tracking" means.

Watching spans in real time: when you have a long-running task, you can see its span and parent span before they finish, in the middle of the run. The goal is to trace long-running jobs.

EDIT: Also to trace long-running jobs that were eventually terminated by timeout, or in any other way where the end of the span was never reached, recorded, or sent.
> Why is it impossible?

Like @reyang said, "it is just the current exporter and protocol which don't support the concept of having separate data that represents start and stop events". The protocol doesn't support sending a span with no end time. That's why it is impossible to trace spans that were not completed using this protocol.
First, separate concepts for start and stop events are not required for that. I feel like the only limitation is the restriction placed on the end_time_unix_nano field, i.e. if it were allowed to be 0 to indicate an incomplete span, then no other changes in the protocol would be required to support the use case you describe.
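To illustrate the idea, here is a minimal sketch. This is a hypothetical convention, not part of the current OTLP specification; the plain Python class only mirrors the relevant fields of the OTLP Span message. If a zero end_time_unix_nano were defined to mean "still running", a consumer could distinguish partial spans from completed ones without any new message types:

```python
from dataclasses import dataclass


@dataclass
class Span:
    # Field names mirror the OTLP Span message; this class is only a sketch.
    trace_id: str
    span_id: str
    name: str
    start_time_unix_nano: int
    end_time_unix_nano: int = 0  # hypothetical convention: 0 = not yet ended


def is_incomplete(span: Span) -> bool:
    """Treat end_time_unix_nano == 0 as an in-flight (partial) span."""
    return span.end_time_unix_nano == 0


# A long-running job that has started but not yet ended:
running = Span("t1", "s1", "batch-job",
               start_time_unix_nano=1_700_000_000_000_000_000)
# A completed child span:
done = Span("t1", "s2", "step",
            1_700_000_000_000_000_000, 1_700_000_060_000_000_000)
```

Under this convention an exporter could ship `running` mid-flight and later re-export it with a real end timestamp, which is exactly the long-running-job scenario discussed above.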
> Watching spans in real time. When you have a long running task, and you can see its span and parent span before they are finished. See them in the middle of the run. To trace long running jobs.
>
> EDIT: Also to trace long running jobs that were in the end terminated by timeout or in any other way where end of span was not reached, not recorded, not sent.
Got it, thanks @abitrolly! Maybe this can be a solution https://github.com/open-telemetry/opentelemetry-specification/blob/main/experimental/trace/zpages.md?
> This zPage is also useful for debugging latency issues (slow parts of applications), deadlocks and instrumentation problems (running spans that don't end)...
I believe spans are not designed for long running operations (e.g. I don't feel span is the right tool to track a batch job which runs for 5 hours).
@yurishkuro maybe there is a better page to document the OpenTelemetry limitations? I thought the protocol encompasses all interactions between tools. I would actually prefer to define an actual solution rather than document limitations. For me, a good solution is one that makes the end time optional, not one that relies on a specific value being set.
@reyang zPages is a bad alternative, because it requires polling over HTTP, and not all long-running processes that need to be traced expose a web server. Think of CI/CD pipelines, for example.
If OpenTelemetry is a replacement for all other tracing protocols, it should support this tracing scenario firsthand (#2930).
@abitrolly I am not sure documenting limitations is a particularly productive exercise, especially in this case, where I don't even see it as a limitation but rather a missing feature that simply has not been high on the priority list. I think there is an opportunity here to spec a new feature to support long-running processes better.

I am not saying that this is the only way to implement such a feature (i.e. you could go with a completely different, event-based protocol, aka a streaming implementation of the OTel API), but in practice sticking with the existing protocol is a much easier path. For example, Jaeger already has the ability to receive multiple instances under the same span ID and merge them at query time, but that is probably not completely sufficient for this use case; I would prefer a better definition of the merge semantics and a clear spec of the protocol that indicates partial spans.
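The merge-at-query-time approach could look roughly like this. This is a sketch of one possible set of merge semantics, not Jaeger's actual implementation; the PartialSpan class and the "0 means not ended" convention are assumptions for illustration. Successive exports of the same span ID are folded together, keeping the earliest start, the latest known end time, and the union of recorded events:

```python
from dataclasses import dataclass, field


@dataclass
class PartialSpan:
    span_id: str
    name: str
    start_time_unix_nano: int
    end_time_unix_nano: int = 0  # hypothetical convention: 0 = not ended yet
    events: list = field(default_factory=list)


def merge_partials(instances):
    """Fold multiple exports of the same span ID into one merged view.

    Hypothetical semantics: earliest start wins, latest nonzero end wins,
    and events are deduplicated in arrival order.
    """
    merged = None
    seen = set()
    for s in instances:
        if merged is None:
            merged = PartialSpan(s.span_id, s.name, s.start_time_unix_nano)
        merged.start_time_unix_nano = min(merged.start_time_unix_nano,
                                          s.start_time_unix_nano)
        if s.end_time_unix_nano:
            merged.end_time_unix_nano = max(merged.end_time_unix_nano,
                                            s.end_time_unix_nano)
        for e in s.events:
            if e not in seen:
                seen.add(e)
                merged.events.append(e)
    return merged
```

A backend using semantics like these could show the span as in-flight while only the first export (with end time 0) has arrived, then replace it with the completed view once a later export carries a real end timestamp.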
@yurishkuro if OTLP is going to be labelled stable, this feature won't find its place in the spec, and the spec's limitations need to be described.
This PR was marked stale due to lack of activity. It will be closed in 7 days.
Closed as inactive. Feel free to reopen if this PR is still being worked on.
Fixes #

Changes

Please provide a brief description of the changes here.
For non-trivial changes, follow the change proposal process and link to the related issue(s) and/or OTEP(s), update the CHANGELOG.md, and also be sure to update spec-compliance-matrix.md if necessary.

Related issues #
Related OTEP(s) #