Document limitations on span recording #3152

Closed
wants to merge 1 commit into from

abitrolly

Fixes #

Changes

Please provide a brief description of the changes here.

For non-trivial changes, follow the change proposal process and link to the related issue(s) and/or OTEP(s), update the CHANGELOG.md, and also be sure to update spec-compliance-matrix.md if necessary.

Related issues #

Related OTEP(s) #

Member

@reyang reyang left a comment

It's unclear what "real-time tracking" means.

The wording "impossible" seems inaccurate to me. When a span starts, the span processor is notified; it is just the current exporter and protocol which don't support the concept of having separate data that represents start and stop events (or the SpanEvent which is being added). Maybe something like "it is not currently supported".
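
To illustrate the distinction drawn here, the following is a minimal toy sketch, not the OpenTelemetry SDK. The method names `on_start` and `on_end` mirror the specification's span processor hooks, but `ToySpan`, `ToyProcessor`, `ToyExporter`, and `run_demo` are hypothetical names introduced only for this example. It assumes an exporter that refuses spans without an end time, which is the behavior described in this comment.

```python
import time
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class ToySpan:
    """Hypothetical stand-in for a span; not an SDK type."""
    name: str
    start_time_unix_nano: int
    end_time_unix_nano: Optional[int] = None


class ToyExporter:
    """Assumed behavior: only finished spans can be exported."""

    def __init__(self) -> None:
        self.exported: List[ToySpan] = []

    def export(self, span: ToySpan) -> None:
        if span.end_time_unix_nano is None:
            raise ValueError("exporter only handles ended spans")
        self.exported.append(span)


class ToyProcessor:
    """Is notified on start and on end, but forwards only on end."""

    def __init__(self, exporter: ToyExporter) -> None:
        self.exporter = exporter
        self.started: List[str] = []

    def on_start(self, span: ToySpan) -> None:
        # The processor knows the span exists, but nothing is exported yet.
        self.started.append(span.name)

    def on_end(self, span: ToySpan) -> None:
        self.exporter.export(span)


def run_demo() -> Tuple[List[str], int, int]:
    exporter = ToyExporter()
    processor = ToyProcessor(exporter)
    span = ToySpan("long-job", time.time_ns())

    processor.on_start(span)
    exported_before_end = len(exporter.exported)

    span.end_time_unix_nano = time.time_ns()
    processor.on_end(span)
    return processor.started, exported_before_end, len(exporter.exported)
```

In this sketch the span is known to the processor from the moment it starts, yet it only becomes visible downstream after it ends. If the job is killed before `on_end` runs, nothing is ever exported.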


### Span Tracking

It is impossible to send incomplete spans, so if the span failed to complete for
Member

Why is it impossible?

Member

I just asked the same question 😆 #3152 (review)

Author

@abitrolly abitrolly Jan 27, 2023

> It's unclear what "real-time tracking" means.

Watching spans in real time. When you have a long-running task, and you can see its span and parent span before they are finished. See them in the middle of the run. To trace long-running jobs.

EDIT: Also to trace long running jobs that were in the end terminated by timeout or in any other way where end of span was not reached, not recorded, not sent.

Author

> Why is it impossible?

Like @reyang said, "it is just the current exporter and protocol which don't support the concept of having separate data that represents start and stop events". The protocol doesn't support sending a span with no end attribute. That's why it is impossible to trace spans that were not completed using this protocol.

Member

First, separate concepts for start and stop events are not required for that. I feel like the only limitation is the restriction placed on the end_time_unix_nano field, i.e. if it's allowed to be 0 to indicate an incomplete span, then no other changes in the protocol are required to support the use case you describe.
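
A minimal sketch of that idea, under the assumption (proposed in this comment, not part of the current protocol) that `end_time_unix_nano == 0` marks a span that has not ended. `SpanRecord`, `is_incomplete`, and `duration_nanos` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SpanRecord:
    """Hypothetical record carrying only the fields needed here."""
    span_id: str
    name: str
    start_time_unix_nano: int
    # Assumption from the comment above: 0 means "not ended yet".
    end_time_unix_nano: int = 0


def is_incomplete(span: SpanRecord) -> bool:
    """Return True when the span carries the assumed 0 sentinel."""
    return span.end_time_unix_nano == 0


def duration_nanos(span: SpanRecord) -> Optional[int]:
    """Return the duration, or None while the span is still running."""
    if is_incomplete(span):
        return None
    return span.end_time_unix_nano - span.start_time_unix_nano
```

A receiver following this convention would have to treat a missing duration as "still running" rather than as a zero-length span.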

Member

@reyang reyang Jan 27, 2023

> It's unclear what "real-time tracking" means.
>
> Watching spans in real time. When you have a long-running task, and you can see its span and parent span before they are finished. See them in the middle of the run. To trace long-running jobs.
>
> EDIT: Also to trace long running jobs that were in the end terminated by timeout or in any other way where end of span was not reached, not recorded, not sent.

Got it, thanks @abitrolly! Maybe this can be a solution https://github.com/open-telemetry/opentelemetry-specification/blob/main/experimental/trace/zpages.md?

> This zPage is also useful for debugging latency issues (slow parts of applications), deadlocks and instrumentation problems (running spans that don't end)...

I believe spans are not designed for long running operations (e.g. I don't feel span is the right tool to track a batch job which runs for 5 hours).

Author

@yurishkuro maybe there is a better page to document the OpenTelemetry limitations? I thought that the protocol encompasses all interactions between tools. I would actually prefer to define an actual solution rather than document limitations. For me, a good solution is one that makes the end time optional, rather than relying on a specific value being set.

Author

@reyang zPages is a bad alternative, because it requires polling over HTTP. Not all long-running processes that need to be traced expose a web server. Think CI/CD pipelines, for example.

If OpenTelemetry is a replacement for all other tracing protocols, it should support this tracing scenario firsthand (#2930).

Member

@abitrolly I am not sure documenting limitations is a particularly productive exercise, especially in this case, where I don't even see it as a limitation but rather as a missing feature that simply has not been high on the priority list. I think there is an opportunity here to spec a new feature to support long-running processes better. I am not saying that this is the only way to implement such a feature (i.e. you could go with a completely different, event-based protocol, aka a streaming implementation of the OTEL API), but in practice sticking with the existing protocol is a much easier path. For example, Jaeger already has the ability to receive multiple instances under the same span ID and merge them at query time, but that is probably not completely sufficient for this use case. I would prefer a better definition of the merge semantics and a clear spec of the protocol that indicates partial spans.
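
As a rough illustration of what "merge semantics" for partial spans could look like, here is a sketch under invented rules. It is not Jaeger's actual algorithm. The function `merge_partial_spans`, the dictionary keys, and the rules themselves (earliest start wins, latest end wins, `0` means not ended, later attributes override earlier ones) are all assumptions made for this example.

```python
from typing import Dict, Iterable, List


def merge_partial_spans(records: Iterable[dict]) -> List[dict]:
    """Merge records sharing a span ID into one record per span.

    Hypothetical rules: earliest start wins, latest end wins (0 means
    the span has not ended), later attributes override earlier ones.
    """
    merged: Dict[str, dict] = {}
    for rec in records:
        current = merged.get(rec["span_id"])
        if current is None:
            # First time this span ID is seen: copy it into the result.
            merged[rec["span_id"]] = {
                "span_id": rec["span_id"],
                "start": rec["start"],
                "end": rec.get("end", 0),
                "attributes": dict(rec.get("attributes", {})),
            }
            continue
        current["start"] = min(current["start"], rec["start"])
        current["end"] = max(current["end"], rec.get("end", 0))
        current["attributes"].update(rec.get("attributes", {}))
    # Dicts preserve insertion order, so spans come back in first-seen order.
    return list(merged.values())
```

With rules like these, a sender could emit one record when the span starts and another when it ends, and a span whose merged `end` is still `0` would show up as running or abandoned.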

Author

@yurishkuro if OTLP is going to be labelled stable, this feature won't find its place in the spec, and the spec limitations need to be described.

github-actions bot commented Feb 6, 2023

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 6, 2023

Closed as inactive. Feel free to reopen if this PR is still being worked on.

@github-actions github-actions bot closed this Feb 13, 2023
4 participants