Structured Stacktrace in Span Attributes/Logs #2839
Comments
I forgot that trace and span attribute values can only be primitive values and arrays of primitive values, so not arrays of maps: https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/common#attribute So I'm closing this. I would really like to see this incorporated somehow, so please comment on this issue and I can reopen it if there are viable alternative suggestions.
I'm reopening this because (1) it's completely feasible to record structured information with the current attributes. Just convert from an "array of structs" to a "struct of arrays", i.e. instead of having one array with objects that have three properties each, have three arrays: stack.functions, stack.linenos, stack.filenames. And (2) I think this would without doubt be a useful feature (and having unresolved dependencies for the issue would not be a reason to close it).
Hm, ok, so something like:
I agree structured stacktraces are too useful not to keep looking into how to support them, but I thought it wouldn't be feasible in attributes, which is what this issue was about, and would need a message type instead. Your suggestion definitely changes that. It is also a form I can easily make a PR for, because I think the tools support defining an array-of-strings type, so I can go do that as well.
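The "array of structs" to "struct of arrays" conversion discussed above can be sketched as follows. This is only an illustration: the attribute names stack.functions, stack.linenos and stack.filenames come from the comment above, while the helper names, the per-frame keys (function, filename, lineno) and the sample frames are assumptions made for this example and are not part of any OpenTelemetry specification.

```python
from typing import Dict, List, Union

# Hypothetical per-frame shape used only for this sketch.
Frame = Dict[str, Union[str, int]]
# Attribute values restricted to arrays of primitives, as the spec requires.
Attributes = Dict[str, Union[List[str], List[int]]]


def frames_to_attributes(frames: List[Frame], prefix: str = "stack") -> Attributes:
    """Flatten a list of frame maps into three parallel primitive arrays."""
    return {
        f"{prefix}.functions": [str(f["function"]) for f in frames],
        f"{prefix}.filenames": [str(f["filename"]) for f in frames],
        f"{prefix}.linenos": [int(f["lineno"]) for f in frames],
    }


def attributes_to_frames(attrs: Attributes, prefix: str = "stack") -> List[Frame]:
    """Rebuild the list of frame maps on the backend side."""
    functions = attrs[f"{prefix}.functions"]
    filenames = attrs[f"{prefix}.filenames"]
    linenos = attrs[f"{prefix}.linenos"]
    # The arrays are only meaningful if index i describes the same frame in each.
    if not (len(functions) == len(filenames) == len(linenos)):
        raise ValueError("stack attribute arrays must have equal length")
    return [
        {"function": fn, "filename": file, "lineno": line}
        for fn, file, line in zip(functions, filenames, linenos)
    ]


# Illustrative frames only.
frames = [
    {"function": "com.example.GenerateTrace.methodB", "filename": "GenerateTrace.java", "lineno": 13},
    {"function": "com.example.GenerateTrace.methodA", "filename": "GenerateTrace.java", "lineno": 9},
]
attrs = frames_to_attributes(frames)
```

One consequence of this layout is that the arrays must stay index-aligned, so a frame with a missing value (for example, no line number) would need a placeholder rather than being skipped.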
I personally think we should just use a plain string and provide some example/suggestion/guidance on the format so people know how to parse the common patterns. I understand that having more structure would reduce parsing cost; I'll explain my reasoning below. Here is my experience (I'm the owner of several Microsoft internal stacktrace schemas): trying to define a cross-language/runtime structured stacktrace is a very hard problem, and we'll always find cases that don't fit (e.g. what about Java calling into C++ via JNI and going back via reverse-JNI, do we want to distinguish kernel frames and user frames, do we want to show user-friendly frames or the real frames which might have thunks, do we use fully-qualified function/type names or short names, do we have path+filename or fullpath and is it a local path or the repo path, what do we do with AggregatedExceptions or an Exception which has inner exceptions). From what I've seen in the past 15+ years, all the efforts to define a highly-structured cross-language callstack ended up with folks trying to fit new scenarios into the established structure in very weird ways, and they always had to do extra parsing in the backend anyway.
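To make the "plain string plus parsing guidance" option concrete, here is a minimal sketch of what backend-side parsing of one common pattern could look like. It assumes JVM-style frames of the form "at <function>(<file>:<line>)"; the function name parse_java_stacktrace, the regular expression and the sample text are assumptions for illustration, and other runtimes would each need their own pattern, which is the trade-off debated in this thread.

```python
import re
from typing import Dict, List, Optional, Union

# Matches frames such as:  at com.example.Foo.bar(Foo.java:13)
# The line number is optional so that "(Native Method)" style frames still match.
_JAVA_FRAME = re.compile(
    r"^\s*at\s+(?P<function>[\w.$<>]+)"
    r"\((?P<filename>[^:()]+)(?::(?P<lineno>\d+))?\)\s*$"
)

Frame = Dict[str, Union[str, Optional[int]]]


def parse_java_stacktrace(text: str) -> List[Frame]:
    """Extract frames from a JVM-style stacktrace string.

    Lines that are not frames (the exception header, "Caused by:" lines,
    "... 3 more" markers) are skipped rather than treated as errors.
    """
    frames: List[Frame] = []
    for line in text.splitlines():
        match = _JAVA_FRAME.match(line)
        if match is None:
            continue
        lineno = match.group("lineno")
        frames.append(
            {
                "function": match.group("function"),
                "filename": match.group("filename"),
                "lineno": int(lineno) if lineno is not None else None,
            }
        )
    return frames


# Illustrative input in the usual JVM format.
sample = (
    'Exception in thread "main" java.lang.RuntimeException: Test exception\n'
    " at com.example.GenerateTrace.methodB(GenerateTrace.java:13)\n"
    " at com.example.GenerateTrace.methodA(GenerateTrace.java:9)\n"
    " at com.example.GenerateTrace.main(GenerateTrace.java:5)"
)
parsed = parse_java_stacktrace(sample)
```

Even this small parser has to decide how to treat header lines and frames without line numbers, which hints at why a single cross-language schema is hard to pin down.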
I second @reyang's opinion, speaking as the former maintainer of a multi-language stacktrace parsing library (part of Google's offering in this space). The first time we discussed adding a stacktrace semantic convention to OTel was over two years ago: We already have a standard data type for stacktraces. While programmers tend to see the advantage of structuring these at runtime, they also have a bias toward action and a limited perspective. Clearly, their ability to debug a situation would improve if they would just take advantage of runtime libraries to self-introspect the stack. However, this comes at the cost of performance, safety, and likely correctness. The language runtime's ability to generate a stacktrace string is guaranteed to exceed the program's ability to do so using structured APIs. The use of structured stacks is costly because you will generate many small objects. The use of structured stacks is unsafe. For example, in the JDK printStackTrace() method there is a special anti-recursion feature built in; the comment reads:
The runtime library likely has special advantages in computing these; for example, the Golang runtime stacktrace includes a goroutine ID and there is no easy way to get the goroutine ID otherwise. The use of structured stacks likely harms correctness, for reasons @reyang has covered. @tsloughter Here is what I propose. OpenTelemetry should continue to recognize string stacktraces, and tracing/logging SDKs should by default rely on their built-in support for generating stacktraces. Clearly, this doesn't meet every need, and we'd like an alternative. Form a project called OpenStacktrace -- define a format that encompasses a large portion of known stacktrace formats -- specify how to map all the known stacktraces into the new OpenStacktrace format (which shall be
I'm not suggesting structure in order to reduce any parsing cost, or for any technical reason, but from a user experience standpoint. When there isn't a structure but a reliance on vendors supporting individual string formats, the smaller languages get left out :). Even if we include examples and guidance for dealing with the stacktrace strings from Erlang and Elixir, it doesn't mean any vendor will add support. I'd still argue that having both, where the structured format is optional, is best, but maybe there is something else we can do to guarantee some level of support for linking out to source code? I mean, it would still be a stack of function calls, just spitballing. Having both would simplify what is expected to be carried in the structured data of the exception and stacktrace -- like the goroutine ID in Go that Josh mentions. Another example: in Erlang there is additional information that can be included with a stack frame. This optional data is passed to a user-defined format function to include when the exception and its stacktrace are converted to a string (something only added a version ago, so I am certainly biased to structured stacktraces :) ), and I wouldn't want it to be lost when the stack trace is displayed to the user.
@tsloughter I wonder if there is a way you could adopt an existing stacktrace format, like the one NodeJS or Java has?
What are you trying to achieve?
A backwards-compatible way of having stacktraces in spans and logs be structured data instead of strings. This would allow backends to parse and display usable information (like links to the line in a file) without having to know how to parse a specific language's stacktrace -- and not leave those of us using less popular languages on the outside looking in on these features :)
Additional context.
I'd open this for discussion as a PR, but the semantic convention generation tool does not appear to accept the array-of-maps type that is needed for the attribute value. I don't want to spend my time on adding this to the tool if this idea is shot down.
The idea is to add an additional field to the exception semantic convention, exception.structured_stacktrace, with type map<string, any>[] and specific keys in the map representing the parts of a stack frame. Example:
Other optional keys would also be defined, like column_number; see the OpenCensus StackFrame proto. This example is the same as the one used currently in the semantic convention for exceptions.