Structured Stacktrace in Span Attributes/Logs #2839

tsloughter · 2022-09-26T22:18:52Z

What are you trying to achieve?

A backwards compatible way of having stacktraces in spans and logs be structured data instead of strings. This would allow backends to parse and display usable (like links to the line in a file) information without having to know how to parse a specific languages stacktrace -- and not leave those of us using less popular languages on the outside looking in on these features :)

Additional context.

I'd open this for discussion as a PR but the semantic convention generation tool does not appear to accept a type array-of-maps that is needed for the attribute value. I don't want to spend my time on adding this to the tool if this idea is shot down.

The idea is to add an additional field to the exception semantic convention exception.structured_stacktrace with type map<string, any>[] and specific keys in the map representing the parts of a stack frame. Example:

[{"function_name": "com.example.GenerateTrace.methodB", 
  "filename": "GenerateTrace.java", 
  "line_number": 13}, 
 {"function_name": "com.example.GenerateTrace.methodA", 
  "filename": "GenerateTrace.java", 
  "line_number": 9}, 
 {"function_name": "com.example.GenerateTrace.main", 
  "filename": "GenerateTrace.java", 
  "line_number": 5}]'

Other optional keys would also be defined like column_number, see the OpenCensus StackFrame proto.

This example is the same as the one used currently in the semantic convention for exceptions.

The text was updated successfully, but these errors were encountered:

tsloughter · 2022-09-26T23:18:22Z

I forgot that trace and span attribute values can only be primitive values and arrays of primitive values, so not arrays of maps: https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/common#attribute

So closing this. I really would like to see this incorporated somehow, pls comment on this issue and I can reopen it if there are viable alternative suggestions.

Oberon00 · 2022-09-27T11:30:27Z

I'm reopening this because (1) it's completely feasible to record structured information with the current attributes. Just convert from an "array of structs" to a "struct of arrays", i.e. instead of having one array with objects that have three properties each, have three arrays stack.functions, stack,linenos, stack.filenames. And (2) I think this would without doubt be a useful feature (and having unresolved dependencies for the issue would not a reason to close it).

tsloughter · 2022-09-27T11:55:05Z

Hm, ok, so something like:

{
"exception.structured_stacktrace.function_names": ["com.example.GenerateTrace.methodB", "com.example.GenerateTrace.methodA", "com.example.GenerateTrace.main"],
"exception.structured_stacktrace.filenames":  [""GenerateTrace.java", "GenerateTrace.java", "GenerateTrace.java"],
"exception.structured_stacktrace.line_numbers": [13, 9, 5]
}

I agree structured stacktraces are too useful to not continue looking into how to support them but thought it wouldn't be feasible in attributes, which this issue was about, but would need a message type.

Your suggestion definitely changes that.

Now that it is also a form I can easily make a PR for because I think the tools support defining a type of array of strings I can go do that as well.

reyang · 2022-09-30T15:40:35Z

I personally think we should just use a plain string and provide some example/suggestion/guidance on the format so people know how to parse the common patterns. I understand that having more structure would reduce parsing cost, I'll explain the reason below:

Here is my experience (I'm the owner of several Microsoft internal stacktrace schemas) - trying to define a cross language/runtime structured stacktrace is a very hard problem, we'll always find cases that don't fit (e.g. what about Java calling into C++ via JNI and going back via reverse-JNI, do we want to distinguish kernel frames and user frames, do we want to show user friendly frames or the real frames which might have thunks, do we use fully-qualified function/type name or short names, do we have path+filename or fullpath and whether it is a local path or the repo path, what do we do with AggregatedExceptions or Exception which has inner exceptions).

From what I've seen in the past 15+ years, all the efforts trying to define highly-structured cross-language callstack ended up with folks trying to fit new scenarios into the established structured in very weird ways, and they will always have to do extra parsing in the backend.

jmacd · 2022-09-30T16:02:47Z

I second @reyang's opinion, speaking as the former maintainer of a multi-language stacktrace parsing library (part of Google's offering in this space)

The first time we discussed adding a stacktrace semantic convention to OTel was over two years ago:

open-telemetry/oteps#69

We already have a standard data type for stacktraces. While programmers tend to see the advantage of structuring these at runtime, they also have a bias toward action and a limited perspective. Clearly, their ability to debug a situation would improve if they would just take advantage of runtime libraries to self-introspect the stack. However, this comes at the cost of performance, safety, and likely correctness. The language runtime's ability to generate a stacktrace string is guaranteed to exceed the program's ability to do so using structured APIs.

The use of structured stacks is costly because you will generate many small objects.

The use of structured stacks is unsafe. For example, in the JDK PrintStacktrace() method there is a special anti-recursion feature built in, the comment reads:

    private void printStackTrace(PrintStreamOrWriter s) {
        // Guard against malicious overrides of Throwable.equals by
        // using a Set with identity equality semantics.

The runtime library likely has special advantages in computing these, for example the Golang runtime stacktrace includes a goroutine ID and there is no easy way to get the goroutine ID otherwise.

The use of structured stacks likely harms correctness, for reasons @reyang has covered.

@tsloughter Here is what I propose.

OpenTelemetry should continue to recognize string stacktraces, and tracing/logging SDKs should by default rely on their built-in support for generating stacktraces. Clearly, this doesn't meet every need, and we'd like an alternative. Form a project called OpenStacktrace -- define a format that encompasses a large portion of known stacktrace formats -- specify how to map all the known stacktraces into the new OpenStacktrace format (which shall be string) -- implement high-performance, safe, correct methods to compute OpenStacktrace strings at runtime -- then give OpenTelemetry SDKs the default option (builtin support) or the new option (OpenStacktrace). When the new option becomes appealing to enough users, perhaps we can switch, but the point of all this is that users will likely be better off using whatever is built-in. If OpenStacktrace succeeds, languages can adopt it, and ultimately the vendors and observability systems that ingest this data will always have the option to parse the stacktrace whether it's old or new.

tsloughter · 2022-09-30T16:52:22Z

I'm not suggesting structure in order to reduce any parsing cost, or for any technical reason, but from a user experience standpoint. When there isn't a structure but a reliance on vendors supporting individual string formats the smaller languages get left out :). Even if we include examples and guidance for dealing with the stacktrace strings from Erlang and Elixir it doesn't mean any vendor will add support.

I'd still argue that having both, where the structured format is optional, is best, but maybe there is something else we can do to guarantee some level of support for linking out to source code? I mean, it would still be a stack of function calls, just spitballing.

Having both would simplify what is expected to be able to be carried in the structured data of the exception and stacktrace -- like Josh mentions Go's goroutine id. Another example, in Erlang there is additional information that can be included with a stack frame. This optional data is passed to a user defined format function to include when the exception and its stacktrace are converted to a string (something only added a version ago, so I am certainly biased to structured stacktraces :) and I wouldn't want to lose from being displayed to the user viewing the stack trace.

jmacd · 2022-09-30T17:15:46Z

@tsloughter I wonder if there is a way you could adopt an existing stacktrace format like NodeJS or Java has?

tsloughter added the spec:miscellaneous For issues that don't match any other spec label label Sep 26, 2022

github-actions bot assigned yurishkuro Sep 26, 2022

tsloughter closed this as completed Sep 26, 2022

Oberon00 reopened this Sep 27, 2022

tsloughter mentioned this issue Sep 27, 2022

semantic conventions: add structured stacktrace to exception attributes #2841

Closed

rbailey7210 added the [label deprecated] triaged-needmoreinfo [label deprecated] The issue is triaged - the OTel community needs more information to decide label Sep 30, 2022

cijothomas mentioned this issue Dec 20, 2024

[GenevaExporter] Add ability to export Log Exception using StackTrace open-telemetry/opentelemetry-dotnet-contrib#2422

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Structured Stacktrace in Span Attributes/Logs #2839

Structured Stacktrace in Span Attributes/Logs #2839

tsloughter commented Sep 26, 2022

tsloughter commented Sep 26, 2022

Oberon00 commented Sep 27, 2022 •

edited

Loading

tsloughter commented Sep 27, 2022

reyang commented Sep 30, 2022 •

edited

Loading

jmacd commented Sep 30, 2022

tsloughter commented Sep 30, 2022

jmacd commented Sep 30, 2022

Structured Stacktrace in Span Attributes/Logs #2839

Structured Stacktrace in Span Attributes/Logs #2839

Comments

tsloughter commented Sep 26, 2022

tsloughter commented Sep 26, 2022

Oberon00 commented Sep 27, 2022 • edited Loading

tsloughter commented Sep 27, 2022

reyang commented Sep 30, 2022 • edited Loading

jmacd commented Sep 30, 2022

tsloughter commented Sep 30, 2022

jmacd commented Sep 30, 2022

Oberon00 commented Sep 27, 2022 •

edited

Loading

reyang commented Sep 30, 2022 •

edited

Loading