This repository has been archived by the owner on Dec 6, 2024. It is now read-only.

Converting Traces and Metrics into logs. #120

Closed
wants to merge 11 commits
193 changes: 193 additions & 0 deletions text/0120-metric-trace-log-conversion.md
# Metric and Trace log conversion

Specifying the structure of trace and metric data in logs.

## Motivation

There are multiple requests and issues related to converting metric/tracing data into logs:

* [Issue 398](https://github.com/open-telemetry/opentelemetry-specification/issues/398)
* [Issue 617](https://github.com/open-telemetry/opentelemetry-specification/issues/617)

The [aim here](https://gitter.im/open-telemetry/logs?at=5ee284f2ef5c1c28f0194a89) is to create a standard method for OTel exporters to convert a trace or metric into a log, reducing confusion and increasing compatibility.
Member

Is the intent to create a human-readable representation of metrics and traces in the logs, or a machine-readable one? Depending on what the goal is, there are likely different choices to be made. It is not clear from this document which one is the goal (or maybe the goal is something else). It would be useful to call this out.

FYI, there is a human-readable version of metric and traces in the logs in the Collector: https://github.com/open-telemetry/opentelemetry-collector/blob/master/exporter/loggingexporter/logging_exporter.go

Author

Good points. After some other feedback I got, I'll make the intention more clear, and also shift the goals a bit. The updates will include multiple export formats depending on intent and purpose, instead of only having a single way of doing things. Also, thank you for making me aware of that prior work.

Member

@avik-so do you intend to update this OTEP?

Author

I do.
Still doing research on my part. I'm happy to get assistance.

Contributor

As for Metrics, see open-telemetry/opentelemetry-proto#181.

This PR amounts to a (rejected) proposal for OTLP. The problem with open-telemetry/opentelemetry-proto#181 is that it was better suited to describing the input of an aggregation than its output.

I think if we're logging, the idea is to log raw metric events (i.e., no aggregation). The proposal in that PR has a MetricDescriptor with these fields:

  • Temporality (Cumulative, Delta)
  • Structure (Adding Monotonic, Adding, Grouping)
  • Continuity (Continuous, Snapshot)
    = 12 combinations

These 12 combinations can be thought of as six instruments crossed with cumulative/delta, which completely and accurately describes the current OTel API metric event.

That led to very lengthy explanations of what it means for various kinds of aggregation. If we're only considering raw metric events, then this may be a good protocol.

Technical notes

The problem with open-telemetry/opentelemetry-proto#181 may have been predicted by #88 and #93, where more than six potential kinds of instrument were described and then some of them were rejected. See also open-telemetry/opentelemetry-proto#199; I believe that a metric descriptor with Structure=Grouping and Temporality=Delta could also be described as the hypothetical "Interval" temporality mentioned there.

There is also a question of whether you can derive system.cpu.utilization or process.cpu.utilization correctly with whatever logging format is proposed: open-telemetry/opentelemetry-specification#819


## Explanation

Sometimes a system needs to log tracing or metric data in an easy-to-parse format. In those cases, the data should be logged using the following conversion tables; a non-normative sketch follows each table.

|OpenTelemetry Metric field | OpenTelemetry Log field|
|--- |--- |
| time of metric collection | log.Timestamp |
| correlation Context | log.TraceId |
| span Context | log.SpanId |
| sampled | log.TraceFlags |
| TRACE3 | log.SeverityText |
| 3 | log.SeverityNumber |
| metric.label | log.Name |
| metric.value | log.Body.value |
| metric.resources | log.Resource |
| metric.definition | log.Attributes |
| metric.meterIdentifier | log.Attributes.meterIdentifier |
| metric.AggregationType | log.Attributes.AggregationType |
| metric.Instrumentation | log.Attributes.Instrumentation |
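
As a non-normative illustration, here is a minimal Python sketch of applying this mapping. The input shape (a plain dict with `descriptor`, `value`, `resource`, and `timestamp` keys) and the `trace_id`/`span_id`/`sampled` arguments are hypothetical stand-ins for real SDK types:

```
def metric_to_log_record(metric, trace_id=None, span_id=None, sampled=False):
    # Map a metric record (hypothetical dict shape) to a log record per the table above.
    descriptor = metric["descriptor"]
    return {
        "timestamp": metric["timestamp"],        # time of metric collection
        "traceid": trace_id,                     # correlation context, if any
        "spanid": span_id,                       # span context, if any
        "traceflags": {"sampled": sampled},
        "severityText": "TRACE3",
        "severityNumber": 3,
        "name": descriptor["name"],              # metric label/name -> log.Name
        "body": {"value": metric["value"]},      # metric.value -> log.Body.value
        "attributes": {
            **descriptor,                        # metric.definition -> log.Attributes
            "meterIdentifier": metric.get("meterIdentifier"),
            "aggregationType": metric.get("aggregationType"),
        },
        "resource": metric.get("resource", {}),  # metric.resources -> log.Resource
    }
```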

|OpenTelemetry Trace field| OpenTelemetry Log field|
Member

Probably better to refer to "Span field" instead of "Trace field" since the table lists Span fields.

Author

Since Trace is a high-level concept and Span isn't, can you point out the differences to me? I obviously missed something here.

|--- |--- |
| trace.start.timestamp | log.Timestamp |
| trace.TraceId | log.TraceId |
| trace.SpanId | log.SpanId |
| trace.sampled | log.TraceFlags |
| TRACE3 | log.SeverityText |
| 3 | log.SeverityNumber |
| trace.name | log.Name |
| trace.start.timestamp | log.Body.StartTimeStamp |
| trace.end.timestamp | log.Body.EndTimeStamp |
| trace.parentId | log.Body.ParentId |
| trace semantic conventions** | log.Resource |
Member

Which semantic conventions? There are conventions for Resources and conventions for Spans. Does this suggest putting Span attributes that are defined in the semantic conventions into the Resource? If so, it is not clear why.

Author

My intent here was the semantic conventions for resources; I missed the fact that they also exist for span attributes.

| trace.attributes | log.Attributes |
| trace.Events | log.Attributes.Events |
| trace.Tracestate | log.Attributes.Tracestate |
| trace.status | log.Attributes.status |
| trace.kind | log.Attributes.kind |
Member

What do we do with the Span Resource? It is missing in this table.


** Semantic conventions for Traces can be mapped directly to log.Resource.convention. For example, `db.type` would be converted to `{Resource: {db.type: value} }`
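
To make the span mapping concrete, here is a minimal Python sketch using plain dicts. The field names follow the sample span shown under Examples below; the `sampled` and `resource` arguments are hypothetical stand-ins for the span's trace flags and Resource:

```
def span_to_log_record(span, sampled=False, resource=None):
    # Map a finished span (hypothetical dict shape) to a log record per the table above.
    ctx = span["context"]
    return {
        "timestamp": span["start_time"],       # trace.start.timestamp -> log.Timestamp
        "traceid": ctx["trace_id"],
        "spanid": ctx["span_id"],
        "traceflags": {"sampled": sampled},
        "severityText": "TRACE3",
        "severityNumber": 3,
        "name": span["name"],
        "body": {
            "startTimestamp": span["start_time"],
            "endTimeStamp": span["end_time"],
            "parentId": span.get("parent_id"),
        },
        "attributes": {
            "kind": span["kind"],
            "status": span["status"],
            "attributes": span.get("attributes", {}),
            "traceState": ctx.get("trace_state", {}),
            "events": span.get("events", []),
            "links": span.get("links", []),
        },
        "resource": resource or {},            # resource semantic conventions, e.g. {"db.type": ...}
    }
```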

Examples:

### Open Metric

Based on a sample metric from [Alan Storm](https://alanstorm.com/what-are-open-telemetry-metrics-and-exporters/), the following metric:

```
{
  descriptor: {
    name: 'my-open-telemetry-counter',
    description: 'A simple counter',
    unit: '1',
    metricKind: 0,
    valueType: 1,
    labelKeys: [],
    monotonic: true
  },
  labels: {},
  aggregator: CounterSumAggregator {
    _current: 1,
    _lastUpdateTime: [ 1589046826, 890210944 ]
  }
}
```

Would be logged as:

```
{
  "timestamp": 1589046070557,
  "traceid": <sometraceid>,
  "spanid": <somespanid>,
  "traceflags": {},
  "severityText": "TRACE3",
  "severityNumber": 3,
  "name": "my-open-telemetry-counter",
  "body": {
    "value": 1
  },
  "attributes": {
    "aggregator": "CounterSumAggregator",
    "meterIdentifier": "my-meter",
    "description": "A simple counter",
    "unit": "1",
    "metricKind": 0,
    "valueType": 1,
    "labelKeys": [],
    "monotonic": true
  },
  "resource": {}
}
```

### Open Trace

The following [sample trace](https://opentelemetry-python.readthedocs.io/en/stable/getting-started.html):

```
{
  "name": "baz",
  "context": {
    "trace_id": "0xb51058883c02f880111c959f3aa786a2",
    "span_id": "0xb2fa4c39f5f35e13",
    "trace_state": "{}"
  },
  "kind": "SpanKind.INTERNAL",
  "parent_id": "0x77e577e6a8813bf4",
  "start_time": "2020-05-07T14:39:52.906272Z",
  "end_time": "2020-05-07T14:39:52.906343Z",
  "status": {
    "canonical_code": "OK"
  },
  "attributes": {},
  "events": [],
  "links": []
}
```

Would be logged as:

```
{
  "timestamp": 1588862392906, // 2020-05-07T14:39:52.906272Z as epoch milliseconds
  "traceid": "0xb51058883c02f880111c959f3aa786a2",
  "spanid": "0xb2fa4c39f5f35e13",
  "traceflags": {},
  "severityText": "TRACE3",
  "severityNumber": 3,
  "name": "baz",
  "body": {
    "startTimestamp": "2020-05-07T14:39:52.906272Z",
    "endTimeStamp": "2020-05-07T14:39:52.906343Z",
    "parentId": "0x77e577e6a8813bf4"
  },
  "attributes": {
    "kind": "SpanKind.INTERNAL",
    "status": {"canonical_code": "OK"},
    "attributes": {},
    "traceState": {},
    "events": {},
    "links": {}
  },
  "resource": {}
}
```
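
A note on the `timestamp` field above: the span's ISO 8601 start time can be converted to epoch milliseconds as in this small Python sketch (assuming the log Timestamp is expressed in milliseconds):

```
from datetime import datetime

# Parse the span's start_time (UTC) and convert it to epoch milliseconds.
start = datetime.fromisoformat("2020-05-07T14:39:52.906272+00:00")
timestamp_ms = int(start.timestamp() * 1000)
print(timestamp_ms)  # 1588862392906
```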

## Internal details

Every SDK or API would implement this conversion itself; this OTEP merely defines a standard mapping for doing so.
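
As an illustration of where such a conversion could live, here is a hypothetical exporter sketch (not part of any real SDK) that reuses the `span_to_log_record` helper sketched earlier and emits one JSON log record per finished span:

```
import json


class LogConvertingSpanExporter:
    """Hypothetical exporter: converts each finished span into a log record
    using the span_to_log_record sketch above and writes it as a JSON line."""

    def __init__(self, out_stream):
        self._out = out_stream

    def export(self, spans):
        for span in spans:
            record = span_to_log_record(span)
            self._out.write(json.dumps(record, default=str) + "\n")
        self._out.flush()
```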

## Trade-offs and mitigations

One drawback is the limited scope of this OTEP: it does not handle the actual conversion of these fields. This could be mitigated by creating a metric or trace conversion library once the Log API/SDK is defined.

## Prior art and alternatives

* It's possible to have the metric definition, value, and label all inserted into the log attributes or body; however, leaving the body empty except for the metric value provides better aggregation capabilities.

* The [LogCorrelation](https://github.com/census-instrumentation/opencensus-specs/blob/master/trace/LogCorrelation.md#string-format-for-tracing-data) document from the OpenCensus trace specification has some advice for converting traces to logs. However, since the Log data model supports TraceFlags as a bit field, its advice to turn sampling data into "true" or "false" strings is ignored here.

* Another suggestion from Issue 398 is to have the logs look like a syslog line with some added key-value pairs. This sort of output is outside the scope of this OTEP, though the log data structure can easily be parsed and printed in this format (see the sketch below). For example: `17:05:43 INFO {sampled=true, spanId=ce751d1ad8be9d11, traceId=ce751d1ad8be9d11} [or.ac.qu.GreetingResource] (executor-thread-1) hello`
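
A minimal Python sketch of printing a converted log record in that style (the logger name and thread name are illustrative placeholders, not part of this OTEP's mapping):

```
from datetime import datetime, timezone


def to_syslog_line(record, logger="or.ac.qu.GreetingResource", thread="executor-thread-1"):
    # Render a converted log record as a syslog-like line.
    ts = datetime.fromtimestamp(record["timestamp"] / 1000, tz=timezone.utc)
    flags = "sampled={}, spanId={}, traceId={}".format(
        str(record["traceflags"].get("sampled", False)).lower(),
        record["spanid"],
        record["traceid"],
    )
    return "{} INFO {{{}}} [{}] ({}) {}".format(
        ts.strftime("%H:%M:%S"), flags, logger, thread, record["body"]
    )
```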

## Open questions

Is this mapping enough? Are others needed?

## Future possibilities

Once this OTEP is accepted, OTel exporters can produce standardized logs for all metrics and traces, increasing compatibility within OTel and reducing confusion.
We can also create further mappings for well-known tracing or metric formats from other systems.