Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Invalid attributes on Spans, Events, Links, and Resources #203

Closed
mwear opened this issue Feb 29, 2020 · 11 comments · Fixed by #521
Closed

Invalid attributes on Spans, Events, Links, and Resources #203

mwear opened this issue Feb 29, 2020 · 11 comments · Fixed by #521
Labels
help wanted Extra attention is needed

Comments

@mwear
Copy link
Member

mwear commented Feb 29, 2020

We've discussed this before and temporarily went with the idea of a strict mode, that was never finished and the Exception handing spec discourages an OTel implementation from raising unhandled exceptions. See: #14 and #126.

We are in the situation where invalid attributes on spans, links, events and resources can make it into the export pipeline where currently it breaks things. In order to free the export pipeline from having to validate all data, we should ensure that attributes that make to the pipeline are valid.

I'm proposing the following strategy for attribute validation:

  • When possible, validate attributes only on telemetry that is going to be exported
  • Avoid excessive runtime checks on data that won't be exported
  • Valid attributes should be kept, while invalid attributes should be dropped and a message can optionally be logged.
  • Invalid attributes have the potential to generate lots of logging data, so we should be conservative in logging, if we go that route
@mwear mwear added the help wanted Extra attention is needed label Feb 29, 2020
@mwear mwear added this to the Alpha v0.3 milestone Feb 29, 2020
@mwear mwear changed the title Invalid attributes on Span, Events, Links, and Resources Invalid attributes on Spans, Events, Links, and Resources Feb 29, 2020
@fbogsany
Copy link
Contributor

fbogsany commented Mar 2, 2020

The general approach here sounds good to me.

@fhwang
Copy link
Contributor

fhwang commented Mar 5, 2020

Assign this to me plz :)

@fhwang
Copy link
Contributor

fhwang commented Mar 10, 2020

Any thoughts on where is the right place in the SDK to insert this validation logic? I was thinking it could be done right before calling the exporter's export method, but that seems to be referenced from multiple places in the SDK. I could put the checks in multiple places (and/or refactor to de-dupe), but I just want to check beforehand to make sure I'm not missing anything obvious.

[opentelemetry-ruby (validate-attrs-on-export*)]$ gg "\.export"|grep -wv test|grep -v "io.opentelemetry.sdk.trace.export.MultiSpanExporter"
sdk/lib/opentelemetry/sdk/trace/export/batch_span_processor.rb:            Timeout.timeout(@exporter_timeout_seconds) { @exporter.export(batch) }
sdk/lib/opentelemetry/sdk/trace/export/batch_span_processor.rb:              result_code = @exporter.export(batch)
sdk/lib/opentelemetry/sdk/trace/export/multi_span_exporter.rb:                merge_result_code(result_code, span_exporter.export(spans))
sdk/lib/opentelemetry/sdk/trace/export/simple_span_processor.rb:            @span_exporter&.export([span.to_span_data])

@fbogsany
Copy link
Contributor

A couple of options:

  1. In OpenTelemetry::SDK::Trace::Span#to_span_data. This is called from a SpanProcessor right before passing the data to an exporter. The downside of this is that processors may have to handle invalid data, but data will be valid for exporters (and in general, we expect more exporters to be plugged in than processors).
  2. In a SpanProcessor. This can then be installed or removed as part of the export pipeline, and is relatively self-contained. This would likely be installed as the first child of a MultiSpanProcessor. The default pipeline might be something like: MultiSpanProcessor { ValidatingProcessor, BatchSpanProcessor { exporter = ConsoleSpanExporter } }.

@mwear
Copy link
Member Author

mwear commented Mar 10, 2020

Since SDK spans should only be created if they're going to be sampled, would it be reasonable to eagerly drop invalid attributes on span creation and also drop in Span#set_attribute?

I think we could probably do the same on Event and Link creation. Since they are immutable it requires making a copy to drop invalid data, which is somewhat cumbersome. It's harder to ensure that they will be sampled before doing the work, but it might be ok to eagerly drop attributes on initialization?

The upside to this approach is that processors will only see valid data, the slight downside, is that events and links might processed, but not sampled.

@fhwang
Copy link
Contributor

fhwang commented Mar 11, 2020

Cool, I'm exploring this option. A few other questions:

  1. SDK::Span#initialize and SDK::Span#trim_span_attributes do not currently test for validity of attribute keys. Can I assume that's just an oversight and fix that? (I'm thinking of centralizing some of SDK::Span's attribute processing into a single convenience method.)
  2. What's the nature of Resource attribute validation I should be looking at? I'm not sure where this is described in the spec.

@fbogsany
Copy link
Contributor

The downside of any eager validation is that more work is required even if attributes are dropped for other reasons (e.g. max links, events, or attributes is exceeded).

Everything span-attribute-related should go through SDK::Span#trim_span_attributes, so if we eagerly validate, it should be done there. The fact it isn't is certainly an oversight.

@fhwang
Copy link
Contributor

fhwang commented Mar 11, 2020

Would it be safe to defer all attribute validation (key correctness, value correctness, attributes hash size) until SDK::Span#finish is called? Is it reasonable to assume that no exporter will ever send data before that call?

@fhwang
Copy link
Contributor

fhwang commented Mar 11, 2020

Actually disregard that, I suppose #to_span_data is a better place for it.

@fbogsany
Copy link
Contributor

SDK::Span#finish is the last point to do it before SpanProcessors see the data, so if we want valid data in the processors, it has to be done by then. #to_span_data is the last point to do it before the exporters see the data.

The most flexible approach is to perform validation in an optional (but enabled by default) processor, but that implies that processors earlier in the pipeline may see invalid data. There are scenarios where that could be just fine (for example, processors that rewrite or drop spans).

The main reason to eagerly limit attribute hash size is to limit the heap size kept live by data in the pipeline.

@fbogsany
Copy link
Contributor

fbogsany commented Apr 3, 2020

Bumped this to Alpha 0.4 so we can proceed with the 0.3 release (as discussed on the SIG call).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants