feat: [do not merge] OpenTelemetry Datadog Exporter #1316
Conversation
…ling, span_kind, operation name mapping
…e sampling rate is accounted for on auto_reject
…idation of trace state keys
…ess Maps and account for buffer max trace and queue size
…s and update underlying behavior to match expectations
…t updates (#9999)
merge master
merge latest changes from master
Do not merge
Based on your explanation of your sampler, you may be interested in open-telemetry/oteps#107
Codecov Report
@@            Coverage Diff             @@
##           master    #1316      +/-   ##
==========================================
- Coverage   93.16%   93.12%   -0.04%
==========================================
  Files         139      146       +7
  Lines        3921     4336     +415
  Branches      804      922     +118
==========================================
+ Hits         3653     4038     +385
- Misses        268      298      +30
Overall looks good to me. The only thing that stands out is the specific behavior of waiting for all spans in a trace to end before sending them; the user should be aware that the exporter may drop spans that should have been traced (especially because of the default NoopLogger, it might be hard for the user to understand where spans are dropped)
this._traces_spans_started.delete(traceId);
this._traces_spans_finished.delete(traceId);
this._check_traces_queue.delete(traceId);
this._exporter.export(spans, () => {});
you might want to log an error here
I agree, this will make an export fail silently, which is no good
yup makes sense, i've added an error log here. (are there preferences on log level? error feels appropriate here since most debug logs won't get registered, but not sure if there are otel conventions around noisiness)
For reference, the exporter for the otel collector logs an error, so i think that's the way to go
ty, will keep as error log.
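As a side note (not part of the original thread), a minimal sketch of what the error log discussed above could look like. It assumes the opentelemetry-js package layout of that era (ExportResult enum in @opentelemetry/core, SpanExporter/ReadableSpan in @opentelemetry/tracing); the helper name and the logger shape are illustrative, not the PR's actual code.

import { ExportResult } from '@opentelemetry/core';
import { ReadableSpan, SpanExporter } from '@opentelemetry/tracing';

// Hypothetical helper: export a completed trace and log on failure instead of
// passing a no-op callback. `logger` stands in for whatever Logger the
// processor was configured with (the default NoopLogger would still discard it).
function exportTrace(
  exporter: SpanExporter,
  spans: ReadableSpan[],
  logger: { error(message: string): void }
): void {
  exporter.export(spans, (result: ExportResult) => {
    if (result !== ExportResult.SUCCESS) {
      logger.error(`Datadog exporter: failed to export ${spans.length} span(s)`);
    }
  });
}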
Seems ok to me in general. I would worry about the possibility of denial of tracing by an attacker who intentionally makes many requests which don't end, but this is probably minor and there are other ways to combat that.
this._exporter.shutdown();
}

// does nothing.
Suggested change:
- // does nothing.
+ // does something.
:)
good catch, updated to be a bit more specific
return (
  this._traces_spans_started.get(traceId) -
    this._traces_spans_finished.get(traceId) <=
Weird indenting here. Does the linter force this?
yup, there's a few like this, the linter complains otherwise. this is the output of npm run lint:fix, i believe.
}
}

getTraceContext(span: ReadableSpan): string | undefined[] {
Suggested change:
- getTraceContext(span: ReadableSpan): string | undefined[] {
+ getTraceContext(span: ReadableSpan): (string | undefined)[] {
OR
+ getTraceContext(span: ReadableSpan): Array<string | undefined> {
OR
+ getTraceContext(span: ReadableSpan): [string, string, string | undefined] {
thank you, updated. apologies here I don't use typescript terribly often as you can probably tell, so if there's any ts choices here that you think are questionable i'd certainly be open to updating.
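For context (an aside, not part of the original review): the type fix above comes down to TypeScript operator precedence — the array suffix `[]` binds tighter than the union `|`, so the unparenthesized form means something different than intended:

// `string | undefined[]` parses as: a string, OR an array of undefined values.
type Wrong = string | undefined[];

// What was intended: an array whose elements are string or undefined.
type Right = (string | undefined)[]; // equivalent to Array<string | undefined>

const ok: Right = ['traceId', 'spanId', undefined];
// const bad: Wrong = ['traceId', undefined]; // would not type-check as intended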
);

try {
  this._exporter.export(formattedDatadogSpans);
Is this sync?
it is, but it's only appending the trace to the AgentExporter buffer (which gets flushed on an interval), see: https://github.com/DataDog/dd-trace-js/blob/89e9caa74f643fa31e3aba80f608a0f8857f8b74/packages/dd-trace/src/exporters/agent/index.js#L18
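For readers unfamiliar with that pattern, a minimal, hypothetical sketch of the append-to-a-buffer-and-flush-on-an-interval behavior described above; this is illustrative only and not dd-trace-js's actual AgentExporter code.

// Illustrative only: a synchronous export() that just buffers, with a timer
// doing the real (async) work of sending to the agent on an interval.
class BufferingExporter<T> {
  private buffer: T[] = [];
  private timer: ReturnType<typeof setInterval>;

  constructor(private send: (items: T[]) => Promise<void>, flushIntervalMs = 2000) {
    this.timer = setInterval(() => this.flush(), flushIntervalMs);
  }

  export(items: T[]): void {
    // cheap and synchronous: nothing goes over the wire here
    this.buffer.push(...items);
  }

  private async flush(): Promise<void> {
    if (this.buffer.length === 0) return;
    const pending = this.buffer;
    this.buffer = [];
    await this.send(pending);
  }

  shutdown(): void {
    clearInterval(this.timer);
    void this.flush();
  }
}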
first pass
The least clear thing is how to populate Datadog's name and resource, because OTel doesn't have both concepts (open-telemetry/opentelemetry-specification#531). When I created my version of the exporter, I split the span names. I like this approach too, though.
@dyladan in the otel ruby trace exporter i had some logic for dropping a "random" trace when the queue hits max size, which prevents the sort of denial attack you're referring to and allows the traces that do get exported to represent a better approximation of actual application behavior. Do you think that's worth implementing here? it obviously has some shortfalls, but could be better than no traces being exported at all in the case of extremely high load + incomplete traces
It's up to you if it's worth the extra work for the minimal benefit. I would probably not worry about it unless it becomes a problem. The better solution would be for the agent to not require full traces so you can export spans at any time :)
@ericmustin dude, you're a community hero! Thanks for putting this up. I've been wanting this at my organization for a while now. Looking forward to getting this merged in within the coming weeks :)
…plify tag setting logic and span generation
lgtm, thx for all the changes, there are just a few minor comments from my side, but I'm approving, nice job :)
return ddSpanBase;
}

function addErrors(ddSpanBase: typeof Span, span: ReadableSpan): void {
  if (span.status && span.status.code && span.status.code > 0) {
    // TODO: set error.msg error.type error.stack based on error events
Nit: would you mind creating an issue in the repo so this is not forgotten?
yup can do, i'll add an open issue linking to the spec pr open-telemetry/opentelemetry-specification#697 (which got approved it looks like 🎉 )
if (span.status && span.status.code && span.status.code > 0) {
  // TODO: set error.msg error.type error.stack based on error events
  // Opentelemetry-js has not yet implemented https://github.com/open-telemetry/opentelemetry-specification/pull/697
  // so the type and stacktrace are not officially recorded. Until this is implemented,
  // we can infer a type by using the status code and also the non-spec `<library>.error_name` attribute
  const possibleType = inferErrorType(span);
  ddSpanBase.setTag('error', 1);
  ddSpanBase.setTag('error.msg', span.status.message);
  ddSpanBase.setTag(DatadogDefaults.ERROR_TAG, 1);
What does this 1 mean, can it have different values? Maybe add some explanation or some enum for what it really means - I see this in a few other places, so it might make sense to put some const default here
the error span tag can be either 1 or 0 if set: 0 is ok, 1 is error. Generally it only gets set in the case of an error. I've moved this into a more descriptive enum with a code comment to clarify.
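For illustration only (the actual enum and constant names added in this PR may differ), the kind of descriptive constant referred to above could look like:

// Datadog's `error` span tag is numeric: 0 = ok, 1 = error.
// It is normally only set when an error actually occurred.
enum DatadogSpanError {
  OK = 0,
  ERROR = 1,
}

// usage (illustrative):
// ddSpanBase.setTag('error', DatadogSpanError.ERROR);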
…trings to enums where possible
@pauldraper yup I hear you, appreciate the feedback, and yes i agree with what you've brought up. For now the approach across ddog exporters in other languages (python, and ruby which is soon to ship, and here) is:
I think we're definitely looking to be iterative here, and so if there's feedback (when this starts to get used by folks) around a need for better resource/service/operation name visibility (in the datadog senses of the terms) for specific types of spans, we want to address that. (more thoughts about db spans and some general config thoughts)
… queue, updated snake to camel case where appropriate
@dyladan I went ahead and added some logic for dropping a random trace in the event of very high load + many incomplete traces, lmk what you think, it's relatively straightforward. When appropriate I can close this PR and start to publish this under a datadog repo + make a pr with that repo to get it listed in the registry. And it goes without saying but appreciate the really thoughtful and timely feedback from you, @obecny and @vmarchaud on this.
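A rough sketch of the "drop a random pending trace when the queue is full" idea discussed above, assuming the processor tracks pending traces in a Map keyed by trace id; the function and parameter names are illustrative, not the exact code added in this PR.

// When the processor is at maxQueueSize and a span from a brand-new trace
// arrives, evict one pending (incomplete) trace at random so the queue keeps
// approximating real traffic under extreme load instead of rejecting
// everything new.
function evictRandomTrace(
  pendingTraces: Map<string, unknown>,
  maxQueueSize: number
): void {
  if (pendingTraces.size < maxQueueSize) return;
  const traceIds = [...pendingTraces.keys()];
  const victim = traceIds[Math.floor(Math.random() * traceIds.length)];
  pendingTraces.delete(victim);
}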
DB queries aren't necessarily low cardinality, though many perf-based tools do rely on that fact. Another issue is that DB queries can be very long (many hundreds or thousands of characters).
I think this is a good approach, and in practice quite useful.
Is this now OK to close?
What's the status on this? I don't see any follow-up work from it.
@jon-whit hey there, we've released a beta exporter if you want to give it a shot. https://github.com/DataDog/dd-opentelemetry-exporter-js
@ericmustin you may want to add it to the registry if you haven't already
@dyladan ah yup, will do, i have to add a bunch of ruby stuff in there as well
Edit: DataDog has released an OpenTelemetry exporter, which can be found at https://github.com/DataDog/dd-opentelemetry-exporter-js
Which problem is this PR solving?
Any and all feedback is very much appreciated!
Short description of the changes
This PR adds a few components, which I'll attempt to summarise below:
DatadogSpanProcessor: An implementation of the SpanProcessor. A current constraint when exporting traces to the Datadog Trace-Agent is that traces must be 'complete', i.e. all started spans in a trace should be sent in one array once they have all finished. The DatadogSpanProcessor is functionally very similar to the BatchSpanProcessor in that it batches spans together for export; the main implementation difference is that it only exports complete traces, and it includes both a maxQueueSize and a maxTraceSize.
DatadogExporter: An exporter implementation that exposes export and shutdown methods. The exporter in practice is very similar to the JaegerExporter in that it mainly serves to transform OpenTelemetry Spans into Datadog Spans (via the helper methods in ./transform.ts) and then flush those to the Datadog trace-agent. It does so by leveraging the datadog tracer dd-trace-js's internal AgentExporter, which converts the Spans into the appropriate msgpack format and sends them via http to the trace-agent. Additionally, the DatadogExporter allows users to set Datadog specific tags such as env, service_name, and version, both by env var and via in-code configuration.
DatadogPropagator: A propagator implementation that injects/extracts datadog specific distributed tracing propagation headers. This is similar to the B3Propagator in that it must handle 128bit => 64bit trace-id conversion.
DatadogProbabilitySampler: A sampler implementation that records all spans but only samples spans according to the sampling rate. This is similar to the ProbabilitySampler, with the main difference being that all spans are recorded (but not sampled). The underlying reason for this is that Datadog's Trace-Agent generates some trace-related metrics such as hits and errors, and in order to properly upscale sampled traces into accurate hit counts it needs to be aware of all traces/spans. Since the ReadableSpan doesn't expose the sampling rate of the trace, the only way to convey this information is to export all spans and traces (traces that should be dropped due to sampling have their datadog specific sampling rate metric tag adjusted accordingly, but are still exported to the trace agent).
Example Usage
Example Output