Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

NH-26751 NH-25139 Adjust propagator and sampler responsibilities #175

Merged
merged 20 commits into from
Jul 19, 2023

Conversation

tammy-baylis-swi
Copy link
Contributor

@tammy-baylis-swi tammy-baylis-swi commented Jul 13, 2023

XTraceOptions is now created by propagator.extract, or at sampler.should_sample but only as a fallback if none is in the current Otel context. All xto info is in the XTraceOptions struct under INTL_SWO_X_OPTIONS_KEY key in Otel context, so we no longer store under a separate INTL_SWO_SIGNATURE_KEY.

The XTraceOptions struct has been simplified so that it takes 2 optional headers (str) from incoming request's x-trace-options and x-trace-options-signature. It also only stores the original x-trace-options header if signature is provided. This was Raph's nice suggestion to save on space because options_header only comes in play for a signed TT request.

The sampler's helper calculate_trace_state is much simpler now! All is does is add xto info for piggyback into any xto response if needed. sw KV is no longer added to a (future) span's tracestate by the sampler.should_sample, while sw KV continues to be extracted and injected by propagator. This means:

  1. sw can be in the tracestate of a completed span if it was extracted (propagator.extract) from a request with incoming, valid span context (scenario 4, 8) before should_sample is called.
  2. sw will not be in tracestate of a completed span if there is no valid span context (scenario 1) -- so no more zeroed out span_id being used when not necessary.

Most of the changes here are unit/integration test updates which now all pass. I also did some manual/e2e testing with headers and traces described here (though not every one of our w3c context acceptance criteria): https://swicloud.atlassian.net/wiki/spaces/NIT/pages/3687877326/2023-07-13+Refactor+x-trace-options+usage

Please let me know if any questions, or if more e2e tests an/or a demo/walkthrough would help!

@tammy-baylis-swi tammy-baylis-swi changed the title NH-26751 Refactor propagator and sampler NH-26751 Adjust propagator and sampler responsibilities Jul 17, 2023
@@ -17,7 +17,6 @@
INTL_SWO_TRACESTATE_KEY = "sw"
INTL_SWO_X_OPTIONS_KEY = "sw_xtraceoptions"
INTL_SWO_X_OPTIONS_RESPONSE_KEY = "xtrace_options_response"
INTL_SWO_SIGNATURE_KEY = "sw_signature"
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All xto info is in the XTraceOptions struct under INTL_SWO_X_OPTIONS_KEY key in Otel context, so we no longer store under a separate INTL_SWO_SIGNATURE_KEY.

Comment on lines +44 to +46
# store original header for c-lib sample decision later
# only if signature_header exists
self.options_header = xtraceoptions_header
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was Raph's nice suggestion to save on space because options_header only comes in play for a signed TT request.

@@ -316,79 +317,31 @@ def create_xtraceoptions_response_value(

return ";".join(response)

def create_new_trace_state(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

calculate_trace_state is so much simpler that I figured we could compress and get rid of its helper create_new_trace_state

trace_state = TraceState(
[
(
INTL_SWO_TRACESTATE_KEY,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The sampler no longer sets INTL_SWO_TRACESTATE_KEY (sw) in tracestate internally. We rely on the propagators to inject it into outgoing requests. In the case of valid remote parents, it's the propagators' extract that bring it to should_sample.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me so happy

@tammy-baylis-swi tammy-baylis-swi changed the title NH-26751 Adjust propagator and sampler responsibilities NH-26751 NH-25139 Adjust propagator and sampler responsibilities Jul 18, 2023
@tammy-baylis-swi tammy-baylis-swi marked this pull request as ready for review July 18, 2023 01:10
@tammy-baylis-swi tammy-baylis-swi requested a review from a team July 18, 2023 01:10
xtraceoptions_header[0],
)

) or [""]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kind of a shame that the Getter doesn't provide a fallback

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed!!! 🙃

"""Calculates trace_state based on x-trace-options if provided -- for non-existent or remote parent spans only."""
# No valid parent i.e. root span, or parent is remote
if (
not parent_span_context.is_valid
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Elsewhere I've seen you check whether parent_span_context is None. Is that not needed here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah!!! Added in 0f3c67f

@@ -906,7 +910,7 @@ def test_signed_missing_xtraceoptions_header(self):

# Verify x-trace-options-response response header not present
# because no x-trace-options-header
assert "x-trace-options-response" not in resp.headers
# assert "x-trace-options-response" not in resp.headers # !!!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No longer needed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually it's needed, and it's failing! Gonna dig

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Re-added the test in c5e0858 and it now passes. I think this is a requirement based on this page:

If an X-Trace-Options header was given, the agent will set a response header X-Trace-Options-Response to communicate status including the reason a trigger trace request was not traced and any ignored options (see request/response table below).

In the same commit I've added include_response to the XTraceOptions struct to make the check before calculating response header X-Trace-Options-Response, plus more unit tests.

@raphael-theriault-swi
Copy link
Member

Looking at this too

Copy link
Member

@raphael-theriault-swi raphael-theriault-swi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, happy to see this ! I'll led Jared finish his review

trace_state = TraceState(
[
(
INTL_SWO_TRACESTATE_KEY,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes me so happy

Copy link
Contributor

@swi-jared swi-jared left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, with a minor nit

@@ -541,7 +495,13 @@ def should_sample(
parent_span_context = get_current_span(
parent_context
).get_span_context()
xtraceoptions = XTraceOptions(parent_context)

if not parent_context or not parent_context.get(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An optimization for this block might be:

xtraceoptions = XTraceOptions()
if parent_context and parent_context.is_valid():
    xtraceoptions = parent_context.get(INTL_SWO_OPTIONS_KEY) or xtraceoptions

This would remove the first parent_context.get call, which might not be hugely expensive on its own, but in aggregate the sampler might hit it quite a lot.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For Otel Python there isn't a Context.is_valid() function so we can't do parent_context.is_valid(). There is a SpanContext.is_valid() so we could do parent_span_context.is_valid() but I don't think this check does the same thing and the adjusted condition makes several of the integration/ tests fail.

When the Otel SDK calls should_sample, the Otel Context (parent_context) could be empty (falsy) or could contain KVs, e.g. what's set by extraction like a XTraceOptions struct, or what's at the _SPAN_KEY (used to get_current_span and get_span_context at L495-497)

If Otel SpanContext (parent_span_context) is valid or not valid, there could still be KVs in the Otel Context.

@tammy-baylis-swi tammy-baylis-swi merged commit 19d479f into main Jul 19, 2023
@tammy-baylis-swi tammy-baylis-swi deleted the NH-26751-refactor branch July 19, 2023 16:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants