
Hash_Seed Probabilistic Sampler Behaviour Broken #35140

Open
navinpai opened this issue Sep 11, 2024 · 5 comments
Assignees
Labels
bug Something isn't working processor/probabilisticsampler Probabilistic Sampler processor Stale

Comments


navinpai commented Sep 11, 2024

Component(s)

processor/probabilisticsampler

What happened?

Description

The probabilistic sampler seems to have a breaking change starting with v0.103.0, most likely introduced by a change in this PR: 9f0a3db.

Steps to Reproduce

I tried the same config on otel-collector-contrib v0.102.0 and v0.103.0.

config:

    processors:
      probabilistic_sampler:
        hash_seed: 22
        sampling_percentage: 100
    exporters:
      zipkin:
        endpoint: "https://<endpoint>/api/v2/spans"
      .
      .
      .
      .
   processors: [memory_limiter, probabilistic_sampler, batch]
   exporters: [debug, zipkin]

and the following payload:

{
  "resourceSpans": [
    {
      "resource": {
        "attributes": [
          {
            "key": "service.name",
            "value": {
              "stringValue": "test-dev"
            }
          }
        ]
      },
      "scopeSpans": [
        {
          "scope": {
            "name": "my.library",
            "version": "1.0.0",
            "attributes": [
              {
                "key": "my.scope.attribute",
                "value": {
                  "stringValue": "Test"
                }
              }
            ]
          },
          "spans": [
            {
              "traceId": "E0FC16701EBF47A29684ACB3FC755798",
              "spanId": "08F5F1DEB8CF4D98",
              "parentSpanId": "",
              "name": "Test Span",
              "startTimeUnixNano": "1726046662083988472",
              "endTimeUnixNano": "1726046697083988472",
              "kind": 2,
              "attributes": [
                {
                  "key": "my.span.attr",
                  "value": {
                    "stringValue": "Test"
                  }
                }
              ]
            }
          ]
        }
      ]
    }
  ]
}

I repeated this with different TraceId and SpanId values.
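
For anyone trying to reproduce this, the payload above can be posted straight to the collector's OTLP/HTTP receiver. The sketch below (in Go) is only an illustration: it assumes the receiver listens on the default localhost:4318 endpoint and that the JSON above is saved as payload.json; adjust both to your setup.

package main

import (
	"bytes"
	"fmt"
	"net/http"
	"os"
)

func main() {
	// Read the JSON payload shown above from a local file
	// (payload.json is a hypothetical filename).
	payload, err := os.ReadFile("payload.json")
	if err != nil {
		panic(err)
	}

	// Post it to the OTLP/HTTP traces endpoint. 4318 is the
	// receiver's default port; change it if yours differs.
	resp, err := http.Post("http://localhost:4318/v1/traces", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	fmt.Println("collector responded with", resp.Status)
}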

Expected Result

Since the config is the same and sampling_percentage is 100, all traces should make their way to Jaeger.

Actual Result

Only traces sent to v0.102.0 make it to Jaeger. Traces sent to v0.103.0 do not seem to make it to Jaeger.

Note: since I also have a debug exporter, I can see that in both cases the debug line is printed correctly: info TracesExporter {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}. Yet on v0.103.0 the traces are still missing in Jaeger.


If it matters, the Jaeger version is 1.57.0.

Collector version

v0.103.0

Environment information

Environment

K8s 1.30

OpenTelemetry Collector configuration

processors:
      probabilistic_sampler:
        hash_seed: 22
        sampling_percentage: 100
    exporters:
      zipkin:
        endpoint: "https://<endpoint>/api/v2/spans"
      .
      .
      .
      .
   processors: [memory_limiter, probabilistic_sampler, batch]
   exporters: [debug, zipkin]

Log output

info    TracesExporter  {"kind": "exporter", "data_type": "traces", "name": "debug", "resource spans": 1, "spans": 1}

Additional context

Workaround

A straightforward workaround is to delete the hash_seed key from the config, after which traces come through from v0.103.0 as well. However, there is no documentation about this requirement to remove it (and this was the documented configuration for the probabilistic sampler until not long ago, so I'm sure more people will face the same issue).

@navinpai navinpai added bug Something isn't working needs triage New item requiring triage labels Sep 11, 2024
@github-actions github-actions bot added the processor/probabilisticsampler Probabilistic Sampler processor label Sep 11, 2024
Contributor
github-actions bot commented Sep 11, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@navinpai navinpai changed the title from Hash_Seed Probabilitic Sampler Behaviour Broken to Hash_Seed Probabilistic Sampler Behaviour Broken Sep 12, 2024
@jpkrohling
Member

cc @jmacd

@jmacd jmacd self-assigned this Sep 17, 2024
Contributor

jmacd commented Sep 17, 2024

@navinpai I can help get to the bottom of this. There are a few clues in the text above, and I agree that something is wrong.
IIUC, the debug exporter shows that a span is passing through the sampler but does not appear in Jaeger. This suggests something in the data may be causing Jaeger to reject it?

Would you be able to add verbose detail to the debug export, so we can see the span that is passing through?

exporters:
  debug:
    verbosity: detailed

I expect to see only one difference between the outputs from v0.102 and v0.103: the Span.TraceState field should be set with a value like ot=th:0;rv:abcdefabcdefab where th:0 signifies 100% sampling and rv:... is an explicit randomness value. I wonder if Jaeger logs any errors when it receives this data?
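
For reference, here is a rough sketch of how that tracestate value breaks down. It is a hand-rolled illustration of the ot=th:...;rv:... layout, not the collector's or Jaeger's actual parsing code:

package main

import (
	"fmt"
	"strings"
)

// parseOTTraceState pulls the th (threshold) and rv (randomness)
// sub-fields out of the "ot" entry of a W3C tracestate value.
// Illustrative only; the real components have their own parsers.
func parseOTTraceState(traceState string) (th, rv string) {
	for _, entry := range strings.Split(traceState, ",") {
		key, value, ok := strings.Cut(strings.TrimSpace(entry), "=")
		if !ok || key != "ot" {
			continue
		}
		for _, field := range strings.Split(value, ";") {
			k, v, ok := strings.Cut(field, ":")
			if !ok {
				continue
			}
			switch k {
			case "th":
				th = v // sampling threshold; "0" means 100% sampling
			case "rv":
				rv = v // explicit randomness value
			}
		}
	}
	return th, rv
}

func main() {
	th, rv := parseOTTraceState("ot=th:0;rv:abcdefabcdefab")
	fmt.Printf("threshold=%q randomness=%q\n", th, rv)
}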

The other clue, about omitting hash_seed: 22, leaves me confused. The change in 9f0a3db has a conditional that sets the new mode based on whether HashSeed is set or not, but a reviewer flagged a potential change of behavior. The code looks like:

	mode := cfg.Mode
	if mode == modeUnset {
		// Reasons to choose the legacy behavior include:
		// (a) having set the hash seed
		// (b) logs signal w/o trace ID source
		if cfg.HashSeed != 0 || (isLogs && cfg.AttributeSource != traceIDAttributeSource) {
			mode = HashSeed
		} else {
			mode = defaultMode
		}
	}

Note that defaultMode is HashSeed, so despite that conditional, the result of an unset mode is to use HashSeed, with or without an explicit HashSeed value. This tells me that adding or removing hash_seed: 22 ought to make no difference.
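
To make that concrete, the snippet below is a simplified sketch of a hash-seed decision, not the processor's exact implementation (the FNV-1a hash and the bucket constants are stand-ins). The trace ID is hashed together with the seed, the low bits select a bucket, and the bucket is compared against a threshold scaled from sampling_percentage; at 100% the threshold covers every bucket, so the seed cannot change the outcome.

package main

import (
	"encoding/binary"
	"fmt"
	"hash/fnv"
)

const (
	numHashBuckets     = 0x4000 // illustrative bucket count
	bitMaskHashBuckets = numHashBuckets - 1
)

// shouldSample sketches a hash-seed decision: hash the trace ID with
// the seed, keep the low bits as a bucket, and sample if the bucket
// falls below a threshold scaled from the sampling percentage.
func shouldSample(traceID [16]byte, seed uint32, percentage float64) bool {
	var seedBytes [4]byte
	binary.LittleEndian.PutUint32(seedBytes[:], seed)

	h := fnv.New32a()
	h.Write(seedBytes[:])
	h.Write(traceID[:])

	bucket := h.Sum32() & bitMaskHashBuckets
	threshold := uint32(percentage * (numHashBuckets / 100.0))
	return bucket < threshold
}

func main() {
	traceID := [16]byte{
		0xE0, 0xFC, 0x16, 0x70, 0x1E, 0xBF, 0x47, 0xA2,
		0x96, 0x84, 0xAC, 0xB3, 0xFC, 0x75, 0x57, 0x98,
	}
	// At 100%, threshold == numHashBuckets, which is larger than any
	// masked hash value, so the seed cannot affect the decision.
	fmt.Println(shouldSample(traceID, 22, 100)) // true
	fmt.Println(shouldSample(traceID, 0, 100))  // true
}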

@atoulme atoulme removed the needs triage New item requiring triage label Oct 2, 2024
Contributor

jmacd commented Oct 7, 2024

@navinpai Can you take a look at the Jaeger logs to see whether it is rejecting the new TraceState values that the v0.103.0 processor is adding? If I understand your report correctly, the trace makes it through the sampler in the v0.103 configuration, so it could be that the appearance of a TraceState field is giving Jaeger trouble.

Contributor

github-actions bot commented Dec 9, 2024

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label Dec 9, 2024
Projects
None yet
Development

No branches or pull requests

4 participants