Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(opentelemetry-instrumentation-celery): attach incoming context on… #2385

Conversation

malcolmrebughini
Copy link
Contributor

… _trace_prerun

Description

Currently incoming baggage does not get picked up. So this PR applies the fix described in the original issue.

Fixes #784

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

The original issue has steps to reproduce.

  • Test A

Does This PR Require a Core Repo Change?

  • Yes. - Link to PR:
  • No.

Checklist:

See contributing.md for styleguide, changelog guidelines, and more.

  • Followed the style guidelines of this project
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Copy link

linux-foundation-easycla bot commented Mar 31, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

malcolmrebughini added a commit to malcolmrebughini/opentelemetry-python-contrib that referenced this pull request Mar 31, 2024
@malcolmrebughini malcolmrebughini requested a review from a team March 31, 2024 16:40
Copy link
Contributor

@xrmx xrmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a newbie I find it a bit strange to expect to use the baggage to pass metadata implicitly to the celery worker. If the span has the baggage context I don't see it as a bug.

@malcolmrebughini
Copy link
Contributor Author

malcolmrebughini commented Apr 2, 2024

@xrmx Other implementations like the django one, make use of this function that I believe it implies that if there's no active span it should extract the context from the carrier, attach and then creates a span.

As another reference, the kafka client does the context.attach after creating the span here.

Unless I'm misunderstanding your concerns and there is a better way to achieve this.

@malcolmrebughini malcolmrebughini marked this pull request as draft April 3, 2024 04:38
@xrmx
Copy link
Contributor

xrmx commented Apr 3, 2024

@xrmx Other implementations like the django one, make use of this function that I believe it implies that if there's no active span it should extract the context from the carrier, attach and then creates a span.

As another reference, the kafka client does the context.attach after creating the span here.

Unless I'm misunderstanding your concerns and there is a better way to achieve this.

I guess I have taken the example too literally :)
if you see the kafka example they push the new context only for the sake of opentelemetry tracing and after the callback has been called it is detached. Same for the helper used by the web frameworks. You have to store the token and do the detach here too.

@malcolmrebughini
Copy link
Contributor Author

@xrmx Yes, I realized that too. I pushed a new commit that adds the detache of the context and moved the PR to draft for now until I write some tests for the new functions.

Should have it ready tonight.

@malcolmrebughini malcolmrebughini marked this pull request as ready for review April 4, 2024 06:15
@malcolmrebughini
Copy link
Contributor Author

@xrmx Thanks for your patience. This should be good for review now.

Copy link
Contributor

@xrmx xrmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@malcolmrebughini docker-tests are broken.

@malcolmrebughini
Copy link
Contributor Author

@xrmx I'm unable to run these test locally to debug. Any pointers?

I run tox -e docker-tests

 error: subprocess-exited-with-error
  
  × python setup.py egg_info did not run successfully.
  │ exit code: 1
  ╰─> [18 lines of output]
      /bin/sh: line 1: mysql_config: command not found
      /bin/sh: line 1: mariadb_config: command not found
      /bin/sh: line 1: mysql_config: command not found
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-n07bmqnv/mysqlclient_43431da1ffdb4512b0e26ed321ec0aa5/setup.py", line 15, in <module>
          metadata, options = get_config()
                              ^^^^^^^^^^^^
        File "/tmp/pip-install-n07bmqnv/mysqlclient_43431da1ffdb4512b0e26ed321ec0aa5/setup_posix.py", line 70, in get_config
          libs = mysql_config("libs")
                 ^^^^^^^^^^^^^^^^^^^^
        File "/tmp/pip-install-n07bmqnv/mysqlclient_43431da1ffdb4512b0e26ed321ec0aa5/setup_posix.py", line 31, in mysql_config
          raise OSError("{} not found".format(_mysql_config_path))
      OSError: mysql_config not found
      mysql_config --version
      mariadb_config --version
      mysql_config --libs
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I have libpq and snappy as indicated in the contributing instructions.

@xrmx
Copy link
Contributor

xrmx commented Apr 27, 2024

You need mysql_config or mariadb_config that are usually shipped with development libraries of the database, e.g. libmariadb-dev.

@malcolmrebughini
Copy link
Contributor Author

@xrmx hi again. I've been away for a bit. I was able to fix the docker tests on my latest commit. Please take another look.

@malcolmrebughini
Copy link
Contributor Author

@ocelotl @lzchen Is there more feedback for this PR? I would like to know what is the path forward for this change.

Thanks 🙏

@rbagd
Copy link
Contributor

rbagd commented Jul 12, 2024

I've tested this PR in my local environment and it seems to be working quite nicely. I noticed that context doesn't get propagated in task_postrun signal but not sure if that would even be easy to change.

It'd be great if maintainers could look into the PR before the next release.

@malcolmrebughini
Copy link
Contributor Author

@rbagd thanks for trying it out.

Can you provide more details on what you mean that the context doesn't get propagated in task_postrun? If there's an issue I have not noticed with this instrumentation I would like to fix it.

CHANGELOG.md Outdated Show resolved Hide resolved
CHANGELOG.md Outdated Show resolved Hide resolved
malcolmrebughini added a commit to malcolmrebughini/opentelemetry-python-contrib that referenced this pull request Jul 14, 2024
@rbagd
Copy link
Contributor

rbagd commented Jul 29, 2024

@malcolmrebughini So from what I understood Celery instrumentation registers a task_postrun signal which detaches the OTel context. Since application code might have its own task_postrun signals, then the ordering starts to matter. If you run the instrumentation via opentelemetry-instrument celery, context detachment seems to be executed first and so the remaining task_postrun handlers don't get the OTel context anymore. This should not be an issue with other signals as far as I can say, albeit I didn't test all of them.

I didn't see an obvious way to change the ordering of task_postrun handlers but if there's a way, that might be a good work-around.

P.S. This is not specific to this PR, it must have been there from the start.

@xrmx
Copy link
Contributor

xrmx commented Jul 29, 2024

@malcolmrebughini please rebase and move the changelog update to the unreleased section

@malcolmrebughini malcolmrebughini force-pushed the fix/opentelemetry-instrumentation-celery-baggage branch from 65ceb1a to 59f63fc Compare July 30, 2024 02:45
malcolmrebughini added a commit to malcolmrebughini/opentelemetry-python-contrib that referenced this pull request Jul 30, 2024
malcolmrebughini added a commit to malcolmrebughini/opentelemetry-python-contrib that referenced this pull request Jul 30, 2024
@malcolmrebughini
Copy link
Contributor Author

@xrmx done with the rebase and added the changelog entry to unreleased 👌

ocelotl pushed a commit to malcolmrebughini/opentelemetry-python-contrib that referenced this pull request Aug 1, 2024
ocelotl pushed a commit to malcolmrebughini/opentelemetry-python-contrib that referenced this pull request Aug 1, 2024
@ocelotl ocelotl force-pushed the fix/opentelemetry-instrumentation-celery-baggage branch from e858553 to 109a04f Compare August 1, 2024 21:24
@ocelotl ocelotl force-pushed the fix/opentelemetry-instrumentation-celery-baggage branch from 109a04f to 39482d4 Compare August 1, 2024 21:27
@ocelotl ocelotl merged commit c3e9f75 into open-telemetry:main Aug 1, 2024
389 checks passed
self.update_task_duration_time(task_id)
labels = {"task": task.name, "worker": task.request.hostname}
self._record_histograms(task_id, labels)
context_api.detach(token)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems this line introduces a bug where the token could be "None" if the prerun signal has no traceparent.

This could happen for example if the process sending the task isn't instrumented, while the celery worker is instrumented. This results in an error since token cannot be None when calling context_api.detach()

This seems related as well: open-telemetry/opentelemetry-python#4163

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was a silly mistake on my end. I'll get a fix for this tonight.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

started a draft PR here: #2855

still trying to figure out how to write a test for this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If an integration test is hard to write maybe an unit test could help in moving things forward

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Celery instrument didn't propagate baggage
5 participants