Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feat add dd and otel celery tasks #409

Closed
wants to merge 8 commits into from

Conversation

connorhaugh
Copy link

@connorhaugh connorhaugh commented Apr 29, 2024

Description:

Adds an initialize_celery_monitoring util to allow celery tasks to be monitored by the Open Telemetry and Datadog providers. edx-platform PR to follow.

JIRA:

edx/edx-arch-experiments#584

Testing instructions:

  1. Set this package to be installed locally, check out this PR
    1. Run your edx-platform off this branch: feat: add LMS/CMS monitoring to celery edx-platform#34673 to use the implementation required from the package.
  2. set your OPENEDX_TELEMETRY django setting to be [edx_django_utils.monitoring.DataDogBackend]
  3. follow https://2u-internal.atlassian.net/wiki/spaces/~840928901/pages/813793298/OpenTelemetry+New+Relic+and+Datadog for devstack setup for DD, but set up the DD agent on CMS, not lms
  4. Follow https://2u-internal.atlassian.net/wiki/spaces/ENGAGE/pages/395673715/How+to+configure+Celery+worker+on+LMS+CMS+Devstack to set up celery workers on cms.
  5. Export a course
  6. View the export on DD with a celery task

Reviewers:

  • tag reviewer

Merge checklist:

  • All reviewers approved
  • CI build is green
  • Version bumped
  • Changelog record added
  • Documentation updated (not only docstrings)
  • Commits are squashed

Post merge:

  • Create a tag
  • Check new version is pushed to PyPi after tag-triggered build is
    finished.
  • Delete working branch (if not needed anymore)

Author concerns:

List any concerns about this PR - inelegant
solutions, hacks, quick-and-dirty implementations, concerns about
migrations, etc.

timmc-edx and others added 4 commits March 28, 2024 18:29
Switch to pluggable design so that some monitoring functions can
report to something other than New Relic. Still defaults to just New
Relic, but new setting allows adding OpenTelemetry or Datadog, or
removing New Relic.

Initialization and configuration of OpenTelemetry is left as an
exercise to the deployer, but
https://github.com/mitodl/open-edx-plugins/tree/main/src/ol_openedx_otel_monitoring/
would be a likely candidate.

Part of #389
- Add telemetry dependencies to test.in (including newrelic, even though
  that's also in base.in -- make it explicit) and run `make upgrade`.
- Remove TestingBackend and test that the appropriate backend calls are
  being made.
- Move `test_record_exception` from custom-monitoring to backends tests
  and add Datadog coverage
- Add a similar test for fan-out of `set_custom_attribute`
# Conflicts:
#	CHANGELOG.rst
#	requirements/base.txt
#	requirements/dev.txt
#	requirements/doc.txt
#	requirements/quality.txt
#	requirements/test.txt
@connorhaugh connorhaugh changed the base branch from master to timmc/otel April 29, 2024 18:32
Base automatically changed from timmc/otel to master April 30, 2024 15:14
@connorhaugh connorhaugh marked this pull request as ready for review April 30, 2024 17:39
Copy link

@schenedx schenedx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -129,6 +156,9 @@ def record_exception(self):
if span := self.dd_tracer.current_span():
span.set_traceback()

def initialize_celery_monitoring(self, *args, **kwargs):
self.patch(celery=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like ddtrace patches Celery by default: https://github.com/DataDog/dd-trace-py/blob/01fbf9127180794392cc7dbe16acd610087dacf6/ddtrace/_monkey.py#L33 -- were you seeing otherwise?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was just going by these docs: https://ddtrace.readthedocs.io/en/stable/integrations.html#celery.
This was in the ticket I had created: edx/edx-arch-experiments#584
Maybe it was inaccurate?

Separately: In those docs I see that trace headers are off by default. We should find out if that is what we want from DD? Maybe long running tasks would kill our traces? Not sure. @connorhaugh: Can you task/track this?

self.instrumentor().instrument()
else:
raise Exception(
"the worker_process_init celery signal must be provided for OpenTelemetry to monitor celery tasks."
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If OpenTelemetry needs it, then it will have to be passed to all the backends, because there will only be one call site for the initialize_celery_monitoring API call we expose. However, it looks like it doesn't need to be passed in at all—this is just a signal, so we could from celery.signals import worker_process_init.

For that matter, it might make the most sense to instead say that the caller should do that, like so:

@celery.signals.worker_process_init.connect(weak=False)
def init_celery_tracing(*args, **kwargs):
    edx_django_utils.monitoring.initialize_celery_monitoring()

We could also just leave the OpenTelemetry side of this unimplemented for now, since we won't be able to give it the testing it deserves (in production).

@connorhaugh
Copy link
Author

Based on Tim's comment, I went and looked into if we were getting what we needed from celery by default.

To do that, I looked at the logs over the past month.

Do the logs related to celery increase or decrease in volume when we enabled DD in edx-platform?
The answer is it increased in volume from april 22-27rd, which was the window for edx-platform having celery on in dd. I have been really struggling to explain why that is the case, however, given the code that enables the patch by default. Either way, I'm making improvements to the code and we should maybe plan on needing it.

@robrap
Copy link
Contributor

robrap commented Jun 7, 2024

@timmc-edx: FYI: I'm closing this PR. I assume at this point we do not need this.

@robrap robrap closed this Jun 7, 2024
@timmc-edx timmc-edx deleted the feat--add-dd-and-otel-celery-tasks branch June 10, 2024 15:48
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants