Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement shutdown procedure for OTLP grpc exporters #3138

Merged

Conversation

girishc13
Copy link
Contributor

@girishc13 girishc13 commented Jan 23, 2023

Description

Partial fix for #1791. Implements the shutdown procedure for OTLP gRPC exporters.

  • Add _shutdown variable for checking if the exporter has been shutdown.
  • Prevent export if the _shutdown flag has been set. Log a warning message is exporter has been shutdown.
  • Use thread lock to synchronize the last export call before shutdown timeout. The shutdown method will wait until the timeout_millis if there is an ongoing export. If there is no ongiong export, set the _shutdown flag to prevent further exports and return.
  • Add unit tests for the OTLPExporterMixIn and the sub classes for traces and metrics.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update

How Has This Been Tested?

Please describe the tests that you ran to verify your changes. Provide instructions so we can reproduce. Please also list any relevant details for your test configuration

  • TestOTLPExporterMixin
    • test_shutdown: assert that the further exports don't happen after calling the shutdown method.
    • test_shutdown_wait_last_export: assert that the shutdown method waits for at least timeout_millis time if an export is in progress before setting the _shutdown flag to prevent further exports.
  • TestOTLPMetricExporter and TestOTLPTracesExporter
    • same as above but uses the concrete sub classes with the available stubs.

Does This PR Require a Contrib Repo Change?

Answer the following question based on these examples of changes that would require a Contrib Repo Change:

  • The OTel specification has changed which prompted this PR to update the method interfaces of opentelemetry-api/ or opentelemetry-sdk/

  • The method interfaces of test/util have changed

  • Scripts in scripts/ that were copied over to the Contrib repo have changed

  • Configuration files that were copied over to the Contrib repo have changed (when consistency between repositories is applicable) such as in

    • pyproject.toml
    • isort.cfg
    • .flake8
  • When a new .github/CODEOWNER is added

  • Major changes to project information, such as in:

    • README.md
    • CONTRIBUTING.md
  • Yes. - Link to PR:

  • No.

Checklist:

  • [?] Followed the style guidelines of this project
    • I cannot run tox -e lint on my machine due to some issue with the distutils module. I have manually run black and isort commands but there are some changes which might be questionable. I'll try to fix the distutils package issues.
  • Changelogs have been updated
  • Unit tests have been added
  • Documentation has been updated

Open Question

The OTLP gRPC exporters run in a daemon thread with an exponential backoff of max 64 seconds. This is questionable because this often leaves a hanging thread even if the main process has been shutdown. Should the export function stop after the shutdown flag has been set even if the retry is in progress?

- Add `_shutdown` variable for checking if the exporter has been
  shutdown.
- Prevent export if the `_shutdown` flag has been set. Log a warning
  message is exporter has been shutdown.
- Use thread lock to synchronize the last export call before shutdown
  timeout. The `shutdown` method will wait until the `timeout_millis`
  if there is an ongoing export. If there is no ongiong export, set the
  `_shutdown` flag to prevent further exports and return.
- Add unit tests for the `OTLPExporterMixIn` and the sub classes for
  traces and metrics.
@linux-foundation-easycla
Copy link

linux-foundation-easycla bot commented Jan 23, 2023

CLA Signed

The committers listed above are authorized under a signed CLA.

@girishc13 girishc13 marked this pull request as ready for review January 23, 2023 14:32
@girishc13 girishc13 requested a review from a team January 23, 2023 14:32
@girishc13
Copy link
Contributor Author

@srikanthccv Any opinions on the open question in the PR description? After more investigation I found out that if the code doesn't end the exponential back off in the OTLPExporterMixin.export method even if the shutdown flag has been set.

@srikanthccv
Copy link
Member

Should the export function stop after the shutdown flag has been set even if the retry is in progress?

As soon as the shutdown request is received, it should do one last flush before exiting. IIRC there was no reliable way to cancel the ongoing export call and just try once before shutting down everything. I haven't looked at the implementation but if you have some ideas please suggest them.

@girishc13
Copy link
Contributor Author

Combining the last flush with the shutdown process can make the logic a bit more complicated. If nobody is complaining then let's leave it as it is. Otherwise we would need to change the self._shutdown flag to a threading.Event object which signals the thread executing the export. This signal can be used to stop the retry logic in the OTLPExporterMixin.export method earlier.

@srikanthccv
Copy link
Member

Please fix the lint

@srikanthccv srikanthccv enabled auto-merge (squash) March 14, 2023 18:27
@srikanthccv srikanthccv merged commit c84ba94 into open-telemetry:main Mar 14, 2023
@srikanthccv
Copy link
Member

Thank you!

@girishc13 girishc13 deleted the fix-1791-otlp-grpc-exporter-shutdown branch March 15, 2023 08:11
@@ -290,6 +290,9 @@ def _translate_data(
def export(self, spans: Sequence[ReadableSpan]) -> SpanExportResult:
return self._export(spans)

def shutdown(self) -> None:
OTLPExporterMixin.shutdown(self)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@girishc13 Can you please help me understand why you are calling shutdown on mixin class directly instead of using super()?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember the exact details but the shutdown method is implemented by both the OTLPExporterMixin and the SpanExporter interfaces. The exporter.shutdown is handled by different logic and this pr was targeting the backend client that needs to be shutdown. You need to trace the calls for shutdown from the usage of the OTLPSpanExporter.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the issue maybe due to mixin inheritance being applied incorrectly, currently we have

class OTLPSpanExporter(SpanExporter, OTLPExporterMixin[...]):

but usually in python mixin should come before the base class, e.g.

class OTLPSpanExporter(OTLPExporterMixin[...], SpanExporter):

this way, super().shutdown() call will correctly use shutdown method from mixin

@Liubey
Copy link

Liubey commented Mar 8, 2024

@girishc13 @lzchen @srikanthccv
I found a dead lock of grpc.OTLPSpanExporter with uwsgi parameter "max-requests", see that-->#3640
I think the key is _export_lock

why in grpc verision, use _export_lock, and why http.OTLPSpanExporter not need the lock flag?

@srikanthccv
Copy link
Member

Hi @Liubey, either there was no HTTP exporter at the time or the author only addressed the issue for the gRPC exporter.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants