Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add shutdown in TracerProvider #1855

Merged
merged 10 commits into from
Jun 10, 2024
Merged

Conversation

TommyCpp
Copy link
Contributor

@TommyCpp TommyCpp commented Jun 2, 2024

Towards #1801

Changes

  • Add shutdown in TracerProvider.
  • Tracer now takes a TracerProvider instead of Weak<TracerProviderInner>, removing the upgrading cost
  • A span will NOT started when TracerProvider has already shutdown
  • A span will NOT be exportered when TracerProvider has already shutdown
  • Minor fix for LoggerProvider

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

Copy link

codecov bot commented Jun 2, 2024

Codecov Report

Attention: Patch coverage is 97.34513% with 3 lines in your changes missing coverage. Please review.

Project coverage is 74.6%. Comparing base (bc1f94c) to head (d3057ac).

Files Patch % Lines
opentelemetry-sdk/src/trace/provider.rs 97.0% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##            main   #1855     +/-   ##
=======================================
+ Coverage   74.5%   74.6%   +0.1%     
=======================================
  Files        122     122             
  Lines      19859   19952     +93     
=======================================
+ Hits       14807   14902     +95     
+ Misses      5052    5050      -2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Member

@lalitb lalitb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the tests are added.

opentelemetry-sdk/src/trace/provider.rs Outdated Show resolved Hide resolved
opentelemetry-sdk/src/trace/provider.rs Outdated Show resolved Hide resolved
@TommyCpp TommyCpp marked this pull request as ready for review June 9, 2024 23:38
@TommyCpp TommyCpp requested a review from a team June 9, 2024 23:38
Copy link
Member

@cijothomas cijothomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. A changelog entry is recommended, as this is removing a significant perf bottleneck. (i tried stress testing - 8-9M/sec to 20M/sec with this PR which is huge improvement)

Do you plan to follow up with removal of global shutdowns similar to what is done for metrics? #1743

@cijothomas cijothomas merged commit 2ac2b18 into open-telemetry:main Jun 10, 2024
24 checks passed
#[derive(Clone, Debug)]
pub struct TracerProvider {
inner: Arc<TracerProviderInner>,
is_shutdown: Arc<AtomicBool>,
Copy link
Contributor

@utpilla utpilla Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any benefit to creating another Arc pointer just for is_shutdown? Could we reuse the existing Arc pointer inner to also hold is_shutdown inside it?

}
} else {
Err(TraceError::Other(
"tracer provider already shut down".into(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably use Mutex or RwLock instead of atomic variable to track shutdown. With the current setup, if there are two threads that are racing to shutdown the tracerprovider, the thread which fails the compare and swap and immediately come to the else condition. It wouldn't return an error message saying "tracer provider already shut down" without even waiting or verifying that the other thread has indeed completed the shutdown.

Copy link
Member

@lalitb lalitb Jun 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will rephrasing the error to say "tracer provider is already shutting down or has been shut down" be useful here, instead of using the Mutex/RwLock ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on what kind of experience we are targeting. I feel that shutting down a tracer provider does not have to be a perf optimized operation so it's okay to use locks. Shutdown would anyway not be a frequent scenario. I also like the clear status that locking offers about whether the provider is shutdown or not.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex/rwlock/atomic - is checked in hot path, so it needs to be performant!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/open-telemetry/opentelemetry-rust/blob/main/opentelemetry-sdk/src/logs/log_processor.rs#L93-L97 Logs.
Looks like we are inconsistent here. so that need to be taken care as well!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The mutex/rwlock/atomic - is checked in hot path, so it needs to be performant!

Got it! In that case, we should make use of Arc<Cell<bool>> to track is_shutdown instead of atomics as that should offer better performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why Arc<Cell<bool>> - this can cause race condition and data corruption right, as it's not thread safe.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad! I thought we could use Mutex in conjunction with Arc<Cell<T>> to save the atomic read operation on hot-path. Rust wouldn't allow that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cell<T> is not Send and Sync so compiler won't allow this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants