Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Performance: By using const over static, improve noop span creation performance another 10% #1270

Merged
merged 1 commit into from
Sep 21, 2023

Conversation

shaun-cox
Copy link
Contributor

@shaun-cox shaun-cox commented Sep 13, 2023

Improves performance for span creation when the SDK is not configured another 10%.

(Depends on #1268 to go first.)

new_span/if_parent_sampled/no-sdk/spec
                        time:   [83.124 ns 83.224 ns 83.312 ns]
                        thrpt:  [12.003 Melem/s 12.016 Melem/s 12.030 Melem/s]
                 change:
                        time:   [-9.3988% -9.2273% -9.0668%] (p = 0.00 < 0.05)
                        thrpt:  [+9.9709% +10.165% +10.374%]
                        Performance has improved.

Changes

  • remove branch in build_with_context for the noop tracer. the current context's span() method already returns &NOOP_SPAN in the case when no active span is present, which already has the right span context value to use.
  • changed NOOP_SPAN from a lazy static to a const
  • changed other constituents of span context to use const default values too.

Merge requirement checklist

  • CONTRIBUTING guidelines followed
  • Unit tests added/updated (if applicable)
  • Appropriate CHANGELOG.md files updated for non-trivial, user-facing changes
  • Changes in public API reviewed (if applicable)

@shaun-cox shaun-cox changed the title Prefer const Performance: By using const over static, improve noop span creation performance another 10% Sep 13, 2023
@codecov
Copy link

codecov bot commented Sep 13, 2023

Codecov Report

Patch coverage is 100.0% of modified lines.

Files Changed Coverage
opentelemetry/src/trace/context.rs ø
opentelemetry/src/trace/noop.rs 100.0%
opentelemetry/src/trace/span_context.rs 100.0%

📢 Thoughts on this report? Let us know!.

@shaun-cox shaun-cox force-pushed the prefer_const branch 3 times, most recently from de72f74 to ee342ac Compare September 14, 2023 13:45
/// Create an invalid empty span context
pub fn empty_context() -> Self {
SpanContext::new(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are a lot of places using empty_context() that can be replaced with SpanContext::NONE. I haven't changed them in this PR, but my plan would be to change them all in a future PR and then remove the empty_context() function. Anyone see any problems with that approach?

@shaun-cox shaun-cox marked this pull request as ready for review September 14, 2023 14:37
@shaun-cox shaun-cox requested a review from a team September 14, 2023 14:37
Copy link
Contributor

@djc djc left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

opentelemetry/src/trace/noop.rs Show resolved Hide resolved
@@ -20,6 +20,14 @@ use thiserror::Error;
pub struct TraceFlags(u8);

impl TraceFlags {
/// Trace flags with the `sampled` flag set to `0`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Style suggestion: I usually like to put consts near the bottom instead of near the top, since they're usually relatively simple and don't change often.

opentelemetry/src/trace/noop.rs Outdated Show resolved Hide resolved
- Changed NOOP_SPAN from a lazy static to a const
- Remove branch in build_with_context for the noop tracer.
  The current context's span() method already returns &NOOP_SPAN in the
  case when no active span is present, which already has the right span
  context value to use.
- Changed other constituents of span context to use const default values too.
@lalitb
Copy link
Member

lalitb commented Sep 18, 2023

Just to understand more, the performance improvement here is for the first call to NoopTracer::build_with_span(), when the static NOOP_SPAN (in existing code) is initialized. As the subsequent calls to NoopTracer::build_with_span() will utilize this already created NOOP_SPAN.

@shaun-cox
Copy link
Contributor Author

Just to understand more, the performance improvement here is for the first call to NoopTracer::build_with_span(), when the static NOOP_SPAN (in existing code) is initialized. As the subsequent calls to NoopTracer::build_with_span() will utilize this already created NOOP_SPAN.

Good question. It's not so much about optimizing the first call which accesses a lazy static (to save the initialization), its more about optimizing every call to the lazy static which has to check that that initialization has been done before proceeding. When it's a constant, there is nothing to initialize, so there is no need for a call to check that it has been initialized.

The generated assembly for this sample code should make it clearer.

example::access_static:
        push    rax
        lea     rdi, [rip + example::STATIC_THING]
        call    qword ptr [rip + <once_cell::sync::Lazy<T,F> as core::ops::deref::Deref>::deref@GOTPCREL]
        pop     rcx
        ret

example::access_const:
        lea     rax, [rip + .L__unnamed_20]
        ret

@shaun-cox
Copy link
Contributor Author

Oh, and I should have set the optimization flag for the compiler for release, above. When I do that, the results are even more obvious. :)

example::access_static:
        push    rax
        lea     rax, [rip + example::STATIC_THING]
        mov     qword ptr [rsp], rax
        movzx   eax, byte ptr [rip + example::STATIC_THING+24]
        cmp     al, 2
        jne     .LBB11_4
.LBB11_1:
        movzx   eax, byte ptr [rip + example::STATIC_THING+24]
        cmp     al, 2
        jne     .LBB11_7
        cmp     qword ptr [rip + example::STATIC_THING], 0
        je      .LBB11_3
        lea     rax, [rip + example::STATIC_THING+8]
        pop     rcx
        ret
.LBB11_4:
        mov     rdi, rsp
        call    once_cell::imp::OnceCell<T>::initialize
        movzx   eax, byte ptr [rip + example::STATIC_THING+24]
        cmp     al, 2
        je      .LBB11_1
        lea     rdi, [rip + .L__unnamed_11]
        lea     rdx, [rip + .L__unnamed_12]
        mov     esi, 41
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2
.LBB11_7:
        lea     rdi, [rip + .L__unnamed_13]
        lea     rdx, [rip + .L__unnamed_14]
        mov     esi, 39
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2
.LBB11_3:
        lea     rdi, [rip + .L__unnamed_5]
        lea     rdx, [rip + .L__unnamed_15]
        mov     esi, 23
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2

example::STATIC_THING:
        .zero   8
        .zero   16
        .zero   1
        .zero   7
        .quad   example::Thing::new

vs

example::access_const:
        lea     rax, [rip + .L__unnamed_16]
        ret

.L__unnamed_16:
        .zero   16

@lalitb
Copy link
Member

lalitb commented Sep 19, 2023

The generated assembly for this sample code should make it clearer.

Thanks, the assembly makes it much clear.

@TommyCpp TommyCpp merged commit f7684b0 into open-telemetry:main Sep 21, 2023
12 checks passed
@shaun-cox shaun-cox deleted the prefer_const branch September 21, 2023 11:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants