Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TraceQL performance improvement: dynamic reordering of binop branches #4163

Merged
merged 3 commits into from
Oct 11, 2024

Conversation

mdisibio
Copy link
Contributor

@mdisibio mdisibio commented Oct 7, 2024

What this PR does:
This PR builds on #4114 which introduced short-circuiting boolean expressions like true || false. Now we are going a step further and dynamically profiling and reordering statements at runtime to be choose the fastest path first. Consider a statement like trueSlow || trueFast. It's better to put trueFast first because it will allow us to return faster, and skip processing the trueSlow expression.

This PR is the result of digging even more into the particularly tough query run internally mentioned in #4114 .

The way it works is by not short-circuiting for the first 1000 executions, and instead measuring the runtime and usefulness of each branch. Then it checks whether the right-hand-side is the better choice and then flips the expression. trueSlow || trueFast becomes trueFast || trueSlow

1000 is derived experimentally. It's enough to reliably help but without adding too much overhead. In the benchmarks binary operations are run millions of times.

Benchmarks:
updated after faaee88

There is a very significant improvement to the tough query of +26%. However there is a small regression to already optimally-written queries, because we profile the branches but there is nothing to do.

name                         old time/op    new time/op    delta
BackendBlockTraceQL/special     1.87s ± 6%     1.48s ± 1%  -20.81%  (p=0.004 n=6+5)

name                         old speed      new speed      delta
BackendBlockTraceQL/special  12.6MB/s ± 6%  15.9MB/s ± 1%  +26.16%  (p=0.004 n=6+5)

name                         old MB_io/op   new MB_io/op   delta
BackendBlockTraceQL/special      23.4 ± 0%      23.4 ± 0%     ~     (all equal)

name                         old alloc/op   new alloc/op   delta
BackendBlockTraceQL/special    36.0MB ± 0%    36.1MB ± 0%     ~     (p=0.095 n=5+5)

name                         old allocs/op  new allocs/op  delta
BackendBlockTraceQL/special      645k ± 0%      645k ± 0%   +0.02%  (p=0.008 n=5+5)
name                                              old time/op    new time/op    delta
BackendBlockTraceQL/spanAttValMatch                  259ms ± 4%     251ms ± 2%    ~     (p=0.126 n=6+5)
BackendBlockTraceQL/spanAttValNoMatch               6.99ms ± 1%    6.86ms ± 0%  -1.85%  (p=0.004 n=6+5)
BackendBlockTraceQL/spanAttIntrinsicMatch            131ms ± 2%     135ms ± 7%    ~     (p=0.126 n=5+6)
BackendBlockTraceQL/spanAttIntrinsicNoMatch         7.01ms ± 2%    7.00ms ± 1%    ~     (p=0.792 n=5+6)
BackendBlockTraceQL/resourceAttValMatch              1.20s ± 1%     1.24s ± 4%  +3.80%  (p=0.004 n=5+6)
BackendBlockTraceQL/resourceAttValNoMatch           6.71ms ± 1%    6.71ms ± 1%    ~     (p=0.937 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch       59.4ms ± 2%    58.9ms ± 1%    ~     (p=0.093 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01    6.76ms ± 1%    6.71ms ± 1%    ~     (p=0.065 n=6+6)
BackendBlockTraceQL/traceOrMatch                     530ms ± 2%     545ms ± 0%  +2.95%  (p=0.004 n=6+5)
BackendBlockTraceQL/traceOrNoMatch                   533ms ± 1%     552ms ± 1%  +3.60%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValNoMatch                  407ms ± 1%     411ms ± 0%  +0.78%  (p=0.017 n=6+5)
BackendBlockTraceQL/mixedValMixedMatchAnd           7.37ms ± 1%    7.40ms ± 1%    ~     (p=0.537 n=5+6)
BackendBlockTraceQL/mixedValMixedMatchOr             350ms ± 1%     348ms ± 1%    ~     (p=0.537 n=6+5)
BackendBlockTraceQL/count                            980ms ± 2%     974ms ± 1%    ~     (p=0.537 n=6+5)
BackendBlockTraceQL/struct                           1.04s ± 4%     1.02s ± 1%    ~     (p=0.247 n=6+5)
BackendBlockTraceQL/||                               386ms ± 4%     381ms ± 1%    ~     (p=0.931 n=6+5)
BackendBlockTraceQL/mixed                           50.5ms ± 1%    51.8ms ± 5%    ~     (p=0.180 n=6+6)
BackendBlockTraceQL/complex                         7.51ms ± 0%    7.55ms ± 1%    ~     (p=0.126 n=6+5)

name                                              old speed      new speed      delta
BackendBlockTraceQL/spanAttValMatch                105MB/s ± 4%   109MB/s ± 2%    ~     (p=0.126 n=6+5)
BackendBlockTraceQL/spanAttValNoMatch              220MB/s ± 1%   224MB/s ± 0%  +1.89%  (p=0.004 n=6+5)
BackendBlockTraceQL/spanAttIntrinsicMatch          216MB/s ± 2%   210MB/s ± 7%    ~     (p=0.126 n=5+6)
BackendBlockTraceQL/spanAttIntrinsicNoMatch        331MB/s ± 2%   332MB/s ± 1%    ~     (p=0.831 n=5+6)
BackendBlockTraceQL/resourceAttValMatch           22.3MB/s ± 1%  21.5MB/s ± 4%  -3.59%  (p=0.000 n=5+6)
BackendBlockTraceQL/resourceAttValNoMatch         66.3MB/s ± 1%  66.3MB/s ± 1%    ~     (p=0.937 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch      451MB/s ± 2%   456MB/s ± 1%    ~     (p=0.093 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01  74.7MB/s ± 1%  75.2MB/s ± 1%    ~     (p=0.065 n=6+6)
BackendBlockTraceQL/traceOrMatch                  2.67MB/s ± 2%  2.59MB/s ± 0%  -2.97%  (p=0.004 n=6+5)
BackendBlockTraceQL/traceOrNoMatch                2.65MB/s ± 1%  2.56MB/s ± 1%  -3.45%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValNoMatch               5.67MB/s ± 1%  5.63MB/s ± 0%  -0.79%  (p=0.009 n=6+5)
BackendBlockTraceQL/mixedValMixedMatchAnd          118MB/s ± 1%   118MB/s ± 1%    ~     (p=0.459 n=5+6)
BackendBlockTraceQL/mixedValMixedMatchOr          43.9MB/s ± 1%  44.2MB/s ± 1%    ~     (p=0.571 n=6+5)
BackendBlockTraceQL/count                         27.3MB/s ± 2%  27.4MB/s ± 2%    ~     (p=0.818 n=6+6)
BackendBlockTraceQL/struct                        6.12MB/s ± 4%  6.24MB/s ± 1%    ~     (p=0.229 n=6+5)
BackendBlockTraceQL/||                            69.8MB/s ± 4%  70.6MB/s ± 1%    ~     (p=0.931 n=6+5)
BackendBlockTraceQL/mixed                          514MB/s ± 1%   502MB/s ± 4%    ~     (p=0.180 n=6+6)
BackendBlockTraceQL/complex                       61.2MB/s ± 0%  60.8MB/s ± 1%    ~     (p=0.126 n=6+5)

name                                              old MB_io/op   new MB_io/op   delta
BackendBlockTraceQL/spanAttValMatch                   27.3 ± 0%      27.3 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttValNoMatch                 1.54 ± 0%      1.54 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttIntrinsicMatch             28.3 ± 0%      28.3 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttIntrinsicNoMatch           2.32 ± 0%      2.32 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttValMatch               26.8 ± 0%      26.8 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttValNoMatch             0.44 ± 0%      0.44 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch         26.8 ± 0%      26.8 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01      0.50 ± 0%      0.50 ± 0%    ~     (all equal)
BackendBlockTraceQL/traceOrMatch                      1.42 ± 0%      1.42 ± 0%    ~     (all equal)
BackendBlockTraceQL/traceOrNoMatch                    1.42 ± 0%      1.42 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValNoMatch                   2.31 ± 0%      2.31 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchAnd             0.87 ± 0%      0.87 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchOr              15.4 ± 0%      15.4 ± 0%    ~     (all equal)
BackendBlockTraceQL/count                             26.8 ± 0%      26.8 ± 0%    ~     (all equal)
BackendBlockTraceQL/struct                            6.36 ± 0%      6.36 ± 0%    ~     (all equal)
BackendBlockTraceQL/||                                26.9 ± 0%      26.9 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixed                             26.0 ± 0%      26.0 ± 0%    ~     (all equal)
BackendBlockTraceQL/complex                           0.46 ± 0%      0.46 ± 0%    ~     (all equal)

name                                              old alloc/op   new alloc/op   delta
BackendBlockTraceQL/spanAttValMatch                 58.9MB ± 0%    59.0MB ± 0%    ~     (p=0.485 n=6+6)
BackendBlockTraceQL/spanAttValNoMatch               2.72MB ± 0%    2.72MB ± 0%    ~     (p=0.792 n=5+6)
BackendBlockTraceQL/spanAttIntrinsicMatch           31.8MB ± 1%    32.0MB ± 1%  +0.74%  (p=0.026 n=6+6)
BackendBlockTraceQL/spanAttIntrinsicNoMatch         2.74MB ± 0%    2.74MB ± 0%    ~     (p=0.662 n=6+5)
BackendBlockTraceQL/resourceAttValMatch              702MB ± 2%     697MB ± 0%    ~     (p=1.000 n=6+5)
BackendBlockTraceQL/resourceAttValNoMatch           2.70MB ± 0%    2.70MB ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch       7.42MB ± 0%    7.41MB ± 0%    ~     (p=0.132 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01    2.70MB ± 0%    2.70MB ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/traceOrMatch                    3.57MB ± 4%    3.57MB ± 4%    ~     (p=0.690 n=5+5)
BackendBlockTraceQL/traceOrNoMatch                  3.47MB ± 1%    3.47MB ± 1%    ~     (p=0.699 n=6+6)
BackendBlockTraceQL/mixedValNoMatch                 2.77MB ± 0%    2.77MB ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValMixedMatchAnd           2.70MB ± 0%    2.70MB ± 0%  +0.01%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValMixedMatchOr            3.11MB ± 0%    3.11MB ± 0%  +0.01%  (p=0.002 n=6+6)
BackendBlockTraceQL/count                            506MB ± 0%     507MB ± 0%  +0.18%  (p=0.041 n=6+6)
BackendBlockTraceQL/struct                          8.27MB ± 4%    8.10MB ± 2%    ~     (p=0.177 n=6+5)
BackendBlockTraceQL/||                              29.0MB ± 0%    29.0MB ± 0%    ~     (p=0.278 n=5+5)
BackendBlockTraceQL/mixed                           4.22MB ± 0%    4.22MB ± 0%  +0.09%  (p=0.041 n=6+6)
BackendBlockTraceQL/complex                         2.73MB ± 0%    2.73MB ± 0%  +0.04%  (p=0.008 n=5+5)

name                                              old allocs/op  new allocs/op  delta
BackendBlockTraceQL/spanAttValMatch                   635k ± 0%      635k ± 0%    ~     (p=0.784 n=6+6)
BackendBlockTraceQL/spanAttValNoMatch                43.3k ± 0%     43.3k ± 0%  -0.00%  (p=0.030 n=5+6)
BackendBlockTraceQL/spanAttIntrinsicMatch             228k ± 0%      228k ± 0%    ~     (p=0.061 n=6+6)
BackendBlockTraceQL/spanAttIntrinsicNoMatch          43.3k ± 0%     43.3k ± 0%    ~     (p=1.000 n=6+6)
BackendBlockTraceQL/resourceAttValMatch              4.12M ± 0%     4.12M ± 0%    ~     (p=0.548 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch            43.3k ± 0%     43.3k ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch        91.6k ± 0%     91.6k ± 0%    ~     (p=0.273 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01     43.3k ± 0%     43.3k ± 0%    ~     (all equal)
BackendBlockTraceQL/traceOrMatch                     53.5k ± 2%     53.5k ± 2%    ~     (p=0.690 n=5+5)
BackendBlockTraceQL/traceOrNoMatch                   52.9k ± 0%     52.9k ± 0%    ~     (p=0.699 n=6+6)
BackendBlockTraceQL/mixedValNoMatch                  44.0k ± 0%     44.0k ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchAnd            43.4k ± 0%     43.4k ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValMixedMatchOr             45.3k ± 0%     45.3k ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/count                            2.40M ± 0%     2.40M ± 0%    ~     (p=0.177 n=6+5)
BackendBlockTraceQL/struct                           65.0k ± 7%     63.2k ± 2%    ~     (p=0.126 n=6+5)
BackendBlockTraceQL/||                                210k ± 0%      210k ± 0%    ~     (p=0.881 n=5+5)
BackendBlockTraceQL/mixed                            56.0k ± 0%     56.0k ± 0%  +0.01%  (p=0.004 n=6+5)
BackendBlockTraceQL/complex                          43.6k ± 0%     43.6k ± 0%  +0.02%  (p=0.004 n=5+6)

Which issue(s) this PR fixes:
Fixes #

Checklist

  • Tests updated
  • Documentation added
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

@@ -325,30 +325,50 @@ func (a Aggregate) evaluate(input []*Spanset) (output []*Spanset, err error) {
}

func (o *BinaryOperation) execute(span Span) (Static, error) {
recording := o.b.Recording
if recording {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code is crafted to stay out of the hot-path as much as possible. There may be a way to improve it even further. However I can say that previously pulling LHS.execute into a function pointer made things slower.

switch o.Op {
case OpAnd:
if recording {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These penalties are probably the only tricky part to this approach. Happy to add more detail if things aren't clear.

pkg/traceql/ast_execute.go Outdated Show resolved Hide resolved
@javiermolinar javiermolinar changed the title TraceQL performance improvment: dynamic reordering of binop branches TraceQL performance improvement: dynamic reordering of binop branches Oct 8, 2024
Copy link
Contributor

@javiermolinar javiermolinar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love it!

pkg/traceql/ast_execute.go Outdated Show resolved Hide resolved
}

// Start recording. Should be called immediately prior to a branch execution.
func (b *branchOptimizer) Start() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about StartRecording and StopRecording?


// Start recording. Should be called immediately prior to a branch execution.
func (b *branchOptimizer) Start() {
b.start = time.Now()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we use the same last to track the start value? that way we don't need to share that variable:

b.last[branch] = time.Now()

and in the Finish method:

b.last[branch] = time.Since(b.last[branch])

We can rename last to be more accurate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting idea. The variable types are different, time.Time vs time.Duration, will have to see if the overhead of getting unix nanoseconds is less than the var.

pkg/traceql/ast_execute.go Outdated Show resolved Hide resolved
pkg/traceql/util.go Show resolved Hide resolved
pkg/traceql/util.go Show resolved Hide resolved
Copy link
Contributor

@javiermolinar javiermolinar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mdisibio mdisibio merged commit 3df4a7a into grafana:main Oct 11, 2024
16 checks passed
knylander-grafana pushed a commit to knylander-grafana/tempo-doc-work that referenced this pull request Oct 11, 2024
…grafana#4163)

* Dynamic reordering of binop branches

* consts and cleanup

* changelog
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants