TraceQL performance improvement: dynamic reordering of binop branches #4163

mdisibio · 2024-10-07T19:12:20Z

What this PR does:
This PR builds on #4114 which introduced short-circuiting boolean expressions like true || false. Now we are going a step further and dynamically profiling and reordering statements at runtime to be choose the fastest path first. Consider a statement like trueSlow || trueFast. It's better to put trueFast first because it will allow us to return faster, and skip processing the trueSlow expression.

This PR is the result of digging even more into the particularly tough query run internally mentioned in #4114 .

The way it works is by not short-circuiting for the first 1000 executions, and instead measuring the runtime and usefulness of each branch. Then it checks whether the right-hand-side is the better choice and then flips the expression. trueSlow || trueFast becomes trueFast || trueSlow

1000 is derived experimentally. It's enough to reliably help but without adding too much overhead. In the benchmarks binary operations are run millions of times.

Benchmarks:
updated after faaee88

There is a very significant improvement to the tough query of +26%. However there is a small regression to already optimally-written queries, because we profile the branches but there is nothing to do.

name                         old time/op    new time/op    delta
BackendBlockTraceQL/special     1.87s ± 6%     1.48s ± 1%  -20.81%  (p=0.004 n=6+5)

name                         old speed      new speed      delta
BackendBlockTraceQL/special  12.6MB/s ± 6%  15.9MB/s ± 1%  +26.16%  (p=0.004 n=6+5)

name                         old MB_io/op   new MB_io/op   delta
BackendBlockTraceQL/special      23.4 ± 0%      23.4 ± 0%     ~     (all equal)

name                         old alloc/op   new alloc/op   delta
BackendBlockTraceQL/special    36.0MB ± 0%    36.1MB ± 0%     ~     (p=0.095 n=5+5)

name                         old allocs/op  new allocs/op  delta
BackendBlockTraceQL/special      645k ± 0%      645k ± 0%   +0.02%  (p=0.008 n=5+5)

name                                              old time/op    new time/op    delta
BackendBlockTraceQL/spanAttValMatch                  259ms ± 4%     251ms ± 2%    ~     (p=0.126 n=6+5)
BackendBlockTraceQL/spanAttValNoMatch               6.99ms ± 1%    6.86ms ± 0%  -1.85%  (p=0.004 n=6+5)
BackendBlockTraceQL/spanAttIntrinsicMatch            131ms ± 2%     135ms ± 7%    ~     (p=0.126 n=5+6)
BackendBlockTraceQL/spanAttIntrinsicNoMatch         7.01ms ± 2%    7.00ms ± 1%    ~     (p=0.792 n=5+6)
BackendBlockTraceQL/resourceAttValMatch              1.20s ± 1%     1.24s ± 4%  +3.80%  (p=0.004 n=5+6)
BackendBlockTraceQL/resourceAttValNoMatch           6.71ms ± 1%    6.71ms ± 1%    ~     (p=0.937 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch       59.4ms ± 2%    58.9ms ± 1%    ~     (p=0.093 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01    6.76ms ± 1%    6.71ms ± 1%    ~     (p=0.065 n=6+6)
BackendBlockTraceQL/traceOrMatch                     530ms ± 2%     545ms ± 0%  +2.95%  (p=0.004 n=6+5)
BackendBlockTraceQL/traceOrNoMatch                   533ms ± 1%     552ms ± 1%  +3.60%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValNoMatch                  407ms ± 1%     411ms ± 0%  +0.78%  (p=0.017 n=6+5)
BackendBlockTraceQL/mixedValMixedMatchAnd           7.37ms ± 1%    7.40ms ± 1%    ~     (p=0.537 n=5+6)
BackendBlockTraceQL/mixedValMixedMatchOr             350ms ± 1%     348ms ± 1%    ~     (p=0.537 n=6+5)
BackendBlockTraceQL/count                            980ms ± 2%     974ms ± 1%    ~     (p=0.537 n=6+5)
BackendBlockTraceQL/struct                           1.04s ± 4%     1.02s ± 1%    ~     (p=0.247 n=6+5)
BackendBlockTraceQL/||                               386ms ± 4%     381ms ± 1%    ~     (p=0.931 n=6+5)
BackendBlockTraceQL/mixed                           50.5ms ± 1%    51.8ms ± 5%    ~     (p=0.180 n=6+6)
BackendBlockTraceQL/complex                         7.51ms ± 0%    7.55ms ± 1%    ~     (p=0.126 n=6+5)

name                                              old speed      new speed      delta
BackendBlockTraceQL/spanAttValMatch                105MB/s ± 4%   109MB/s ± 2%    ~     (p=0.126 n=6+5)
BackendBlockTraceQL/spanAttValNoMatch              220MB/s ± 1%   224MB/s ± 0%  +1.89%  (p=0.004 n=6+5)
BackendBlockTraceQL/spanAttIntrinsicMatch          216MB/s ± 2%   210MB/s ± 7%    ~     (p=0.126 n=5+6)
BackendBlockTraceQL/spanAttIntrinsicNoMatch        331MB/s ± 2%   332MB/s ± 1%    ~     (p=0.831 n=5+6)
BackendBlockTraceQL/resourceAttValMatch           22.3MB/s ± 1%  21.5MB/s ± 4%  -3.59%  (p=0.000 n=5+6)
BackendBlockTraceQL/resourceAttValNoMatch         66.3MB/s ± 1%  66.3MB/s ± 1%    ~     (p=0.937 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch      451MB/s ± 2%   456MB/s ± 1%    ~     (p=0.093 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01  74.7MB/s ± 1%  75.2MB/s ± 1%    ~     (p=0.065 n=6+6)
BackendBlockTraceQL/traceOrMatch                  2.67MB/s ± 2%  2.59MB/s ± 0%  -2.97%  (p=0.004 n=6+5)
BackendBlockTraceQL/traceOrNoMatch                2.65MB/s ± 1%  2.56MB/s ± 1%  -3.45%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValNoMatch               5.67MB/s ± 1%  5.63MB/s ± 0%  -0.79%  (p=0.009 n=6+5)
BackendBlockTraceQL/mixedValMixedMatchAnd          118MB/s ± 1%   118MB/s ± 1%    ~     (p=0.459 n=5+6)
BackendBlockTraceQL/mixedValMixedMatchOr          43.9MB/s ± 1%  44.2MB/s ± 1%    ~     (p=0.571 n=6+5)
BackendBlockTraceQL/count                         27.3MB/s ± 2%  27.4MB/s ± 2%    ~     (p=0.818 n=6+6)
BackendBlockTraceQL/struct                        6.12MB/s ± 4%  6.24MB/s ± 1%    ~     (p=0.229 n=6+5)
BackendBlockTraceQL/||                            69.8MB/s ± 4%  70.6MB/s ± 1%    ~     (p=0.931 n=6+5)
BackendBlockTraceQL/mixed                          514MB/s ± 1%   502MB/s ± 4%    ~     (p=0.180 n=6+6)
BackendBlockTraceQL/complex                       61.2MB/s ± 0%  60.8MB/s ± 1%    ~     (p=0.126 n=6+5)

name                                              old MB_io/op   new MB_io/op   delta
BackendBlockTraceQL/spanAttValMatch                   27.3 ± 0%      27.3 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttValNoMatch                 1.54 ± 0%      1.54 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttIntrinsicMatch             28.3 ± 0%      28.3 ± 0%    ~     (all equal)
BackendBlockTraceQL/spanAttIntrinsicNoMatch           2.32 ± 0%      2.32 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttValMatch               26.8 ± 0%      26.8 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttValNoMatch             0.44 ± 0%      0.44 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch         26.8 ± 0%      26.8 ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01      0.50 ± 0%      0.50 ± 0%    ~     (all equal)
BackendBlockTraceQL/traceOrMatch                      1.42 ± 0%      1.42 ± 0%    ~     (all equal)
BackendBlockTraceQL/traceOrNoMatch                    1.42 ± 0%      1.42 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValNoMatch                   2.31 ± 0%      2.31 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchAnd             0.87 ± 0%      0.87 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchOr              15.4 ± 0%      15.4 ± 0%    ~     (all equal)
BackendBlockTraceQL/count                             26.8 ± 0%      26.8 ± 0%    ~     (all equal)
BackendBlockTraceQL/struct                            6.36 ± 0%      6.36 ± 0%    ~     (all equal)
BackendBlockTraceQL/||                                26.9 ± 0%      26.9 ± 0%    ~     (all equal)
BackendBlockTraceQL/mixed                             26.0 ± 0%      26.0 ± 0%    ~     (all equal)
BackendBlockTraceQL/complex                           0.46 ± 0%      0.46 ± 0%    ~     (all equal)

name                                              old alloc/op   new alloc/op   delta
BackendBlockTraceQL/spanAttValMatch                 58.9MB ± 0%    59.0MB ± 0%    ~     (p=0.485 n=6+6)
BackendBlockTraceQL/spanAttValNoMatch               2.72MB ± 0%    2.72MB ± 0%    ~     (p=0.792 n=5+6)
BackendBlockTraceQL/spanAttIntrinsicMatch           31.8MB ± 1%    32.0MB ± 1%  +0.74%  (p=0.026 n=6+6)
BackendBlockTraceQL/spanAttIntrinsicNoMatch         2.74MB ± 0%    2.74MB ± 0%    ~     (p=0.662 n=6+5)
BackendBlockTraceQL/resourceAttValMatch              702MB ± 2%     697MB ± 0%    ~     (p=1.000 n=6+5)
BackendBlockTraceQL/resourceAttValNoMatch           2.70MB ± 0%    2.70MB ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch       7.42MB ± 0%    7.41MB ± 0%    ~     (p=0.132 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01    2.70MB ± 0%    2.70MB ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/traceOrMatch                    3.57MB ± 4%    3.57MB ± 4%    ~     (p=0.690 n=5+5)
BackendBlockTraceQL/traceOrNoMatch                  3.47MB ± 1%    3.47MB ± 1%    ~     (p=0.699 n=6+6)
BackendBlockTraceQL/mixedValNoMatch                 2.77MB ± 0%    2.77MB ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValMixedMatchAnd           2.70MB ± 0%    2.70MB ± 0%  +0.01%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValMixedMatchOr            3.11MB ± 0%    3.11MB ± 0%  +0.01%  (p=0.002 n=6+6)
BackendBlockTraceQL/count                            506MB ± 0%     507MB ± 0%  +0.18%  (p=0.041 n=6+6)
BackendBlockTraceQL/struct                          8.27MB ± 4%    8.10MB ± 2%    ~     (p=0.177 n=6+5)
BackendBlockTraceQL/||                              29.0MB ± 0%    29.0MB ± 0%    ~     (p=0.278 n=5+5)
BackendBlockTraceQL/mixed                           4.22MB ± 0%    4.22MB ± 0%  +0.09%  (p=0.041 n=6+6)
BackendBlockTraceQL/complex                         2.73MB ± 0%    2.73MB ± 0%  +0.04%  (p=0.008 n=5+5)

name                                              old allocs/op  new allocs/op  delta
BackendBlockTraceQL/spanAttValMatch                   635k ± 0%      635k ± 0%    ~     (p=0.784 n=6+6)
BackendBlockTraceQL/spanAttValNoMatch                43.3k ± 0%     43.3k ± 0%  -0.00%  (p=0.030 n=5+6)
BackendBlockTraceQL/spanAttIntrinsicMatch             228k ± 0%      228k ± 0%    ~     (p=0.061 n=6+6)
BackendBlockTraceQL/spanAttIntrinsicNoMatch          43.3k ± 0%     43.3k ± 0%    ~     (p=1.000 n=6+6)
BackendBlockTraceQL/resourceAttValMatch              4.12M ± 0%     4.12M ± 0%    ~     (p=0.548 n=5+5)
BackendBlockTraceQL/resourceAttValNoMatch            43.3k ± 0%     43.3k ± 0%    ~     (all equal)
BackendBlockTraceQL/resourceAttIntrinsicMatch        91.6k ± 0%     91.6k ± 0%    ~     (p=0.273 n=6+6)
BackendBlockTraceQL/resourceAttIntrinsicMatch#01     43.3k ± 0%     43.3k ± 0%    ~     (all equal)
BackendBlockTraceQL/traceOrMatch                     53.5k ± 2%     53.5k ± 2%    ~     (p=0.690 n=5+5)
BackendBlockTraceQL/traceOrNoMatch                   52.9k ± 0%     52.9k ± 0%    ~     (p=0.699 n=6+6)
BackendBlockTraceQL/mixedValNoMatch                  44.0k ± 0%     44.0k ± 0%    ~     (all equal)
BackendBlockTraceQL/mixedValMixedMatchAnd            43.4k ± 0%     43.4k ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/mixedValMixedMatchOr             45.3k ± 0%     45.3k ± 0%  +0.00%  (p=0.002 n=6+6)
BackendBlockTraceQL/count                            2.40M ± 0%     2.40M ± 0%    ~     (p=0.177 n=6+5)
BackendBlockTraceQL/struct                           65.0k ± 7%     63.2k ± 2%    ~     (p=0.126 n=6+5)
BackendBlockTraceQL/||                                210k ± 0%      210k ± 0%    ~     (p=0.881 n=5+5)
BackendBlockTraceQL/mixed                            56.0k ± 0%     56.0k ± 0%  +0.01%  (p=0.004 n=6+5)
BackendBlockTraceQL/complex                          43.6k ± 0%     43.6k ± 0%  +0.02%  (p=0.004 n=5+6)

Which issue(s) this PR fixes:
Fixes #

Checklist

Tests updated
Documentation added
CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX]

mdisibio · 2024-10-07T19:13:44Z

pkg/traceql/ast_execute.go

@@ -325,30 +325,50 @@ func (a Aggregate) evaluate(input []*Spanset) (output []*Spanset, err error) {
 }

 func (o *BinaryOperation) execute(span Span) (Static, error) {
+	recording := o.b.Recording
+	if recording {


This code is crafted to stay out of the hot-path as much as possible. There may be a way to improve it even further. However I can say that previously pulling LHS.execute into a function pointer made things slower.

pkg/traceql/util.go

mdisibio · 2024-10-07T19:16:06Z

pkg/traceql/ast_execute.go

 		switch o.Op {
 		case OpAnd:
+			if recording {


These penalties are probably the only tricky part to this approach. Happy to add more detail if things aren't clear.

pkg/traceql/ast_execute.go

javiermolinar

Love it!

pkg/traceql/ast_execute.go

javiermolinar · 2024-10-08T08:15:22Z

pkg/traceql/util.go

+}
+
+// Start recording. Should be called immediately prior to a branch execution.
+func (b *branchOptimizer) Start() {


What about StartRecording and StopRecording?

javiermolinar · 2024-10-08T08:17:18Z

pkg/traceql/util.go

+
+// Start recording. Should be called immediately prior to a branch execution.
+func (b *branchOptimizer) Start() {
+	b.start = time.Now()


Why don't we use the same last to track the start value? that way we don't need to share that variable:

b.last[branch] = time.Now()

and in the Finish method:

b.last[branch] = time.Since(b.last[branch])

We can rename last to be more accurate

Interesting idea. The variable types are different, time.Time vs time.Duration, will have to see if the overhead of getting unix nanoseconds is less than the var.

pkg/traceql/ast_execute.go

pkg/traceql/util.go

javiermolinar

LGTM!

…grafana#4163) * Dynamic reordering of binop branches * consts and cleanup * changelog

Dynamic reordering of binop branches

372e6cb

mdisibio requested review from joe-elliott, annanay25, mapno, yvrhdn, zalegrala, electron0zero, ie-pham and stoewer as code owners October 7, 2024 19:12

mdisibio commented Oct 7, 2024

View reviewed changes

pkg/traceql/util.go Show resolved Hide resolved

mdisibio commented Oct 7, 2024

View reviewed changes

pkg/traceql/ast_execute.go Outdated Show resolved Hide resolved

javiermolinar changed the title ~~TraceQL performance improvment: dynamic reordering of binop branches~~ TraceQL performance improvement: dynamic reordering of binop branches Oct 8, 2024

javiermolinar reviewed Oct 8, 2024

View reviewed changes

mdisibio added 2 commits October 8, 2024 07:55

consts and cleanup

faaee88

changelog

ba37ae3

joe-elliott reviewed Oct 8, 2024

View reviewed changes

pkg/traceql/util.go Show resolved Hide resolved

javiermolinar approved these changes Oct 11, 2024

View reviewed changes

mdisibio merged commit 3df4a7a into grafana:main Oct 11, 2024
16 checks passed

knylander-grafana pushed a commit to knylander-grafana/tempo-doc-work that referenced this pull request Oct 11, 2024

TraceQL performance improvement: dynamic reordering of binop branches (…

cf0db50

…grafana#4163) * Dynamic reordering of binop branches * consts and cleanup * changelog

joe-elliott mentioned this pull request Nov 12, 2024

[Problem] How can I improve tempo query performance #4239

Open

electron0zero mentioned this pull request Nov 28, 2024

max query expr electron0zero/tempo#3

Closed

joe-elliott mentioned this pull request Dec 11, 2024

Apply short circuit evaluation regarding generic attributes when querying #4422

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

TraceQL performance improvement: dynamic reordering of binop branches #4163

TraceQL performance improvement: dynamic reordering of binop branches #4163

mdisibio commented Oct 7, 2024 •

edited

Loading

mdisibio Oct 7, 2024

mdisibio Oct 7, 2024

javiermolinar left a comment

javiermolinar Oct 8, 2024

javiermolinar Oct 8, 2024

mdisibio Oct 8, 2024

javiermolinar left a comment

TraceQL performance improvement: dynamic reordering of binop branches #4163

TraceQL performance improvement: dynamic reordering of binop branches #4163

Conversation

mdisibio commented Oct 7, 2024 • edited Loading

mdisibio Oct 7, 2024

Choose a reason for hiding this comment

mdisibio Oct 7, 2024

Choose a reason for hiding this comment

javiermolinar left a comment

Choose a reason for hiding this comment

javiermolinar Oct 8, 2024

Choose a reason for hiding this comment

javiermolinar Oct 8, 2024

Choose a reason for hiding this comment

mdisibio Oct 8, 2024

Choose a reason for hiding this comment

javiermolinar left a comment

Choose a reason for hiding this comment

mdisibio commented Oct 7, 2024 •

edited

Loading