Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cmd/compile: inlining confuses escape analysis with interface{} params #53465

Open
mvdan opened this issue Jun 20, 2022 · 12 comments
Open

cmd/compile: inlining confuses escape analysis with interface{} params #53465

mvdan opened this issue Jun 20, 2022 · 12 comments
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Milestone

Comments

@mvdan
Copy link
Member

mvdan commented Jun 20, 2022

Take a look at https://go.dev/play/p/mPqA0RaybuQ.

On go version devel go1.19-9068c6844d Thu Jun 16 21:58:40 2022 +0000 linux/amd64 I get the following benchmark results:

name                        time/op
ConditionalDebugf/false-16  69.4ns ± 3%
ConditionalDebugf/true-16    280ns ± 0%
InlinedDebugf/false-16      0.53ns ± 1%
InlinedDebugf/true-16        281ns ± 4%

name                        alloc/op
ConditionalDebugf/false-16   24.0B ± 0%
ConditionalDebugf/true-16    40.0B ± 0%
InlinedDebugf/false-16       0.00B     
InlinedDebugf/true-16        40.0B ± 0%

name                        allocs/op
ConditionalDebugf/false-16    2.00 ± 0%
ConditionalDebugf/true-16     3.00 ± 0%
InlinedDebugf/false-16        0.00     
InlinedDebugf/true-16         3.00 ± 0%

First, note that both debugf functions are inlined:

$ go test -gcflags='-m' |& grep Debugf
./go_test.go:12:6: can inline conditionalDebugf
./go_test.go:18:6: can inline inlinedDebugf
./go_test.go:30:22: inlining call to conditionalDebugf
./go_test.go:45:19: inlining call to inlinedDebugf

And, in the true sub-benchmarks where debug logging is enabled and we actually call Sprintf, we can see three allocations - two corresponding to the two interface{} parameters, as the string and int variables escape to the heap via the interface, and one for the actual call to Sprintf constructing a string.

So far so good. However, when debug logging is disabled, I want to avoid those allocations, because this debug logging happens in a tight loop that runs millions of times in a program, so I want to avoid millions of allocations made in a very short amount of time.

This works fine for the "manually inlined" case where I wrap the call to Debugf with a conditional. Note the zero allocations on InlinedDebugf/false. However, that means that every time I want to use Debugf (a dozen times in said program), I need three lines rather than one, and it's pretty repetitive and verbose.

One good alternative is to move the conditional inside the Debugf function - and since it's still inlined, it should be the same result. However, ConditionalDebugf/false allocates twice, presumably because the two arguments do still allocate, as if I was actually calling Sprintf when in fact I am not.

It seems to me like the two benchmarks should behave the same; the compiler's escape analysis and inliner should work well enough together to detect this scenario.

Note that my real use case involves log.Printf, which is a no-op when one has called log.SetOutput(io.Discard). The benchmark above is a simplification which still reproduces the same problem.

@randall77
Copy link
Contributor

This will be a hard one to fix. The conversion to interface happens when building the [2]interface{} to back the ...interface{} slice. That happens before the call to conditionalDebugf. Inlining doesn't really make a difference, it still gets built before the inlined body of conditionalDebugf executes. We'd have to realize that the thing being built is only used inside a conditional, and move that building down inside the conditional.

Similar to the code below, but for a [2]interface{} instead of just an interface{}.

var c bool
var sink interface{}

func f(x string) {
	var y interface{} = x
	if c {
		sink = y
	}
}

The compiler has a tighten pass which does this kind of movement, but it only works for SSA-able values, which these aren't.

@josharian
Copy link
Contributor

Could this be fixed as a special case? Recognize exactly these cases:

func f {
  if safeexpr {
    // …
  } 
}
func f {
  if !safeexpr {
    return
  } 
  // …
}

These are very common wrappers for a very common problem with significant readability impact; it might be worth the code weight in the compiler.

@mvdan
Copy link
Member Author

mvdan commented Jun 20, 2022

Intuitively, I'm hinting at @josharian's suggestion: as a programmer I'm inclined to move the conditional from the function body to each call site, because there are clearly some gains to be had. But that sort of manual partial inlining is what the compiler should be able to do, not me :)

We'd have to realize that the thing being built is only used inside a conditional, and move that building down inside the conditional.
The compiler has a tighten pass which does this kind of movement, but it only works for SSA-able values, which these aren't.

That does sound like a more generic solution, but I'm not in a position to judge how difficult it is compared to Josh's suggestion.

@beoran
Copy link

beoran commented Jun 23, 2022

Fyi, Zerolog fixed this by not having interface{} parameters in the first place. It does mean the api has to support all types with separate methods. https://github.com/rs/zerolog

@cagedmantis cagedmantis added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Jun 23, 2022
@cagedmantis cagedmantis added this to the Backlog milestone Jun 23, 2022
@mvdan
Copy link
Member Author

mvdan commented Jul 2, 2022

@beoran that is indeed a possible solution, but a drastically different one that I'm not considering at the moment. It is more verbose too, so it doesn't exactly win against if debug { log.Printf(...) either :) It is likely better if what one wants is logging at maximum efficiency, but that is not what I'm after - I just want near-zero overhead when logging is disabled.

Regardless, I think it would be best for the Go compiler to gain this optimization, because ...any parameters will remain commonplace due to their usefulness.

@mvdan
Copy link
Member Author

mvdan commented Jul 2, 2022

Here's my inserting of the if debug { ... } conditionals in the downstream project, for a 5% reduction of total allocations: burrowers/garble#563

@gopherbot gopherbot added the compiler/runtime Issues related to the Go compiler and/or runtime. label Jul 13, 2022
@mknyszek mknyszek moved this to Triage Backlog in Go Compiler / Runtime Jul 15, 2022
@thockin
Copy link

thockin commented Nov 21, 2022

I think this is the same as I see in github.com/go-log/logr (xref #56873). I am seeing people adopt unnatural caller-side checking which feels like what mvdan said (#53465 (comment))

package main

import (
	"fmt"
	"testing"

	"github.com/go-logr/logr"
)

//go:noinline
func doV9(b *testing.B, log logr.Logger) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		log.V(9).Info("multi",
			"bool", true, "string", "str", "int", 42,
			"float", 3.14, "struct", struct{ X, Y int }{93, 76})
	}
}

//go:noinline
func doManualV9(b *testing.B, log logr.Logger) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		// This manual optimization should not be needed?
		if logV := log.V(9); logV.Enabled() {
			logV.Info("multi",
				"bool", true, "string", "str", "int", 42,
				"float", 3.14, "struct", struct{ X, Y int }{93, 76})
		}
	}
}

// This is a simple repro all in one package.
type Repro struct {
	enabled bool
}

//go:noinline
func (r Repro) Info(msg string, keysAndValues ...interface{}) {
	if r.enabled {
		r.Impl(msg, keysAndValues...)
	}
}

//go:noinline
func (r Repro) Impl(msg string, keysAndValues ...interface{}) {
	panic(fmt.Sprintln("this should not get called", msg, keysAndValues))
}

//go:noinline
func doLocalRepro(b *testing.B, repro Repro) {
	b.ReportAllocs()
	for i := 0; i < b.N; i++ {
		repro.Info("multi",
			"bool", true, "string", "str", "int", 42,
			"float", 3.14, "struct", struct{ X, Y int }{93, 76})
	}
}

func BenchmarkLocalReproDisabled(b *testing.B) {
	r := Repro{false}
	doLocalRepro(b, r)
}

func BenchmarkDiscardV9(b *testing.B) {
	var log logr.Logger = logr.Discard()
	doV9(b, log)
}

func BenchmarkDiscardManualV9(b *testing.B) {
	var log logr.Logger = logr.Discard()
	doManualV9(b, log)
}

@pohly
Copy link

pohly commented Nov 21, 2022

Should the basic problem get addressed, would it also eliminate function calls that are needed to construct the parameter slice?

In Kubernetes, we often have:

logger.V(5).Info("created", "object", klog.KObj(pod))

where KObj is https://github.com/kubernetes/klog/blob/9ae4992afb54bb4686590fff003828c30c84c92c/k8s_references.go#L56. It shouldn't have any side effects, but I guess the compiler would have to be able to figure that out, otherwise it cannot skip the call?

@mvdan
Copy link
Member Author

mvdan commented Jul 15, 2023

@pohly this issue is just for the escape analysis. That is, if the logger is disabled, the log arguments should not be escaping. A few allocations per log call add up.

For the work involved in preparing the logged values, see #37739. For pure functions maybe the compiler could be smart enough to do this for you, but I'd follow that issue regardless.

@mvdan
Copy link
Member Author

mvdan commented Nov 20, 2024

Perhaps interesting to @dr2chase given his recent work on better inlining for iterator funcs, e.g. #69411. I still think something like #53465 (comment) would be worthwhile.

@thepudds
Copy link
Contributor

FWIW, it seems to me this might be solved by #8618 (comment) and #62653. (However, I find escape analysis to be subtle, so I probably should have thought longer about this example before speaking here 😅).

Using one the CLs from that stack, the ConditionalDebugf/false case goes to 0 allocations (which if I understood seems to be @mvdan's main goal here).

Also, the ConditionalDebugf/true case with that same CL goes from 3 allocations to 1 allocation. (The heap allocation of the Sprintf string result remains, which might be required/expected given the Sprintf result is assigned to a global by @mvdan's benchmark).

$ go install golang.org/dl/gotip@latest
$ gotip download 528538

                          │    go1.23    │              cl-528538                  │
                          │  allocs/op   │ allocs/op   vs base                     │
ConditionalDebugf/false-4   2.000 ± 0%     0.000 ± 0%  -100.00% (p=0.000 n=10)
ConditionalDebugf/true-4    3.000 ± 0%     1.000 ± 0%   -66.67% (p=0.000 n=10)
InlinedDebugf/false-4       0.000 ± 0%     0.000 ± 0%         ~ (p=1.000 n=10) 
InlinedDebugf/true-4        3.000 ± 0%     1.000 ± 0%   -66.67% (p=0.000 n=10)

(That particular CL 528538 is abandoned mainly because it relied on me having created a new pragma to solve a reflect.Value.Interface allocation in an earlier CL, but I later concluded a more general approach was likely possible, which is now spread across a few other CLs in that stack. #8618 (comment) has a brief overview of that).

All that said, changes along the lines suggested by Josh above or Keith above could also be useful in their own right, or maybe there is a good inining change to make here...

@mvdan
Copy link
Member Author

mvdan commented Nov 20, 2024

Interesting, thank you. When your changes land in master I'll try again with my real project to see whether the allocations there are still present. The example in the original post here is a simplification, so the funcs involved are a lot smaller.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler/runtime Issues related to the Go compiler and/or runtime. NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. Performance
Projects
Status: Triage Backlog
Development

No branches or pull requests

9 participants