Update `StructFieldConst` to find missing offsets at runtime #1593

MrAlias · 2025-01-13T20:58:01Z

If the offset index does not contain an entry for the desired struct field, use the target process DWARF data to determine the offset.
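The fallback described above can be sketched in Go with the standard `debug/dwarf` package. This is a minimal sketch, not the PR's implementation. `StructFieldID`, `offsetFor`, and `dwarfOffset` are hypothetical names, and the ID fields mirror the `id` object printed in the logs in the Testing section. The sketch also assumes two Go toolchain conventions: struct types are named `<import path>.<Type>` in DWARF, and member offsets are encoded as constants.

```go
package main

import (
	"debug/dwarf"
	"errors"
	"fmt"
)

// StructFieldID identifies a struct field. Hypothetical type whose fields
// mirror the "id" object in the PR's logs.
type StructFieldID struct {
	ModPath, PkgPath, Struct, Field string
}

var errFieldNotFound = errors.New("struct field not found in DWARF data")

// offsetFor returns the cached offset when the index has one and only
// otherwise calls resolve (for example, a DWARF lookup).
func offsetFor(index map[StructFieldID]uint64, id StructFieldID, resolve func(StructFieldID) (uint64, error)) (uint64, error) {
	if off, ok := index[id]; ok {
		return off, nil
	}
	return resolve(id)
}

// dwarfOffset scans d for a struct type named "<PkgPath>.<Struct>" and
// returns the DW_AT_data_member_location of its member named id.Field.
func dwarfOffset(d *dwarf.Data, id StructFieldID) (uint64, error) {
	want := id.PkgPath + "." + id.Struct
	r := d.Reader()
	for {
		e, err := r.Next()
		if err != nil {
			return 0, err
		}
		if e == nil { // end of DWARF data
			return 0, fmt.Errorf("%w: %+v", errFieldNotFound, id)
		}
		if e.Tag != dwarf.TagStructType || !e.Children {
			continue // descend into compile units and other entries
		}
		if name, _ := e.Val(dwarf.AttrName).(string); name != want {
			r.SkipChildren()
			continue
		}
		// Walk the members of the matching struct; a zero Tag ends the list.
		for {
			m, err := r.Next()
			if err != nil {
				return 0, err
			}
			if m == nil || m.Tag == 0 {
				break
			}
			if m.Children {
				r.SkipChildren()
			}
			if m.Tag != dwarf.TagMember {
				continue
			}
			if name, _ := m.Val(dwarf.AttrName).(string); name != id.Field {
				continue
			}
			loc, ok := m.Val(dwarf.AttrDataMemberLoc).(int64)
			if !ok {
				return 0, fmt.Errorf("unsupported member location for %+v", id)
			}
			return uint64(loc), nil
		}
	}
}
```

Passing the DWARF lookup as a function value keeps the cached path free of any DWARF parsing. For example: `offsetFor(index, id, func(id StructFieldID) (uint64, error) { return dwarfOffset(d, id) })`.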

Testing

End-to-end testing was done by updating the autosdk e2e test. All dependencies were bumped to commit hash versions:

0c18ff3#diff-14bffadf6f47d1ef427c47d4425fca90398e7b4c8d96f0e58f1a371833d837ebR6-R8

It was then tested both locally and on our CI system.

Locally, the logs confirmed that this was working:

{"time":"2025-01-13T20:42:48.422927384Z","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation/probe.StructFieldConst.InjectOption","file":"/app/internal/pkg/instrumentation/probe/probe.go","line":490},"msg":"Offset not cached, analyzing directly","key":"span_context_trace_id_pos","id":{"ModPath":"go.opentelemetry.io/otel","PkgPath":"go.opentelemetry.io/otel/trace","Struct":"SpanContext","Field":"traceID"}}
{"time":"2025-01-13T20:42:48.460714317Z","level":"DEBUG","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation/probe.StructFieldConst.InjectOption","file":"/app/internal/pkg/instrumentation/probe/probe.go","line":508},"msg":"Offset found","key":"span_context_trace_id_pos","id":{"ModPath":"go.opentelemetry.io/otel","PkgPath":"go.opentelemetry.io/otel/trace","Struct":"SpanContext","Field":"traceID"},"offset":0}
{"time":"2025-01-13T20:42:48.46075608Z","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation/probe.StructFieldConst.InjectOption","file":"/app/internal/pkg/instrumentation/probe/probe.go","line":490},"msg":"Offset not cached, analyzing directly","key":"span_context_span_id_pos","id":{"ModPath":"go.opentelemetry.io/otel","PkgPath":"go.opentelemetry.io/otel/trace","Struct":"SpanContext","Field":"spanID"}}
{"time":"2025-01-13T20:42:48.499132571Z","level":"DEBUG","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation/probe.StructFieldConst.InjectOption","file":"/app/internal/pkg/instrumentation/probe/probe.go","line":508},"msg":"Offset found","key":"span_context_span_id_pos","id":{"ModPath":"go.opentelemetry.io/otel","PkgPath":"go.opentelemetry.io/otel/trace","Struct":"SpanContext","Field":"spanID"},"offset":16}
{"time":"2025-01-13T20:42:48.499175132Z","level":"INFO","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation/probe.StructFieldConst.InjectOption","file":"/app/internal/pkg/instrumentation/probe/probe.go","line":490},"msg":"Offset not cached, analyzing directly","key":"span_context_trace_flags_pos","id":{"ModPath":"go.opentelemetry.io/otel","PkgPath":"go.opentelemetry.io/otel/trace","Struct":"SpanContext","Field":"traceFlags"}}
{"time":"2025-01-13T20:42:48.539235866Z","level":"DEBUG","source":{"function":"go.opentelemetry.io/auto/internal/pkg/instrumentation/probe.StructFieldConst.InjectOption","file":"/app/internal/pkg/instrumentation/probe/probe.go","line":508},"msg":"Offset found","key":"span_context_trace_flags_pos","id":{"ModPath":"go.opentelemetry.io/otel","PkgPath":"go.opentelemetry.io/otel/trace","Struct":"SpanContext","Field":"traceFlags"},"offset":24}
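The logged offsets (0, 16, 24) line up with the layout of `trace.SpanContext` in go.opentelemetry.io/otel: a 16-byte trace ID, an 8-byte span ID, then a one-byte flags field. As a hedged cross-check, the sketch below uses a hypothetical mirror struct (`spanContextLayout`) and `unsafe.Offsetof` to compute the same offsets at compile time. It only illustrates the arithmetic and is not how the instrumentation finds offsets.

```go
package main

import "unsafe"

// spanContextLayout is a hypothetical mirror of the leading fields of
// go.opentelemetry.io/otel/trace.SpanContext (TraceID is [16]byte, SpanID is
// [8]byte, TraceFlags is a byte). Later fields do not affect these offsets.
type spanContextLayout struct {
	traceID    [16]byte
	spanID     [8]byte
	traceFlags byte
}

// spanContextOffsets returns the byte offsets of the three mirrored fields.
func spanContextOffsets() (traceID, spanID, traceFlags uintptr) {
	var sc spanContextLayout
	return unsafe.Offsetof(sc.traceID), unsafe.Offsetof(sc.spanID), unsafe.Offsetof(sc.traceFlags)
}
```

Because byte arrays have an alignment of 1, no padding is inserted, so each offset is simply the sum of the preceding field sizes.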

The update was then reverted, given that dependency updates will be managed by Renovate.


RonFed

This is awesome

internal/pkg/instrumentation/probe/probe.go

internal/pkg/inject/consts.go


MrAlias · 2025-01-14T19:53:26Z

DWARF data caching analysis

I wrote a benchmark to evaluate the overhead of reading the DWARF data:

package dwarfsize

import (
	"debug/dwarf"
	"debug/elf"
	"reflect"
	"testing"
	"unsafe"
)

func Benchmark(b *testing.B) {
	b.Run("otelcontribcol", benchmark("./bin/otelcontribcol_linux_amd64"))
	b.Run("autosdk", benchmark("./bin/autosdk"))
}

func benchmark(path string) func(*testing.B) {
	return func(b *testing.B) {
		f, err := elf.Open(path)
		if err != nil {
			b.Fatalf("failed to open ELF %s: %v", path, err)
		}
		defer f.Close()

		data, err := f.DWARF()
		if err != nil {
			b.Fatalf("failed to read DWARF symbols: %s", err)
		}

		if data == nil {
			b.Fatalf("empty DWARF data")
		}

		size := dwarfSize(*data)
		b.Logf("DWARF data size: %d bytes", size)

		b.ReportAllocs()
		b.ResetTimer()

		for n := 0; n < b.N; n++ {
			data, err = f.DWARF()
			if err != nil {
				b.Fatal(err)
			}
		}

		_ = data
	}
}

func dwarfSize(data dwarf.Data) uintptr {
	size := unsafe.Sizeof(data)

	v := reflect.ValueOf(data)
	for i := 0; i < v.NumField(); i++ {
		field := v.Field(i)
		switch field.Type().Kind() {
		case reflect.Array, reflect.Chan, reflect.Slice:
			size += uintptr(field.Cap()) * field.Type().Size()
		case reflect.Map, reflect.String:
			size += uintptr(field.Len()) * field.Type().Size()
		default:
		}
	}

	return size
}

I evaluated this against the autosdk e2e test binary and the OpenTelemetry collector contrib binary:

$ ls -lh bin
-rwxr-xr-x 1 tyler tyler 3.0M Jan 14 11:38 autosdk
-rwxr-xr-x 1 tyler tyler 411M Jan 14 11:15 otelcontribcol_linux_amd64

Results

goos: linux
goarch: amd64
pkg: testing/dwarfsize
cpu: Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
                  │   out.txt   │
                  │   sec/op    │
/otelcontribcol-8    2.241 ± 5%
/autosdk-8          21.55m ± 2%
geomean             219.8m

                  │   out.txt    │
                  │     B/op     │
/otelcontribcol-8   1.353Gi ± 0%
/autosdk-8          3.420Mi ± 0%
geomean             68.84Mi

                  │   out.txt   │
                  │  allocs/op  │
/otelcontribcol-8   315.7k ± 0%
/autosdk-8          3.115k ± 0%
geomean             31.36k

The logged DWARF data sizes:

otelcontribcol: DWARF data size: 5093228696 bytes
autosdk: DWARF data size: 39317984 bytes

Take-away

If my analysis is correct, reading the DWARF data can allocate gigabytes of memory for binaries with large DWARF data (1.353 GiB per read for otelcontribcol) and take seconds to process (about 2.2 s per read).

Based on this analysis, I plan to remove the caching. It seems more reasonable for the auto-instrumentation to take longer to start in the failure case where we do not know an offset or two than for its resident memory to be several GB for its entire lifetime.

We can always readdress this in the future if needed.


RonFed · 2025-01-14T21:43:09Z

@MrAlias Thanks for doing this benchmark, it's really interesting.
I agree that having the DWARF cached is not a good idea.
I wonder if we can do anything to reduce the memory allocation and latency of that DWARF processing.

MrAlias added 6 commits January 13, 2025 12:51

Add DWARF data processing to the process pkg

215ea99

Use process.DWARF in inspect app

02aea2a

Add TargetDetails.OpenExe

4f83dd4

Unify the opening of the target process exe file as a field.

Support offset inspection with the inject pkg

ab4d330

Update StructFieldConst to find missing offsets

7261793

If the offset index does not contain an entry for the desired struct field, use the target process DWARF data to determine the offset.

Update autosdk e2e test

0c18ff3

Used to temporarily test the functionality. Should be reverted before merge.

MrAlias force-pushed the uncached-offsets branch from 76c1628 to 0c18ff3 January 13, 2025 21:03

MrAlias added 2 commits January 13, 2025 13:06

Add changelog entry

661e0dd

Revert "Update autosdk e2e test"

aab8a50

This reverts commit 0c18ff3.

MrAlias marked this pull request as ready for review January 13, 2025 21:21

MrAlias requested a review from a team as a code owner January 13, 2025 21:21

Merge branch 'main' into uncached-offsets

6350266

RonFed reviewed Jan 14, 2025


MrAlias and others added 4 commits January 14, 2025 08:30

Merge branch 'main' into uncached-offsets

e01021a

Use full StructField in err msg

7e335d0

Cached the DWARF data from the TargetDetails

c5cb54b

Doc TargetDetails.OpenExe

0469eab

Revert "Cached the DWARF data from the TargetDetails"

4bfdde5

This reverts commit c5cb54b. Analysis has shown that this caching could cause large allocations of memory to be held for the lifetime of the Instrumentation. This is removed to avoid that.

RonFed approved these changes Jan 14, 2025


RonFed mentioned this pull request Jan 15, 2025

can't load instrumentation on golang.org/x/net@v0.33.0 #1615

Closed

MrAlias and others added 2 commits January 15, 2025 07:36

Merge branch 'main' into uncached-offsets

26490b7

Merge branch 'main' into uncached-offsets

f76707b

MrAlias merged commit 5d76e87 into open-telemetry:main Jan 15, 2025
27 checks passed

MrAlias deleted the uncached-offsets branch January 15, 2025 16:03

MrAlias added this to the v0.20.0-alpha milestone Jan 15, 2025
