Cache benchmarks #405

Closed
wants to merge 1 commit into from

Conversation

@kentquirk (Contributor) commented Nov 28, 2023

So I wanted to compare some ideas for improving this, and ended up spending more than an hour writing benchmarks and trying out alternatives.

All that is in here, but the summary is that we get a lot more predictable performance by using sync.Map instead of Go's native map. The alternative concurrent-map doesn't actually do better in this case.

The function getPrefixedFieldNameSync in this PR is the best-performing choice, IMO.
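(For illustration, the sync.Map shape is roughly the following; the "app." prefix handling, package name, and exact signature here are assumptions for the sketch and may not match the code in the PR.)

```go
package beeline

import (
	"strings"
	"sync"
)

// prefixedFieldNames caches raw field name -> prefixed field name.
// sync.Map suits this access pattern: after warm-up the key set is
// mostly stable, so lookups dominate and avoid lock contention.
var prefixedFieldNames sync.Map

// getPrefixedFieldNameSync returns name with the "app." prefix applied,
// caching the result so the concatenation happens only once per
// distinct field name. (Sketch only; details may differ from the PR.)
func getPrefixedFieldNameSync(name string) string {
	if v, ok := prefixedFieldNames.Load(name); ok {
		return v.(string)
	}
	prefixed := name
	if !strings.HasPrefix(name, "app.") {
		prefixed = "app." + name
	}
	// LoadOrStore keeps a single canonical value if two goroutines race.
	actual, _ := prefixedFieldNames.LoadOrStore(name, prefixed)
	return actual.(string)
}
```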

Below are the benchmark results. orig is the original code (which prepends the app prefix unconditionally), new is Mike's implementation, sync uses sync.Map, and conc uses concurrent-map.

f50 means there are 50 distinct field names, and g10 means there are 10 concurrent goroutines accessing the cache, so f2000-g300 is the most stressful combination.
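(The real harness is in the PR; purely to illustrate how the f/g matrix produces names like sync-f50-g1, a table-driven benchmark of roughly this shape would do it. The trailing -10 in the output below is the GOMAXPROCS suffix that go test appends. The implementation map, name generation, and the way b.N is split across goroutines here are illustrative, not the PR's exact code.)

```go
package beeline

import (
	"fmt"
	"sync"
	"testing"
)

// BenchmarkGetPrefixedFieldNameParallel sketches the f/g matrix: f is the
// number of distinct field names, g the number of goroutines hitting the
// cache concurrently. The harness in the PR may differ.
func BenchmarkGetPrefixedFieldNameParallel(b *testing.B) {
	// Implementations under test; only the sync.Map variant is sketched
	// above. orig, new, and conc would be registered the same way.
	impls := map[string]func(string) string{
		"sync": getPrefixedFieldNameSync,
	}
	for implName, fn := range impls {
		for _, f := range []int{50, 500, 2000} { // distinct field names
			names := make([]string, f)
			for i := range names {
				names[i] = fmt.Sprintf("field%d", i)
			}
			for _, g := range []int{1, 50, 300} { // concurrent goroutines
				fn, f, g, names := fn, f, g, names // pin loop variables for the closures
				b.Run(fmt.Sprintf("%s-f%d-g%d", implName, f, g), func(b *testing.B) {
					b.ReportAllocs()
					perG := b.N/g + 1 // split the work roughly evenly
					var wg sync.WaitGroup
					for w := 0; w < g; w++ {
						wg.Add(1)
						go func(offset int) {
							defer wg.Done()
							for i := 0; i < perG; i++ {
								_ = fn(names[(offset+i)%f])
							}
						}(w)
					}
					wg.Wait()
				})
			}
		}
	}
}
```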

I also tried a variant of the new algorithm that replaces a random value in the map, instead of recreating the map when it is full. That was not a winning strategy, so I didn't include it in the benchmark results (but the code is there).
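(Roughly, the eject-random idea is the following; this is a sketch of the concept rather than the code in the branch, the helper and lock names are made up, and it assumes the sync package for the mutex. With a plain Go map the write path has to be guarded, which is part of why it doesn't win.)

```go
// storeWithEviction inserts key -> value, dropping one arbitrary entry
// first if the cache is at capacity. Go map iteration order is
// unspecified, so deleting the first key visited approximates
// "replace a random value" without rebuilding the whole map.
func storeWithEviction(mu *sync.Mutex, cache map[string]string, key, value string, maxSize int) {
	mu.Lock()
	defer mu.Unlock()
	if len(cache) >= maxSize {
		for k := range cache {
			delete(cache, k)
			break
		}
	}
	cache[key] = value
}
```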

My recommendation is that we replace the current design in the PR with the contents of getPrefixedFieldNameSync and call it a day.

Running tool: /opt/homebrew/bin/go test -benchmem -run=^$ -bench ^BenchmarkGetPrefixedFieldNameParallel$ github.com/honeycombio/beeline-go -count=1

goos: darwin
goarch: arm64
pkg: github.com/honeycombio/beeline-go
BenchmarkGetPrefixedFieldNameParallel/orig-f50-g1-10         	32143990	        36.77 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f500-g1-10        	32188372	        36.89 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f2000-g1-10       	27603550	        38.83 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f50-g1-10          	30958353	        42.13 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f500-g1-10         	28620691	        42.92 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f2000-g1-10        	 9467629	       126.7 ns/op	     133 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f50-g1-10         	25753398	        49.62 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f500-g1-10        	23410959	        51.71 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f2000-g1-10       	19778806	        58.74 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f50-g1-10         	29850870	        38.17 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f500-g1-10        	19890284	        60.95 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f2000-g1-10       	18314028	        66.09 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f50-g50-10        	 7052200	       167.0 ns/op	      21 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f500-g50-10       	 7049428	       168.0 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f2000-g50-10      	 7073343	       170.0 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f50-g50-10         	 7069659	       172.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f500-g50-10        	 6797158	       175.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f2000-g50-10       	 4996476	       242.6 ns/op	     132 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f50-g50-10        	 7104078	       162.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f500-g50-10       	 7332326	       185.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f2000-g50-10      	 6079827	       192.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f50-g50-10        	 6289045	       193.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f500-g50-10       	 5752580	       209.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f2000-g50-10      	 5574608	       214.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f50-g300-10       	 7138944	       164.4 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f500-g300-10      	 7169821	       168.3 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f2000-g300-10     	 6930981	       171.2 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f50-g300-10        	 6868586	       168.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f500-g300-10       	 6978414	       181.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f2000-g300-10      	 4246818	       284.8 ns/op	     130 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f50-g300-10       	 7042858	       169.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f500-g300-10      	 6347726	       188.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f2000-g300-10     	 5882048	       200.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f50-g300-10       	 6072379	       189.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f500-g300-10      	 5719530	       214.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f2000-g300-10     	 5473842	       218.5 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/honeycombio/beeline-go	49.900s

@robbkidd (Member) commented:

> I also tried a variant of the new algorithm that replaces a random value in the map, instead of recreating the map when it is full. That was not a winning strategy, so I didn't include it in the benchmark results (but the code is there).

With new + EjectRandom disqualified, do you think it is worth trying the recommended sync algorithm + EjectRandom to constrain map size?

@kentquirk (Contributor, Author) commented:

I was going to say no, because sync.Map doesn't even have a Size() or Len() function, so we'd have to implement that separately. But then I remembered that this cache is global to the process, so it could grow very large. I'm going to implement that and try it. Thanks for the poke.
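(For reference: since sync.Map has no length accessor, tracking size alongside it generally means pairing the map with an atomic counter, along these lines. This is a sketch of the general technique, not the code that was tried; the type and method names are made up.)

```go
package beeline

import (
	"sync"
	"sync/atomic"
)

// sizedSyncMap pairs a sync.Map with an atomic counter so callers can
// check an approximate entry count cheaply. The count only reflects
// inserts that go through LoadOrStore.
type sizedSyncMap struct {
	m    sync.Map
	size atomic.Int64
}

// LoadOrStore returns the existing value for key, or stores value and
// bumps the counter when the key is new.
func (s *sizedSyncMap) LoadOrStore(key, value string) string {
	actual, loaded := s.m.LoadOrStore(key, value)
	if !loaded {
		s.size.Add(1)
	}
	return actual.(string)
}

// Len reports the approximate number of entries.
func (s *sizedSyncMap) Len() int64 {
	return s.size.Load()
}
```

The counter itself is cheap; it's evicting once the limit is hit that reintroduces coordination, which lines up with the bottlenecks described in the next comment.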

@kentquirk (Contributor, Author) commented:

Update -- I spent a while trying to limit the size of the caches in various ways, and all of them caused significant performance bottlenecks when done safely (behind locks).

I no longer think we should do this at all. More in the main issue.

@kentquirk closed this Nov 28, 2023