Cache benchmarks #405

Closed
wants to merge 1 commit into from

Conversation

@kentquirk (Contributor) commented Nov 28, 2023

So I wanted to compare some ideas for improving this, and ended up spending more than an hour writing benchmarks and trying out alternatives.

All that is in here, but the summary is that we get a lot more predictable performance by using sync.Map instead of Go's native map. The alternative concurrent-map doesn't actually do better in this case.

The function getPrefixedFieldNameSync in this PR is the best-performing choice, IMO.
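(For illustration, the sync.Map shape is roughly the following; the "app." prefix handling, package name, and exact signature here are assumptions for the sketch and may not match the code in the PR.)

```go
package beeline

import (
	"strings"
	"sync"
)

// prefixedFieldNames caches raw field name -> prefixed field name.
// sync.Map suits this access pattern: after warm-up the key set is
// mostly stable, so lookups dominate and avoid lock contention.
var prefixedFieldNames sync.Map

// getPrefixedFieldNameSync returns name with the "app." prefix applied,
// caching the result so the concatenation happens only once per
// distinct field name. (Sketch only; details may differ from the PR.)
func getPrefixedFieldNameSync(name string) string {
	if v, ok := prefixedFieldNames.Load(name); ok {
		return v.(string)
	}
	prefixed := name
	if !strings.HasPrefix(name, "app.") {
		prefixed = "app." + name
	}
	// LoadOrStore keeps a single canonical value if two goroutines race.
	actual, _ := prefixedFieldNames.LoadOrStore(name, prefixed)
	return actual.(string)
}
```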

Below are the benchmark results. orig is the original code (which prepends the app prefix unconditionally), new is Mike's implementation, sync uses sync.Map, and conc uses concurrent-map.

f50 means there are 50 distinct field names, and g10 means there are 10 concurrent goroutines accessing the cache, so f2000-g300 is the most stressful combination.
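(The real harness is in the PR; purely to illustrate how the f/g matrix produces names like sync-f50-g1, a table-driven benchmark of roughly this shape would do it. The trailing -10 in the output below is the GOMAXPROCS suffix that go test appends. The implementation map, name generation, and the way b.N is split across goroutines here are illustrative, not the PR's exact code.)

```go
package beeline

import (
	"fmt"
	"sync"
	"testing"
)

// BenchmarkGetPrefixedFieldNameParallel sketches the f/g matrix: f is the
// number of distinct field names, g the number of goroutines hitting the
// cache concurrently. The harness in the PR may differ.
func BenchmarkGetPrefixedFieldNameParallel(b *testing.B) {
	// Implementations under test; only the sync.Map variant is sketched
	// above. orig, new, and conc would be registered the same way.
	impls := map[string]func(string) string{
		"sync": getPrefixedFieldNameSync,
	}
	for implName, fn := range impls {
		for _, f := range []int{50, 500, 2000} { // distinct field names
			names := make([]string, f)
			for i := range names {
				names[i] = fmt.Sprintf("field%d", i)
			}
			for _, g := range []int{1, 50, 300} { // concurrent goroutines
				fn, f, g, names := fn, f, g, names // pin loop variables for the closures
				b.Run(fmt.Sprintf("%s-f%d-g%d", implName, f, g), func(b *testing.B) {
					b.ReportAllocs()
					perG := b.N/g + 1 // split the work roughly evenly
					var wg sync.WaitGroup
					for w := 0; w < g; w++ {
						wg.Add(1)
						go func(offset int) {
							defer wg.Done()
							for i := 0; i < perG; i++ {
								_ = fn(names[(offset+i)%f])
							}
						}(w)
					}
					wg.Wait()
				})
			}
		}
	}
}
```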

I also tried a variant of the new algorithm that replaces a random value in the map, instead of recreating the map when it is full. That was not a winning strategy, so I didn't include it in the benchmark results (but the code is there).
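(Roughly, the eject-random idea is the following; this is a sketch of the concept rather than the code in the branch, the helper and lock names are made up, and it assumes the sync package for the mutex. With a plain Go map the write path has to be guarded, which is part of why it doesn't win.)

```go
// storeWithEviction inserts key -> value, dropping one arbitrary entry
// first if the cache is at capacity. Go map iteration order is
// unspecified, so deleting the first key visited approximates
// "replace a random value" without rebuilding the whole map.
func storeWithEviction(mu *sync.Mutex, cache map[string]string, key, value string, maxSize int) {
	mu.Lock()
	defer mu.Unlock()
	if len(cache) >= maxSize {
		for k := range cache {
			delete(cache, k)
			break
		}
	}
	cache[key] = value
}
```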

My recommendation is that we replace the current design in the PR with the contents of getPrefixedFieldNameSync and call it a day.

Running tool: /opt/homebrew/bin/go test -benchmem -run=^$ -bench ^BenchmarkGetPrefixedFieldNameParallel$ github.com/honeycombio/beeline-go -count=1

goos: darwin
goarch: arm64
pkg: github.com/honeycombio/beeline-go
BenchmarkGetPrefixedFieldNameParallel/orig-f50-g1-10         	32143990	        36.77 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f500-g1-10        	32188372	        36.89 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f2000-g1-10       	27603550	        38.83 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f50-g1-10          	30958353	        42.13 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f500-g1-10         	28620691	        42.92 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f2000-g1-10        	 9467629	       126.7 ns/op	     133 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f50-g1-10         	25753398	        49.62 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f500-g1-10        	23410959	        51.71 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f2000-g1-10       	19778806	        58.74 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f50-g1-10         	29850870	        38.17 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f500-g1-10        	19890284	        60.95 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f2000-g1-10       	18314028	        66.09 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f50-g50-10        	 7052200	       167.0 ns/op	      21 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f500-g50-10       	 7049428	       168.0 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f2000-g50-10      	 7073343	       170.0 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f50-g50-10         	 7069659	       172.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f500-g50-10        	 6797158	       175.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f2000-g50-10       	 4996476	       242.6 ns/op	     132 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f50-g50-10        	 7104078	       162.8 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f500-g50-10       	 7332326	       185.1 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f2000-g50-10      	 6079827	       192.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f50-g50-10        	 6289045	       193.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f500-g50-10       	 5752580	       209.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f2000-g50-10      	 5574608	       214.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f50-g300-10       	 7138944	       164.4 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f500-g300-10      	 7169821	       168.3 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/orig-f2000-g300-10     	 6930981	       171.2 ns/op	      22 B/op	       1 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f50-g300-10        	 6868586	       168.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f500-g300-10       	 6978414	       181.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/new-f2000-g300-10      	 4246818	       284.8 ns/op	     130 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f50-g300-10       	 7042858	       169.2 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f500-g300-10      	 6347726	       188.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/sync-f2000-g300-10     	 5882048	       200.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f50-g300-10       	 6072379	       189.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f500-g300-10      	 5719530	       214.3 ns/op	       0 B/op	       0 allocs/op
BenchmarkGetPrefixedFieldNameParallel/conc-f2000-g300-10     	 5473842	       218.5 ns/op	       0 B/op	       0 allocs/op
PASS
ok  	github.com/honeycombio/beeline-go	49.900s

@robbkidd (Member) commented:

> I also tried a variant of the new algorithm that replaces a random value in the map, instead of recreating the map when it is full. That was not a winning strategy, so I didn't include it in the benchmark results (but the code is there).

With new + EjectRandom disqualified, do you think it is worth trying the recommended sync algorithm + EjectRandom to constrain map size?

@kentquirk (Contributor, Author) commented:

I was going to say no, because sync.Map doesn't even have a Size() or Len() function, so we'd have to implement that separately. But then I remembered that this cache is global to the process, so it could grow very large. I'm going to implement that and try it. Thanks for the poke.
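(For reference: since sync.Map has no length accessor, tracking size alongside it generally means pairing the map with an atomic counter, along these lines. This is a sketch of the general technique, not the code that was tried; the type and method names are made up.)

```go
package beeline

import (
	"sync"
	"sync/atomic"
)

// sizedSyncMap pairs a sync.Map with an atomic counter so callers can
// check an approximate entry count cheaply. The count only reflects
// inserts that go through LoadOrStore.
type sizedSyncMap struct {
	m    sync.Map
	size atomic.Int64
}

// LoadOrStore returns the existing value for key, or stores value and
// bumps the counter when the key is new.
func (s *sizedSyncMap) LoadOrStore(key, value string) string {
	actual, loaded := s.m.LoadOrStore(key, value)
	if !loaded {
		s.size.Add(1)
	}
	return actual.(string)
}

// Len reports the approximate number of entries.
func (s *sizedSyncMap) Len() int64 {
	return s.size.Load()
}
```

The counter itself is cheap; it's evicting once the limit is hit that reintroduces coordination, which lines up with the bottlenecks described in the next comment.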

@kentquirk (Contributor, Author) commented:

Update -- I spent a while trying to limit the size of the caches in various ways, and all of them caused significant performance bottlenecks when done safely (behind locks).

I no longer think we should do this at all. More in the main issue.

@kentquirk closed this Nov 28, 2023