BenchmarkDotNet snippet to compare Span<byte> to raw pointer #370

Merged: 3 commits into main from badrishc/bdn-span-ptr on May 9, 2024

Conversation

badrishc
Contributor

@badrishc badrishc commented May 8, 2024

No description provided.

@badrishc badrishc merged commit ed4b9ab into main May 9, 2024
23 checks passed
@badrishc badrishc deleted the badrishc/bdn-span-ptr branch May 9, 2024 00:12
@PaulusParssinen
Contributor

PaulusParssinen commented May 9, 2024

There's a lot more nuance in play than directly comparing them like this 😄

But yes, pointers will generally win in Garnet, since there is a pinned network buffer on the heap that is being reused.

@badrishc
Contributor Author

badrishc commented May 9, 2024

This was just to check the claim/hope that Span<byte> could actually even outperform pointers. It seems they are close, but the gap still exists. What other nuances do you have in mind for server code such as Garnet, where we can ensure that we are being careful with pointers?

@PaulusParssinen
Contributor

PaulusParssinen commented May 9, 2024

If you already have a pointer, it's of course generally better to stay in the pointer world. And since the team is clearly comfortable using pointers and finds them intuitive, they are absolutely the right choice for Garnet, especially in the APIs that are not directly exposed to library consumers.

The strength of Span<T> is really in its flexibility to represent any contiguous memory "slice". Previously, a library author needed to expose T* ptr, int length for their high-perf consumers, and then T[] array, int start, int length in addition to plain T[] array. All of these are now merged into one safe .NET construct (Long live Midori). This means that often you can just stay in the (RO)Span<T> world and delegate the work to the BCL's already vectorized workhorse methods. It also means that when we do need a pointer, we can make the pinning as short-lived as possible, which "costs" only a stack spill (a couple of movs; this can and will be improved: dotnet/runtime#63397), and if no GC occurs during the pin, it's essentially free.

In Garnet's case, one has to consider the trade-offs of maintaining all the manual parsing logic where we think we can beat the BCL. If something we do is measurably faster, why not contribute it upstream for all to benefit? (Yes, we can always make more assumptions than the BCL, so we can be faster.) There is also the question of how to allocate and focus the development efforts.

Benchmarking is hard, and one can unconsciously write a micro-benchmark that gives the results they want to see (been there, done that). BenchmarkDotNet tries its best but can't help against the way processors are too smart nowadays. The micro-benchmarked methods are going to stay in the instruction cache, and this makes, for example, code that heavily relies on data being in L1/L2 cache look better than it would in real workloads, where it would incur a lot more cache misses. And then there are the nuances with stupidly smart branch predictors, etc.

I really want to have great benchmarking tooling in the Garnet repo to quickly iterate and see the real effects of performance changes against different workloads. Something along the lines of ASP.NET's Crank, but with configurable RESP workloads, to see tangible RPS/Memory/Latency/CPU metrics. These don't need to be maximum theoretical measurements on special lab hardware, but something to give an indication of how much RPS/Latency we lose if, for example, we just used the Utf8Parser & Utf8Formatter APIs instead of hand-rolling our own.

Sorry for the wall of text rambling😅

@badrishc
Contributor Author

badrishc commented May 9, 2024

We are in vehement agreement here. In fact, the above BDN seems to suggest that even for this rather contrived case, Span<byte> is very close (if not sometimes equal) to byte* - as long as the user (Garnet code in our case) does not need to re-pin the Span<byte> with fixed just to get a pointer because they need a pointer for some reason.

@github-actions github-actions bot locked and limited conversation to collaborators Jul 9, 2024

4 participants