Add naive benchmark #321
Build Succeeded 🥳 Build Id: 26baa30b-8b9d-4017-9907-d680e254ca14 To build this version:
* Move packet buffer to heap

Allocating the 64K buffer on the stack makes the future expensive to move. This allocates the buffer on the heap instead. Noticed some significant perf improvements in load tests via https://github.com/majek/dump/tree/master/how-to-receive-a-million-packets

Tried using the naive benchmark from #321, but that doesn't seem consistent at the moment: I constantly got perf regressions/improvements on reruns even with no code change. (I think either running both the proxy and the mock server within the same process/scheduler adds too much noise, or we don't have a large enough unit of work.)

Relates to #330

* Initialize buffer

Co-authored-by: Mark Mandel <markmandel@google.com>
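To illustrate why a stack-allocated buffer makes a future expensive to move, here is a minimal std-only sketch. It is not Quilkin's code: the function names and `BUF_LEN` are hypothetical, and the futures are only measured, never polled. A local that stays live across an `.await` is stored inside the future itself, so an inline 64K array inflates the future's size, while a `Vec` only stores a pointer, length, and capacity.

```rust
use std::future::ready;
use std::mem::size_of_val;

/// Hypothetical buffer length, matching the 64K mentioned in the commit message.
const BUF_LEN: usize = 64 * 1024;

/// Size in bytes of a future that keeps a 64K array inline across an await.
fn stack_buffer_future_size() -> usize {
    let fut = async {
        let mut buf = [0u8; BUF_LEN];
        ready(()).await;
        // Using the buffer after the await forces it to live inside the future.
        buf[0] = 1;
        buf[0]
    };
    size_of_val(&fut)
}

/// Size in bytes of a future that keeps the same buffer on the heap.
fn heap_buffer_future_size() -> usize {
    let fut = async {
        let mut buf = vec![0u8; BUF_LEN];
        ready(()).await;
        // Only the Vec handle (pointer, length, capacity) lives inside the future.
        buf[0] = 1;
        buf[0]
    };
    size_of_val(&fut)
}

fn main() {
    println!("stack-buffer future: {} bytes", stack_buffer_future_size());
    println!("heap-buffer future:  {} bytes", heap_buffer_future_size());
}
```

Moving the first future (for example when spawning it onto a runtime) may copy the whole 64K, whereas moving the second copies only a few machine words.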
LGTM! Really like the html reports for comparison, just had a small addition I'd like to see for those not as familiar with Rust tooling.
Side thought: Do we want to set this up to run nightly and dump the reports into a cloud storage bucket somewhere?
Approving so you can merge when ready 👍🏻
Well, if we do, we shouldn't try to use it for measuring performance regressions. Criterion specifically warns against running in CI environments, because the measurements are very sensitive to noisy neighbours. I don't know how much of an issue that would be with GCP. We could still publish the reports just to look at; we could add that as part of the deploy-docs job.
Ah, interesting. We're currently running on E2 series VMs, which are not specifically built for isolation. I'd be surprised if it was as noisy as a CI platform, but it's not totally isolated. Something to think about at least, or maybe experiment with to see what results we get.
Co-authored-by: Mark Mandel <markmandel@google.com>
Build Succeeded 🥳 Build Id: 6dcdd5ac-0a19-4e87-912b-f242cde1ce70 To build this version:
Build Succeeded 🥳 Build Id: 99868598-61d7-4c71-8021-d75ee1743f79 To build this version:
This adds a naive Criterion benchmark suite that runs two separate benchmarks for comparison. In both cases we're simply sending packets back and forth, but in the first case a UDP socket talks directly to another socket, and in the second case Quilkin sits between the two as a middleman. It should be strongly noted that this benchmark heavily favours the direct case: there is zero latency or packet drop, and it is mostly blocked by syscalls. So this is not a fair comparison of Quilkin's performance in the real world, but it does provide a baseline for the overhead of processing a packet in Quilkin.
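The shape of that comparison can be sketched with the standard library alone. This is only an illustration under assumptions, not the benchmark in this PR: it times with `Instant` rather than Criterion, and `spawn_echo`, `spawn_relay`, and `mean_round_trip` are hypothetical helpers, with a trivial single-client relay thread standing in for Quilkin.

```rust
use std::io;
use std::net::{SocketAddr, UdpSocket};
use std::thread;
use std::time::{Duration, Instant};

/// Spawns a UDP echo server on loopback and returns its address.
fn spawn_echo() -> io::Result<SocketAddr> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    let addr = socket.local_addr()?;
    thread::spawn(move || {
        let mut buf = [0u8; 1500];
        while let Ok((len, peer)) = socket.recv_from(&mut buf) {
            let _ = socket.send_to(&buf[..len], peer);
        }
    });
    Ok(addr)
}

/// Hypothetical stand-in for a proxy: forwards each packet upstream and
/// relays the reply back. Handles one request at a time, for illustration only.
fn spawn_relay(upstream: SocketAddr) -> io::Result<SocketAddr> {
    let downstream = UdpSocket::bind("127.0.0.1:0")?;
    let addr = downstream.local_addr()?;
    let to_upstream = UdpSocket::bind("127.0.0.1:0")?;
    to_upstream.connect(upstream)?;
    thread::spawn(move || {
        let mut buf = [0u8; 1500];
        while let Ok((len, client)) = downstream.recv_from(&mut buf) {
            if to_upstream.send(&buf[..len]).is_err() {
                break;
            }
            match to_upstream.recv(&mut buf) {
                Ok(n) => {
                    let _ = downstream.send_to(&buf[..n], client);
                }
                Err(_) => break,
            }
        }
    });
    Ok(addr)
}

/// Mean time for one send/receive round trip against `target`.
/// `iterations` must be non-zero.
fn mean_round_trip(target: SocketAddr, iterations: u32) -> io::Result<Duration> {
    let socket = UdpSocket::bind("127.0.0.1:0")?;
    socket.connect(target)?;
    socket.set_read_timeout(Some(Duration::from_secs(1)))?;
    let payload = b"hello";
    let mut buf = [0u8; 1500];
    let start = Instant::now();
    for _ in 0..iterations {
        socket.send(payload)?;
        let len = socket.recv(&mut buf)?;
        assert_eq!(&buf[..len], payload);
    }
    Ok(start.elapsed() / iterations)
}

fn main() -> io::Result<()> {
    let echo = spawn_echo()?;
    let relay = spawn_relay(echo)?;
    println!("direct:      {:?}", mean_round_trip(echo, 1_000)?);
    println!("via relay:   {:?}", mean_round_trip(relay, 1_000)?);
    Ok(())
}
```

As with the real benchmark, everything runs over loopback in one process, so the numbers mostly reflect syscall and scheduling overhead rather than real-world network behaviour.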
I've provided what it looks like on a 2019 MBP; currently the Quilkin case is about two-thirds slower than the direct case. (Units are microseconds, µs.)