snabbmark: Add "mp-ring" multiprocess benchmark #804
This benchmark measures the throughput of <N> Snabb processes connected in a ring.
Each process is pinned to the core with the same number: process <N> uses core <N>.
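The sketch below shows the ring wiring described above, written in Python. It assumes that process i receives on link i and transmits on link (i + 1) mod N. The names `ring_topology` and `pin_to_core` are hypothetical and are not part of snabbmark. `os.sched_setaffinity` is Linux-only, so the pinning helper does nothing on other platforms.

```python
import os


def ring_topology(n):
    """Return (process, rx_link, tx_link, core) for each process in an n-process ring.

    Hypothetical wiring: process i receives on link i and transmits on link
    (i + 1) % n, so the last process feeds back into process 0. Process i is
    assigned core i, matching "process <N> uses core <N>".
    """
    if n < 1:
        raise ValueError("a ring needs at least one process")
    return [(i, i, (i + 1) % n, i) for i in range(n)]


def pin_to_core(core):
    """Restrict the calling process to a single core (Linux-only; no-op elsewhere)."""
    if hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, {core})


if __name__ == "__main__":
    for proc, rx, tx, core in ring_topology(4):
        print(f"process {proc}: link {rx} -> link {tx}, core {core}")
```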
The upstream branch for this change is
The whole
I'm not sure if it's kosher to keep commenting here, but this little snippet of the benchmark also causes cache line ping-ponging. If you add a crude C.usleep(100) in there,
I think we could make the main process wait on a semaphore rather than spinning like this.
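The spinning snippet itself is not shown in this thread. The following is a minimal sketch of the semaphore alternative, using Python's `multiprocessing` primitives as stand-ins for whatever Snabb would actually use. The parent blocks on a semaphore that each worker releases once, so it does not repeatedly poll shared state that the workers are writing. `worker`, `wait_for_workers`, and the busy-work inside `worker` are hypothetical placeholders. The code uses the "fork" start method, which is Unix-only.

```python
import ctypes
import multiprocessing as mp


def worker(done, results, i):
    # Hypothetical stand-in for the benchmark work done by worker i.
    results[i] = sum(range(10_000 * (i + 1)))
    done.release()  # signal the parent exactly once when finished


def wait_for_workers(nprocs):
    ctx = mp.get_context("fork")                  # Unix-only start method
    done = ctx.Semaphore(0)                       # starts at zero: acquire() blocks
    results = ctx.Array(ctypes.c_int64, nprocs)   # one shared int64 per worker
    procs = [ctx.Process(target=worker, args=(done, results, i))
             for i in range(nprocs)]
    for p in procs:
        p.start()
    # Rather than spinning on a shared "finished" flag (which keeps pulling
    # that cache line back and forth between cores), block until every
    # worker has released the semaphore once.
    for _ in range(nprocs):
        done.acquire()
    for p in procs:
        p.join()
    return results[:]


if __name__ == "__main__":
    print(wait_for_workers(3))
```

Because the semaphore starts at zero, each `acquire()` waits for one worker's `release()`, so the parent wakes up only as often as there are workers.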
Interesting! Please push that crude sleep somewhere e.g. your I wonder if
I think it's because there is no open pull request from your branch at the moment. GitHub automatically closed #813 when I completed the merge. You should be able to start a new pull request from the same branch to send further changes.
…mx-test Temporarily disable snabbvmx selftest
Add a new mp-ring benchmark for measuring the performance of basic multi-process link operations. The benchmark forks worker processes and cycles packets through them via a series of links. This particular benchmark "just works" in multiprocess mode without any changes to the Snabb core. Packets and links are already allocated in shared memory before the children are forked, and freelists are not used because the same packets keep circulating from link to link.
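The following Python sketch illustrates the allocate-then-fork idea. It is not the snabbmark code. Links are single-producer/single-consumer rings of packet ids, and they sit in a single anonymous shared `mmap` that is created before `os.fork()`. The packets are ids preloaded into link 0. They only ever move from link to link, so no freelist is involved. Each worker forwards a fixed number of packets from its inbound link to the next one. `SharedLink`, `run_ring`, the 64-byte cache line, and the 256-slot capacity are all assumptions. The sketch also relies on two hardware behaviors that hold on x86-64 but are not guaranteed by Python: aligned 8-byte loads and stores are not torn, and stores become visible in program order.

```python
import mmap
import os
import struct

CACHE_LINE = 64  # assumed cache-line size, used to keep the two indices apart
CAPACITY = 256   # hypothetical number of slots per link


class SharedLink:
    """Single-producer/single-consumer ring of packet ids in shared memory.

    Hypothetical layout (not Snabb's actual struct link):
      [0, 8)        read index, written only by the consumer
      [64, 72)      write index, written only by the producer
      [128, ...)    CAPACITY slots holding 8-byte packet ids
    Indices only ever increase; the slot for index n is n % CAPACITY.
    """

    SIZE = 2 * CACHE_LINE + 8 * CAPACITY

    def __init__(self, buf, offset):
        self.buf, self.base = buf, offset

    def _get(self, off):
        return struct.unpack_from("Q", self.buf, self.base + off)[0]

    def _put(self, off, value):
        struct.pack_into("Q", self.buf, self.base + off, value)

    def empty(self):
        return self._get(0) == self._get(CACHE_LINE)

    def full(self):
        return self._get(CACHE_LINE) - self._get(0) == CAPACITY

    def transmit(self, packet_id):
        w = self._get(CACHE_LINE)
        self._put(2 * CACHE_LINE + 8 * (w % CAPACITY), packet_id)  # fill the slot...
        self._put(CACHE_LINE, w + 1)                                # ...then publish it

    def receive(self):
        r = self._get(0)
        packet_id = self._get(2 * CACHE_LINE + 8 * (r % CAPACITY))
        self._put(0, r + 1)  # hand the slot back to the producer
        return packet_id


def run_ring(nprocs, npackets, hops_per_proc):
    """Fork nprocs workers; worker i moves hops_per_proc packets from link i to link i+1."""
    assert 0 < npackets < CAPACITY  # no link can fill up, so the ring cannot deadlock
    # Allocate every link (plus one result counter per worker) BEFORE forking:
    # mmap(-1, ...) is an anonymous MAP_SHARED mapping that the children inherit.
    counters = nprocs * SharedLink.SIZE
    shm = mmap.mmap(-1, counters + 8 * nprocs)
    links = [SharedLink(shm, i * SharedLink.SIZE) for i in range(nprocs)]
    for p in range(npackets):  # preallocated packet ids; nothing is ever freed
        links[0].transmit(p)
    pids = []
    for i in range(nprocs):
        pid = os.fork()
        if pid == 0:  # child i
            try:
                rx, tx = links[i], links[(i + 1) % nprocs]
                moved = 0
                while moved < hops_per_proc:
                    if rx.empty() or tx.full():
                        os.sched_yield()  # nothing to move yet; briefly give up the core
                        continue
                    tx.transmit(rx.receive())
                    moved += 1
                struct.pack_into("Q", shm, counters + 8 * i, moved)
            finally:
                os._exit(0)  # never fall through into the parent's code path
        pids.append(pid)
    for pid in pids:
        os.waitpid(pid, 0)
    total = sum(struct.unpack_from("Q", shm, counters + 8 * i)[0] for i in range(nprocs))
    leftover = []
    while not links[0].empty():
        leftover.append(links[0].receive())
    shm.close()
    return total, leftover


if __name__ == "__main__":
    total, leftover = run_ring(nprocs=3, npackets=100, hops_per_proc=2000)
    print(total, sorted(leftover) == list(range(100)))
```

The packet count must be positive and smaller than the link capacity. Under that condition no link can ever fill, so every worker can finish its fixed quota. At the end, all packets are back in link 0.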
The intention of this benchmark is to be a framework for investigating any fundamental performance limits of inter-process traffic (see #801) and for reproducing specific issues like inter-core conflicts at the cache-coherence level. The code uses a simple and naive Lua implementation ("basic") but can accommodate more sophisticated implementations (in the spirit of the asm code in #603). This could make it a useful tool for prototyping code like 100G mux/demux (#691).
(I have made no attempt to optimize this benchmark. That is another activity entirely.)
cc @xrme @kbara
Examples
Usage