Replace listen distributor task with multithreaded SO_REUSEPORT task #410
Comments
This sounds worth exploring indeed! I guess the idea would be to give each worker its own socket with this enabled?
Yeah, the exact code they used is here. I included the socket stuff inline for convenience. One thing to figure out would be whether there's an equivalent option for Windows. macOS does have it, though its behaviour is slightly different, in that Linux behaves specially; that seems to be mostly in relation to its TCP implementation though, which isn't relevant for us. I've included some good sources on it.
```rust
use std::net::SocketAddr;
use tokio::net::TcpListener;

// Build the socket with socket2 so the reuse options can be set before binding,
// then hand it over to tokio and wrap it in a stream of incoming connections.
let sock = socket2::Socket::new(
    match addr {
        SocketAddr::V4(_) => socket2::Domain::IPV4,
        SocketAddr::V6(_) => socket2::Domain::IPV6,
    },
    socket2::Type::STREAM,
    None,
)
.unwrap();
sock.set_reuse_address(true).unwrap();
sock.set_reuse_port(true).unwrap();
sock.set_nonblocking(true).unwrap();
sock.bind(&addr.into()).unwrap();
sock.listen(8192).unwrap();
let incoming =
    tokio_stream::wrappers::TcpListenerStream::new(TcpListener::from_std(sock.into()).unwrap());
```
Very cool! Excited to see the results!
My thought would be: given that (I expect) most of our high-load workloads will happen on Linux, as long as the system works for a single connection (client side) in a reasonably performant way on Windows and Mac, I expect that will be fine.
Yeah, my concern with Windows is more that if there isn't a good equivalent, we'd have to maintain a workaround just for Windows, which could be awkward. If it's not quite as performant, that's less of an issue.
Was just reading about this some more, noted that for Tokio's UDP Socket there exists:
Mostly just writing this here in case I come back around looking for it again.
I think socket2 handles this for us, to a degree: https://docs.rs/socket2/latest/socket2/struct.Socket.html
And it looks like Windows supports SO_REUSEADDR. But we might require some different settings for each OS, which we should be able to conditionally check and respond to. This definitely looks very doable, even with the current architecture.
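A rough sketch of that conditional check, assuming socket2 (with its `all` feature, which gates `set_reuse_port` on Unix); the helper name is hypothetical:

```rust
use std::io;

// Hypothetical helper: apply the per-platform "share this port" option
// before the socket is bound.
fn enable_port_sharing(sock: &socket2::Socket) -> io::Result<()> {
    // On Unix-like systems, SO_REUSEPORT lets multiple sockets bind the same
    // address and has the kernel distribute incoming packets between them.
    #[cfg(unix)]
    sock.set_reuse_port(true)?;

    // Windows has no SO_REUSEPORT; SO_REUSEADDR is the closest option,
    // though its delivery semantics differ (see the discussion below).
    #[cfg(windows)]
    sock.set_reuse_address(true)?;

    Ok(())
}
```

The same helper could then be called from wherever the worker sockets are created, regardless of platform.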
Yeah, when I built a small proof of concept just using the options, essentially a single worker was being used for everything while the other workers sat idle. If the main worker failed, one of the other sockets would start receiving the traffic, so it's not the worst behaviour.
That's unfortunate 😞 From my reading, I had thought that SO_REUSEADDR on Windows would work the same as SO_REUSEPORT, but I was never quite sure. I also found https://stackoverflow.com/questions/14388706/how-do-so-reuseaddr-and-so-reuseport-differ?rq=1 quite interesting for the differences across platforms. I ended up going down the rabbit hole; it's super interesting stuff.
So I want to take a stab at this, primarily because in my tests I'm seeing some performance differences. For example, you can see the difference in the 99th percentile. The first thought I had was to look at our existing benchmarks and see if we could capture not just overall throughput, but also split the data out by read vs write operations. Then I can step into attempting to fit this into our current architecture (which I actually don't think will be too hard - but famous last words 😄). Sound good?
SGTM
Wanted to be able to highlight whether we have bottlenecks in performance on read vs write operations in the proxy. This adds an extra benchmark to throughput.rs called "readwrite", which follows a similar pattern to the overall throughput benchmark, with both direct and proxied traffic used as extra comparison values. Work on #410
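For illustration only, a round-trip "readwrite"-style measurement over loopback UDP could look something like the sketch below. This is not the actual benchmark added to throughput.rs; the packet size, buffer size, and echo server stand in for the real direct/proxied setups.

```rust
use criterion::{criterion_group, criterion_main, Criterion};
use std::net::UdpSocket;

fn readwrite(c: &mut Criterion) {
    // Echo "server" on loopback, standing in for direct or proxied traffic.
    let server = UdpSocket::bind("127.0.0.1:0").unwrap();
    let server_addr = server.local_addr().unwrap();
    std::thread::spawn(move || {
        let mut buf = [0u8; 1500];
        loop {
            let (len, from) = match server.recv_from(&mut buf) {
                Ok(v) => v,
                Err(_) => return,
            };
            let _ = server.send_to(&buf[..len], from);
        }
    });

    let client = UdpSocket::bind("127.0.0.1:0").unwrap();
    client.connect(server_addr).unwrap();
    let packet = [0xABu8; 512];
    let mut buf = [0u8; 1500];

    // Measure a full write+read round trip, so read and write latency can be
    // compared against the plain throughput numbers.
    c.bench_function("readwrite", |b| {
        b.iter(|| {
            client.send(&packet).unwrap();
            client.recv(&mut buf).unwrap();
        })
    });
}

criterion_group!(benches, readwrite);
criterion_main!(benches);
```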
Started work on the implementation for local packet reception. Will provide some benchmarks when I've got something working.
Making progress! I seem to have the basics working, but I'm now running into some kind of race condition in the unit tests around packet reception and packet sending that wasn't happening before. Looking into it. https://github.com/markmandel/quilkin/tree/wip/reuse-port if anyone wants to take a peek.
Got it working nicely on my end; I'll start pulling out PRs and submitting. The code is way cleaner, and we can remove a bunch of channel and worker code along the way (oh, and I need to do the Windows build!). With the single-client benchmarks we see a few µs shaved off, but I would expect better results with multiple clients, e.g. comparing the current code against SO_REUSEPORT on the throughput benchmark, and similarly on readwrite before and after SO_REUSEPORT.
Implemented the use of SO_REUSEPORT for *nix systems and SO_REUSEADDR for Windows systems. This removes a lot of the code needed for channel coordination that was previously in place, and simplifies much of the architecture, as well as improving performance. Closes #410
Co-authored-by: XAMPPRocky <4464295+XAMPPRocky@users.noreply.github.com>
Currently all of our traffic goes through a distributor task which distributes all messages in the UDP buffer amongst all workers. Under heavy workloads this is likely to be one of the main bottlenecks in the program. Reading this blog post introduced me to SO_REUSEPORT, which is designed specifically to address this bottleneck in network applications.

Using SO_REUSEPORT and SO_REUSEADDR we can eliminate the distributor task entirely and make each worker entirely responsible for its own socket. This has the potential for serious performance improvements, as seen in the blog post, where the reused-port server continues to scale linearly past 300,000 as the number of clients increases, while the listen-distributor server struggles to reach that.