
Message queues should drop when full #380

Closed
XAMPPRocky opened this issue Aug 27, 2021 · 8 comments
Labels
area/performance Anything to do with Quilkin being slow, or making it go faster. kind/cleanup Refactoring code, fixing up documentation, etc

Comments

@XAMPPRocky
Collaborator

XAMPPRocky commented Aug 27, 2021

When going through the code with @luna-duclos yesterday, we noticed that we're currently using bounded queues that block when full, when ideally we should be dropping some or all of the messages in the receiver once the limit is hit. This will probably require either creating a new queue type or finding a library with this behaviour.
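A minimal sketch of the "drop instead of block" alternative, using the standard library's bounded `std::sync::mpsc::sync_channel` as a stand-in for Quilkin's actual queues (an assumption; Tokio's bounded `mpsc::Sender` offers an analogous non-blocking `try_send`). Instead of a blocking `send`, the producer calls `try_send` and drops the newest packet when the queue is full. `Packet` and `enqueue_or_drop` are hypothetical names used only for illustration.

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical packet type, for illustration only.
type Packet = Vec<u8>;

/// Hypothetical helper: enqueue `packet` without blocking.
/// Returns `false` if the packet was dropped so the caller can count drops.
fn enqueue_or_drop(tx: &SyncSender<Packet>, packet: Packet) -> bool {
    match tx.try_send(packet) {
        Ok(()) => true,
        // Queue is at capacity: drop the newest packet rather than blocking.
        Err(TrySendError::Full(_dropped)) => false,
        // The receiver has gone away: nothing will ever read it, so drop it too.
        Err(TrySendError::Disconnected(_dropped)) => false,
    }
}

fn main() {
    let (tx, rx): (SyncSender<Packet>, Receiver<Packet>) = sync_channel(2);
    let mut dropped = 0;
    for i in 0..5u8 {
        if !enqueue_or_drop(&tx, vec![i]) {
            dropped += 1;
        }
    }
    // Only the first two packets fit in the queue; the other three were dropped.
    let received: Vec<Packet> = rx.try_iter().collect();
    println!("received={:?} dropped={}", received, dropped);
}
```

Note that this drops the *newest* packet, which is the cheapest policy but the opposite of keeping the latest game state.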

@XAMPPRocky XAMPPRocky added area/performance Anything to do with Quilkin being slow, or making it go faster. kind/cleanup Refactoring code, fixing up documentation, etc labels Aug 27, 2021
@iffyio
Collaborator

iffyio commented Aug 27, 2021

Oh, that's interesting; I didn't think we had this issue. Are there any examples where this isn't working properly for us? The idea with using bounded queues was indeed to block when the queue is full so that back-pressure automatically kicks in, i.e. we do not read new packets from the socket (or ask the xDS server for more updates).
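To illustrate the back-pressure described here, a sketch assuming a blocking `send` on a bounded channel (again `std::sync::mpsc::sync_channel`, with a plain thread standing in for the socket-reading task; `packets_read_before_drain` is a hypothetical helper). While the queue is full, the reader is parked inside `send` and stops pulling packets, so it can get at most `capacity + 1` packets ahead of a stalled consumer.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::mpsc::sync_channel;
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper: simulates a socket reader feeding a bounded queue and
/// returns how many packets it had read before the consumer started draining.
fn packets_read_before_drain(capacity: usize, total: usize) -> usize {
    let (tx, rx) = sync_channel::<usize>(capacity);
    let read = Arc::new(AtomicUsize::new(0));
    let read_in_producer = Arc::clone(&read);

    let producer = thread::spawn(move || {
        for i in 0..total {
            // Stand-in for reading a packet off the socket.
            read_in_producer.fetch_add(1, Ordering::SeqCst);
            // Blocks while the queue is full, so no further packets are read.
            tx.send(i).unwrap();
        }
    });

    // Let the producer fill the queue and block (timing-based, demo only).
    thread::sleep(Duration::from_millis(100));
    let before = read.load(Ordering::SeqCst);

    // Drain everything so the producer can finish.
    rx.iter().take(total).for_each(drop);
    producer.join().unwrap();
    before
}

fn main() {
    let n = packets_read_before_drain(2, 10);
    println!("reader pulled {} of 10 packets before the consumer started", n);
}
```

The sleep makes this a demonstration rather than a precise measurement; the upper bound of `capacity + 1` holds regardless of timing.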

@XAMPPRocky
Collaborator Author

XAMPPRocky commented Aug 27, 2021

back-pressure automatically kicks in - i.e we do not read new packets from the socket (or ask xds server for more updates)

That's correct, but I think that's the wrong kind of back-pressure for us. UDP is a stateless protocol, and in games what you typically want is always the latest state of the game; older packets are functionally useless, so keeping them in the queue doesn't help us. What I would expect Quilkin to do when the queue is full and it receives a new message is to clear the queue and insert the new message as the sole item. This would mean, for example, that when Quilkin is being overwhelmed, new traffic isn't stalled by exhausted queues, though some of it may still be dropped.
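The clear-on-full policy proposed here could look something like the following minimal sketch, assuming a mutex-protected `VecDeque`. `ClearOnFullQueue` is a hypothetical type, not an existing Quilkin or library API. `push` never blocks: if the queue is at capacity, it discards everything already queued and keeps only the new item.

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical bounded queue: pushing into a full queue discards all queued
/// items and keeps only the newest one. The producer never blocks.
pub struct ClearOnFullQueue<T> {
    capacity: usize,
    items: Mutex<VecDeque<T>>,
}

impl<T> ClearOnFullQueue<T> {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            items: Mutex::new(VecDeque::with_capacity(capacity)),
        }
    }

    /// Pushes `item`, returning how many stale items were discarded.
    pub fn push(&self, item: T) -> usize {
        let mut items = self.items.lock().unwrap();
        let mut discarded = 0;
        if items.len() >= self.capacity {
            // Queue is full: older items are considered useless, drop them all.
            discarded = items.len();
            items.clear();
        }
        items.push_back(item);
        discarded
    }

    /// Pops the oldest remaining item, if any (non-blocking).
    pub fn pop(&self) -> Option<T> {
        self.items.lock().unwrap().pop_front()
    }
}

fn main() {
    let queue = ClearOnFullQueue::new(3);
    for seq in 1..=4 {
        queue.push(seq);
    }
    // Pushing 4 into a full queue of [1, 2, 3] leaves only [4].
    assert_eq!(queue.pop(), Some(4));
    assert_eq!(queue.pop(), None);
}
```

A real version would also need a way for the consumer to wait for new items (for example a `Condvar` or an async notification), which this sketch omits.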

@iffyio
Collaborator

iffyio commented Aug 27, 2021

I don't think Quilkin is where we want to solve this problem. If the proxy is overloaded, that in itself is likely caused by something external to Quilkin, and I think the priority is to protect the proxy as much as possible, i.e. there's no need to be clever about what packets mean (we don't want to throw away packets belonging to sessions that are in fact on time) or about what should be prioritized, because that only means more work, which isn't good when you're already overloaded.
Come to think of it, why would Quilkin be overwhelmed in a production scenario where the filter chain is primed to move packets through as quickly as possible (i.e. it isn't doing silly things like IO per packet)? It's most likely an infrastructure issue: either the proxy is being DDoSed or it simply can't keep up with the traffic. Trying to prioritize or shuffle packets in memory is pointless in the former case and only worsens the situation, while the latter is a capacity issue that can only be solved outside the proxy.

We already have plans for clients to detect when they're losing or delaying packets and switch proxies, so at least from a game client's point of view they're not left helpless, which gives Quilkin less incentive to worry about it.

@markmandel
Contributor

Very late to the party, but have had this ticket rolling around my head, and have been meaning to write something for a while.

I'd love to come at this with a problem-first approach:

I.e.

  1. What are the different scenarios in which a proxy could be overwhelmed? (DDoS? Bad config? Can we tell the difference?)
  2. When it does get overwhelmed, what are the solutions that could solve this? (Rate limiting, dropping packets, monitoring, rules and actions, etc.?)
  3. Do each of the solutions work in all scenarios? Do they work for all users? Should they be configurable? If so, how?
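Of the candidate solutions in point 2, rate limiting is the easiest to sketch in isolation. Below is a hypothetical token-bucket limiter (not something Quilkin is known to implement): it admits bursts of up to `capacity` packets and refills at `rate_per_sec` tokens per second, and packets over the limit would be dropped. Time is passed in explicitly so the behaviour is deterministic; `TokenBucket` and its methods are illustrative names only.

```rust
use std::time::{Duration, Instant};

/// Hypothetical token-bucket limiter: bursts of up to `capacity` packets,
/// refilled at `rate_per_sec` tokens per second.
pub struct TokenBucket {
    capacity: f64,
    tokens: f64,
    rate_per_sec: f64,
    last_refill: Instant,
}

impl TokenBucket {
    pub fn new(capacity: u32, rate_per_sec: u32, now: Instant) -> Self {
        Self {
            capacity: f64::from(capacity),
            tokens: f64::from(capacity), // start with a full bucket
            rate_per_sec: f64::from(rate_per_sec),
            last_refill: now,
        }
    }

    /// Returns `true` if a packet arriving at `now` may pass, consuming a token.
    pub fn allow(&mut self, now: Instant) -> bool {
        // Refill in proportion to elapsed time, capped at the bucket size.
        let elapsed = now.saturating_duration_since(self.last_refill).as_secs_f64();
        self.tokens = (self.tokens + elapsed * self.rate_per_sec).min(self.capacity);
        self.last_refill = now;
        if self.tokens >= 1.0 {
            self.tokens -= 1.0;
            true
        } else {
            false
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut bucket = TokenBucket::new(2, 10, start);
    // Two packets pass immediately; the third is rejected until tokens refill.
    let burst: Vec<bool> = (0..3).map(|_| bucket.allow(start)).collect();
    println!("{:?}", burst);
    // After 100 ms at 10 tokens/s, one token has been restored.
    println!("{}", bucket.allow(start + Duration::from_millis(100)));
}
```

Unlike the queue policies discussed above, this caps work up front, which fits the "protect the proxy" view, but it cannot tell a DDoS from a legitimate surge on its own.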

Then we can weigh the tradeoffs of each solution comparatively, and start making some design decisions.

(Sounds like a good topic for the next community meeting 👍🏻)

@markmandel
Contributor

Action item from community meeting: Let's test overloading the proxy and record what exactly happens 👍🏻

@XAMPPRocky
Copy link
Collaborator Author

XAMPPRocky commented Sep 29, 2021

To add to that, as an initial goal I think we only care about what happens in the Docker container version of Quilkin, since that's what we're deploying right now. The behaviour when running on macOS, Windows, or even desktop Linux is less of a concern than the containerised version, IMO.

@markmandel
Copy link
Contributor

I think this can be closed now that #543 has been implemented - the queues are now pretty much all gone.

Any objections?

@XAMPPRocky
Copy link
Collaborator Author

Nope, sounds good to me


3 participants