Rules and Actions based on Filter events #63
Per #158 (comment) I'm going to write up some motivations on why IP filtering is one of the reasons we should have events. I'll start by just mentioning that purely static blocklists aren't really something we focus on, because Quilkin is not the right tool for that problem. We might support it, but really, if you know what IPs you want to ban, you should set them using

Really, the main problem that needs design is dynamic bans, and I think solving dynamic IP filtering through filters is fundamentally flawed, because filters by design don't have a tonne of control or insight into Quilkin's pipeline while it's running. Take the following example of a chain of filters (a hypothetical chain is sketched below). Each of these filters has a case where it wants to start dropping both this packet and future packets from a sender, either indefinitely or for some fixed period of time.
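To make the example concrete, something like this hypothetical config -- the filter names are placeholders for illustration, not Quilkin's actual filter identifiers:

```yaml
# Hypothetical chain for illustration only.
version: v1alpha1
filters:
  - name: example.filters.Capture      # extracts a routing token from each packet
  - name: example.filters.RateLimit    # wants to ban senders that exceed a limit
  - name: example.filters.Authenticate # wants to ban senders that repeatedly fail auth
```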
The problem with using filters, fundamentally, is that despite dropping the packets, the attacker can still saturate our entire downstream up to the code branch that decides to drop the packet. So even though we succeed in foiling the attacker by preventing the packet from reaching the servers, the attacker has still got Quilkin to waste time processing the packet through all the filters it passed successfully, and the attacker is still filling up the downstream queue with more bad packets that could prevent other clients from connecting. Even placing the blocklist at the start of the filter chain doesn't stop all of the downstream workers' queues from being filled, since the work is distributed before that.

Another issue that arises in this setup is handling authentication failure in the chain. If something exceeds your max number of attempts, you essentially need blocklist functionality so that you're not sending too many requests to your auth service. So you would need to reimplement blocklists inside your filter; while we can re-use internals, it would be nice if external users writing their own filters could leverage shared and consistent IP filtering capabilities, to make it easier to do what you want.

Now imagine the same chain, except that instead of immediately dropping the packet, the filter first sends a ban event with the IP, port, and duration of the ban. The downstream distributor, as part of its loop, receives any ban events and adds them to its blocklist. Now when the distributor receives anything from a banned attacker, it is dropped before the packet is placed in the queue; while there might be a couple of packets still in the downstream, those will be discarded and the queue becomes free to fill up with packets from other sources.

One other advantage an event system has is that it would make it easier to share things between Quilkin processes across a cluster, because a process could rebroadcast an internal event to other processes.
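A minimal sketch of that distributor loop, assuming a hypothetical `BanEvent` type and a channel between filters and the distributor -- none of these names are Quilkin's actual internals:

```rust
use std::{
    collections::HashMap,
    net::SocketAddr,
    sync::mpsc::Receiver,
    time::{Duration, Instant},
};

/// Hypothetical event a filter emits instead of just dropping the packet.
struct BanEvent {
    source: SocketAddr, // IP and port of the offending sender
    duration: Duration, // how long the ban should last
}

/// Hypothetical downstream distributor loop: drain any pending ban events,
/// then discard banned traffic *before* it lands on a worker queue.
fn distribute(bans: Receiver<BanEvent>, packets: Receiver<(SocketAddr, Vec<u8>)>) {
    let mut banned: HashMap<SocketAddr, Instant> = HashMap::new();

    while let Ok((source, packet)) = packets.recv() {
        // Apply any bans filters have raised since the last iteration.
        while let Ok(event) = bans.try_recv() {
            banned.insert(event.source, Instant::now() + event.duration);
        }

        // Still banned? Discard early so the queue stays free for other sources.
        if banned.get(&source).map_or(false, |until| *until > Instant::now()) {
            continue;
        }
        banned.remove(&source); // lazily clear any expired ban

        enqueue_for_workers(source, packet); // hypothetical hand-off to a worker queue
    }
}

fn enqueue_for_workers(_source: SocketAddr, _packet: Vec<u8>) {
    // Stand-in for pushing onto a worker's queue.
}
```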
I don't disagree with this, but also, I'm not sure Filters should have a tonne of insight into the pipeline while it's running 🤔

As a starting position I think we should also consider Quilkin as an array of instances when thinking about this -- such that if we are specifically talking about blocklist-type filtering, I posit that a "block event" of any sort should be sent to some central authority that can then coordinate it out to all the Quilkin instances that are in existence. This could potentially be either as a sidecar, or as a set that are sitting in front of game servers. (This fails for client side proxying, but I think that's the exception rather than the norm, and should still work as needed.)

My initial thought on Rules and Actions would be to start very simple -- when a rate limit is passed, or when an authentication attempt fails (maybe a certain number of times), we send a defined JSON packet to an external webhook (a sketch of such a payload is below), and let it decide what the appropriate course of action is through the xDS server. If we are doing a blocking action, I think we will have to generally do it across the entire array anyway (or at least the array that we know a UDP-generating client is aware of, which may be some or all of the entire set) -- so I would posit it makes sense to drive it back up to the xDS server to do the coordination, rather than doing it one Quilkin instance at a time.

Or maybe a more meta question could be: how fast do we want/need rules and actions to be determined and acted upon? If it's super fast, then Rules + Actions need to be able to self-edit the configuration that exists in their current instance (which I think will make things super hard for xDS, since each instance of Quilkin becomes a unique configuration). If it can take some time (seconds, or 10s of seconds, depending), pushing data up to the central xDS authority to coordinate as need be allows it to be managed in a single place, and/or combined with external data, trends, etc.
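For concreteness, such a webhook payload might look something like this -- every field name here is invented for illustration, not a proposed schema:

```json
{
  "proxy_id": "my-proxy",
  "event": "rate_limit_exceeded",
  "source": "203.0.113.7:7777",
  "occurrences": 5,
  "timestamp": "2021-12-01T12:00:00Z"
}
```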
I'm saying that filters shouldn't have a lot of insight. If we don't allow filters to have introspection, we need to provide a way for them to communicate the state changes they want to the system, and I think that's events.

I want to separate blocklists a bit here from events in general, because I think regardless of what level you want the IP block to happen at (instance, OS, cluster), we still want to have this architecture: it allows us to easily integrate features like blocklists without requiring a lot of changes to filters, it becomes easier to handle things like duplicate events at this level before handing off to external services, and it would provide a path for custom event listeners later.

To go back to blocklists specifically: if you want to share IP rules across a cluster, I would think it would be easier to have Quilkin set the rules through
I think the most realistic approach is to handle things as fast as possible in an eventually consistent manner. It's okay, for example, if between when a ban event is fired and when the event is processed, one or two packets from the client get through.
Would this 'block event' mean that the sender is currently blocking packets from that ip:port, or is it merely a suggestion that the ip:port should be blocked?

I think there are two different things being discussed here: sending events vs dropping packets.

My understanding was that the idea behind events was for a filter to be able to send packet processing notifications, e.g. if an ip:port pair hits a rate limit, the rate limiter (optionally) sends an event that this has happened.

But we could also decide to add a feature that says: if we know for sure that a filter in the chain is going to drop a packet, there's no need to waste time pre-processing it, so let's drop it as early as possible. I think this has very little to do with events; e.g. here the rate limiter would use some mechanism to signal the downstream distributor to drop packets from that ip:port.
As an aside - this may give us impetus to make internal session storage available to a Filter, so they could become rule + event only Filters.

Action item:

Then we can start building out from there.
Was talking about events and subscribers with a coworker, and mentioned https://cloudevents.io/ - a standard for eventing systems that a lot of things follow (Knative does this). Writing this down so I can read it before doing any design work.
Going through it, it seems like it would be suitable enough for us; there's even already an SDK for Rust: https://github.com/cloudevents/sdk-rust
Yeah, looks like it's a great fit. Although I see the specification allows for batching of events, I can't see how to create a batch of events through the SDK. I can work it out though. It also has adaptors to just about everything, which means we can plug into lots of things as the SDKs and spec continue to expand.
Question asked on the Slack channel for the SDKs about batching events. It looks like it's in the spec, but not in the SDKs 🤔. Will post here what I find out. I do love that this is adopted by a variety of CNCF projects though - so it does seem like a good path.
Chatting on Slack: confirmed that batch operations aren't supported by the SDK, but they are specified in the spec. Looking at the Rust code, it doesn't look like it would be hard to start building out batch JSON support - it's really just a Vec with a different HTTP content header. This might be my "over the holidays" project 😄
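To make that concrete, here's a rough sketch of what batch sending could look like with the Rust SDK. The event type and values are invented, and the hand-rolled serialization is only a stand-in for what the SDK would eventually provide; the `application/cloudevents-batch+json` content type comes from the spec:

```rust
use cloudevents::{Event, EventBuilder, EventBuilderV10};
use serde_json::json;

// Build a single CloudEvent; the type name and data here are illustrative only.
fn ban_event(ip: &str, port: u16) -> Event {
    EventBuilderV10::new()
        .id("0001")
        .ty("dev.quilkin.ban") // hypothetical event type
        .source("my-proxy")
        .data("application/json", json!({ "ip": ip, "port": port }))
        .build()
        .expect("valid event")
}

fn main() {
    // Per the spec, a batch is just a JSON array of events sent with the
    // `application/cloudevents-batch+json` content type.
    let batch: Vec<Event> = vec![ban_event("203.0.113.7", 7777), ban_event("203.0.113.8", 7777)];
    let body = serde_json::to_string(&batch).expect("events serialize to JSON");
    println!("Content-Type: application/cloudevents-batch+json");
    println!("{}", body);
}
```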
PR in for the CloudEvents Rust SDK for batch support to send batch events. Will tackle sending and processing in Go as well, so there's a foundation for both sending and receiving events in our most used languages for each use case.
Created this for the Go library - that library requires a bit more work to get it all fully integrated, but this allows for parsing of batch events in a standard Go handler.
cloudevents/sdk-rust#200 (comment) just got merged, so if I want to start working on an implementation of this, I can at least use the git version as my source of truth for the moment.
cloudevents/sdk-go#829 just got merged, so I think we have enough support in the SDKs to start with the Event work, since there is batch support 🤸🏻
One thing I'm trying to work out with events and subscriptions is how to configure it. My initial theory is to have:

```yaml
version: v1alpha1
id: my-proxy # An identifier for the proxy instance.
events:
  batch: # when to send a batch of events - make it either:
    timeout: 5s # after a given time. Default, 5 seconds.
    amount: 1000 # after receiving this number of events. Default, 1000 events.
  subscriber: # subscriber configuration
    type: http # right now only support http, but option to add more in the future
    http: # http webhook configuration
      url: https://webhook/events
      caBundle: # optional base64 encoded client cert
# ... all the rest of the config if there is one.
```
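For the Rust side, here's roughly how that section could map onto serde structs -- a sketch only, with names simply mirroring the YAML above:

```rust
use serde::Deserialize;

/// Sketch of the proposed `events` section; field names mirror the YAML above.
#[derive(Deserialize)]
struct EventsConfig {
    batch: BatchConfig,
    subscriber: SubscriberConfig,
}

#[derive(Deserialize)]
struct BatchConfig {
    /// Send whatever has accumulated after this long, e.g. "5s".
    timeout: String,
    /// ...or once this many events have been received.
    amount: usize,
}

#[derive(Deserialize)]
struct SubscriberConfig {
    /// Currently only "http"; more types could be added later.
    #[serde(rename = "type")]
    kind: String,
    http: Option<HttpSubscriber>,
}

#[derive(Deserialize)]
struct HttpSubscriber {
    url: String,
    /// Optional base64 encoded client cert.
    #[serde(rename = "caBundle", default)]
    ca_bundle: Option<String>,
}
```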
My only concern is that this would (probably?) be proxy-configuration specific. Or do we think that the xDS and/or relay service might emit events to a subscriber? Or does it not matter too much if it's proxy specific?

Configuring this by command line... thinking something like
WDYT?
Answering the question - what should happen if one or more packets are invalid for a filter?
For example, if we had a rate limiting filter, and we get more packets from an IP and port than the limit allows, how do we notify something that that player may need to be blocked/banned?
Some ideas: