Rules and Actions based on Filter events #63
Per #158 (comment) I'm going to write up some motivations on why IP filtering is one of the reasons we should have events. I'll start by just mentioning that purely static blocklists aren't really something we focus on, because Quilkin is not the right tool for that problem. We might support it, but really, if you know what IPs you want to ban, you should set them using

Really, the main problem that needs design is dynamic bans, and I think solving dynamic IP filtering through filters is fundamentally flawed, because filters by design don't have a tonne of control or insight into Quilkin's pipeline while it's running. Take the following example of a chain of filters (a hypothetical chain is sketched below). Each of these filters has a case where it wants to start dropping both this packet and future packets from a sender, either indefinitely or for some fixed period of time.
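To make the example concrete, something like this hypothetical config -- the filter names are placeholders for illustration, not Quilkin's actual filter identifiers:

```yaml
# Hypothetical chain for illustration only.
version: v1alpha1
filters:
  - name: example.filters.Capture      # extracts a routing token from each packet
  - name: example.filters.RateLimit    # wants to ban senders that exceed a limit
  - name: example.filters.Authenticate # wants to ban senders that repeatedly fail auth
```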
The problem with using filters, fundamentally, is that despite dropping the packets, the attacker can still saturate our entire downstream up to the code branch that decides to drop the packet. So even though we succeed in foiling the attacker by preventing the packet from reaching the servers, the attacker has still got Quilkin to waste time processing the packet through all the filters it passed successfully, and the attacker is still filling up the downstream queue with more bad packets that could prevent other clients from connecting. Even placing the blocklist at the start of the filter chain doesn't stop all of the downstream workers' queues from being filled, since the work is distributed before that.

Another issue that arises in this setup is handling authentication failure in the chain. If something exceeds your max number of attempts, you essentially need blocklist functionality so that you're not sending too many requests to your auth service. So you would need to reimplement blocklists inside your filter; while we can re-use internals, it would be nice if external users writing their own filters could leverage shared and consistent IP filtering capabilities, to make it easier to do what you want.

Now imagine the same chain, except that instead of immediately dropping the packet, the filter first sends a ban event with the IP, port, and duration of the ban. The downstream distributor, as part of its loop, receives any ban events and adds them to its blocklist. Now when the distributor receives anything from a banned attacker, it is dropped before the packet is placed in the queue; while there might be a couple of packets still in the downstream, those will be discarded and the queue becomes free to fill up with packets from other sources.

One other advantage an event system has is that it would make it easier to share things between Quilkin processes across a cluster, because a process could rebroadcast an internal event to other processes.
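A minimal sketch of that distributor loop, assuming a hypothetical `BanEvent` type and a channel between filters and the distributor -- none of these names are Quilkin's actual internals:

```rust
use std::{
    collections::HashMap,
    net::SocketAddr,
    sync::mpsc::Receiver,
    time::{Duration, Instant},
};

/// Hypothetical event a filter emits instead of just dropping the packet.
struct BanEvent {
    source: SocketAddr, // IP and port of the offending sender
    duration: Duration, // how long the ban should last
}

/// Hypothetical downstream distributor loop: drain any pending ban events,
/// then discard banned traffic *before* it lands on a worker queue.
fn distribute(bans: Receiver<BanEvent>, packets: Receiver<(SocketAddr, Vec<u8>)>) {
    let mut banned: HashMap<SocketAddr, Instant> = HashMap::new();

    while let Ok((source, packet)) = packets.recv() {
        // Apply any bans filters have raised since the last iteration.
        while let Ok(event) = bans.try_recv() {
            banned.insert(event.source, Instant::now() + event.duration);
        }

        // Still banned? Discard early so the queue stays free for other sources.
        if banned.get(&source).map_or(false, |until| *until > Instant::now()) {
            continue;
        }
        banned.remove(&source); // lazily clear any expired ban

        enqueue_for_workers(source, packet); // hypothetical hand-off to a worker queue
    }
}

fn enqueue_for_workers(_source: SocketAddr, _packet: Vec<u8>) {
    // Stand-in for pushing onto a worker's queue.
}
```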
I don't disagree with this, but also, I'm not sure Filters should have a tonne of insight into the pipeline while it's running 🤔

As a starting position I think we should also consider Quilkin as an array of instances when thinking about this -- such that if we are specifically talking about blocklist-type filtering, I posit that a "block event" of any sort should be sent to some central authority that can then coordinate it out to all the Quilkin instances that are in existence. This could potentially be either as a sidecar, or as a set that are sitting in front of game servers. (This fails for client side proxying, but I think that's the exception rather than the norm, and should still work as needed.)

My initial thought on Rules and Actions would be to start very simple -- when a rate limit is passed, or when an authentication attempt fails (maybe a certain number of times), we send a defined JSON packet to an external webhook (a sketch of such a payload is below), and let it decide what the appropriate course of action is through the xDS server. If we are doing a blocking action, I think we will have to generally do it across the entire array anyway (or at least the array that we know a UDP-generating client is aware of, which may be some or all of the entire set) -- so I would posit it makes sense to drive it back up to the xDS server to do the coordination, rather than doing it one Quilkin instance at a time.

Or maybe a more meta question could be: how fast do we want/need rules and actions to be determined and acted upon? If it's super fast, then Rules + Actions need to be able to self-edit the configuration that exists in their current instance (which I think will make things super hard for xDS, since each instance of Quilkin becomes a unique configuration). If it can take some time (seconds, or 10s of seconds, depending), pushing data up to the central xDS authority to coordinate as need be allows it to be managed in a single place, and/or combined with external data, trends, etc.
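For concreteness, such a webhook payload might look something like this -- every field name here is invented for illustration, not a proposed schema:

```json
{
  "proxy_id": "my-proxy",
  "event": "rate_limit_exceeded",
  "source": "203.0.113.7:7777",
  "occurrences": 5,
  "timestamp": "2021-12-01T12:00:00Z"
}
```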
I'm saying that filters shouldn't have a lot of insight. If we don't allow filters to have introspection, we need to provide a way for them to communicate the state changes they want to the system, and I think that's events.

I want to separate blocklists a bit here from events in general, because I think regardless of what level you want the IP block to happen at (instance, OS, cluster), we still want to have this architecture: it allows us to easily integrate features like blocklists without requiring a lot of changes to filters, it becomes easier to handle things like duplicate events at this level before handing off to external services, and it would provide a path for custom event listeners later.

To go back to blocklists specifically: if you want to share IP rules across a cluster, I would think it would be easier to have Quilkin set the rules through
I think the most realistic approach is to handle things as fast as possible in an eventually consistent manner. It's okay, for example, if between when a ban event is fired and when the event is processed, one or two packets from the client get through.
Would this 'block event' mean that the sender is currently blocking packets from that ip:port, or is it merely a suggestion that the ip:port should be blocked?

I think there are two different things being discussed here: sending events vs dropping packets.

My understanding was that the idea behind events was for a filter to be able to send packet processing notifications, e.g. if an ip:port pair hits a rate limit, the rate limiter (optionally) sends an event that this has happened.

But we could also decide to add a feature that says: if we know for sure that a filter in the chain is going to drop a packet, there's no need to waste time pre-processing it, so let's drop it as early as possible. I think this has very little to do with events; e.g. here the rate limiter would use some mechanism to signal the downstream distributor to drop packets from that ip:port.
As an aside - this may give us impetus to make internal session storage available to a Filter, so they could become rule + event only Filters.

Action item:

Then we can start building out from there.
Was talking about events and subscribers with a coworker, and mentioned https://cloudevents.io/ - a standard for eventing systems that a lot of things follow (Knative does this). Writing this down so I can read it before doing any design work.
Going through it, it seems like it would be suitable enough for us; there's even already an SDK for Rust: https://github.com/cloudevents/sdk-rust
Yeah, looks like it's a great fit. Although I see the specification allows for batching of events, I can't see how to create a batch of events through the SDK. I can work it out though. It also has adaptors to just about everything, which means we can plug into lots of things as the SDKs and spec continue to expand.
Question asked on the Slack channel for the SDKs about batching events. It looks like it's in the spec, but not in the SDKs 🤔. Will post here what I find out. I do love that this is adopted by a variety of CNCF projects though - so it does seem like a good path.
Chatting on Slack: confirmed that batch operations aren't supported by the SDK, but they are specified in the spec. Looking at the Rust code, it doesn't look like it would be hard to start building out batch JSON support - it's really just a Vec with a different HTTP content header. This might be my "over the holidays" project 😄
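To make that concrete, here's a rough sketch of what batch sending could look like with the Rust SDK. The event type and values are invented, and the hand-rolled serialization is only a stand-in for what the SDK would eventually provide; the `application/cloudevents-batch+json` content type comes from the spec:

```rust
use cloudevents::{Event, EventBuilder, EventBuilderV10};
use serde_json::json;

// Build a single CloudEvent; the type name and data here are illustrative only.
fn ban_event(ip: &str, port: u16) -> Event {
    EventBuilderV10::new()
        .id("0001")
        .ty("dev.quilkin.ban") // hypothetical event type
        .source("my-proxy")
        .data("application/json", json!({ "ip": ip, "port": port }))
        .build()
        .expect("valid event")
}

fn main() {
    // Per the spec, a batch is just a JSON array of events sent with the
    // `application/cloudevents-batch+json` content type.
    let batch: Vec<Event> = vec![ban_event("203.0.113.7", 7777), ban_event("203.0.113.8", 7777)];
    let body = serde_json::to_string(&batch).expect("events serialize to JSON");
    println!("Content-Type: application/cloudevents-batch+json");
    println!("{}", body);
}
```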
PR in for the CloudEvents Rust SDK for batch support to send batch events. Will tackle sending and processing in Go as well, so there's a foundation for both sending and receiving events in our most used languages for each use case.
Created this for the Go library - that library requires a bit more work to get it all fully integrated, but this allows for parsing of batch events in a standard Go handler.
cloudevents/sdk-rust#200 (comment) just got merged, so if I want to start working on an implementation of this, I can at least use the git version as my source of truth for the moment.
cloudevents/sdk-go#829 just got merged, so I think we have enough support in the SDKs to start with the Event work, since there is batch support 🤸🏻
One thing I'm trying to work out with events and subscriptions is how to configure it. My initial theory is to have:

```yaml
version: v1alpha1
id: my-proxy # An identifier for the proxy instance.
events:
  batch: # when to send a batch of events - make it either:
    timeout: 5s # after a given time. Default, 5 seconds.
    amount: 1000 # after receiving this number of events. Default, 1000 events.
  subscriber: # subscriber configuration
    type: http # right now only support http, but option to add more in the future
    http: # http webhook configuration
      url: https://webhook/events
      caBundle: # optional base64 encoded client cert
# ... all the rest of the config if there is one.
```
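For the Rust side, here's roughly how that section could map onto serde structs -- a sketch only, with names simply mirroring the YAML above:

```rust
use serde::Deserialize;

/// Sketch of the proposed `events` section; field names mirror the YAML above.
#[derive(Deserialize)]
struct EventsConfig {
    batch: BatchConfig,
    subscriber: SubscriberConfig,
}

#[derive(Deserialize)]
struct BatchConfig {
    /// Send whatever has accumulated after this long, e.g. "5s".
    timeout: String,
    /// ...or once this many events have been received.
    amount: usize,
}

#[derive(Deserialize)]
struct SubscriberConfig {
    /// Currently only "http"; more types could be added later.
    #[serde(rename = "type")]
    kind: String,
    http: Option<HttpSubscriber>,
}

#[derive(Deserialize)]
struct HttpSubscriber {
    url: String,
    /// Optional base64 encoded client cert.
    #[serde(rename = "caBundle", default)]
    ca_bundle: Option<String>,
}
```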
My only concern is that this would (probably?) be proxy-configuration specific. Or do we think that the xDS and/or relay service might emit events to a subscriber? Or does it not matter too much if it's proxy specific?

Configuring this by command line... thinking something like
WDYT?
Answering the question - what should happen if one or more packets are invalid for a filter?
For example, if we had a rate limiting filter, and we get more packets from an IP and port than the limit allows, how do we notify something that that player may need to be blocked/banned?
Some ideas: