
DosHandler #12068

Merged: 48 commits into jetty-12.1.x from experiment/jetty-12.0.x/DosHandler on Oct 22, 2024

Conversation

gregw (Contributor) commented Jul 22, 2024

Fix #10478. This is a simple DosHandler that delays and then rejects all requests over a limit.

gregw (Contributor, Author) commented Jul 22, 2024

@sbordet @lorban Can you give this a quick initial review before I commit time to tests, config and doco?

lorban (Contributor) commented Jul 23, 2024

Speaking about the design, I see two problems if you configure a moderately large throughput (say 10K req/s):

  • You need to allocate an array with one 64-bit entry per request per second. For 10K req/s that's ~80 KB of memory that's constantly scanned and updated. That alone will totally trash all CPU caches.
  • There's a single lock protecting that array. I'm not sure this could even reach 10K req/s.

gregw (Contributor, Author) commented Jul 24, 2024

@lorban how about this version that uses an exponential moving average?
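
(For illustration only: a minimal sketch of the exponential-moving-average idea being discussed here. The class, field and method names are invented for the example and are not the PR's actual Tracker code.)

class EmaRateTracker
{
    private final double _alpha;           // smoothing factor, 0 < alpha <= 1
    private final long _samplePeriodNanos; // length of one sample period
    private double _ratePerPeriod;         // smoothed requests per sample period
    private long _periodStartNanos;        // start of the current sample period
    private long _requestsThisPeriod;      // raw count for the current period

    EmaRateTracker(double alpha, long samplePeriodNanos, long nowNanos)
    {
        _alpha = alpha;
        _samplePeriodNanos = samplePeriodNanos;
        _periodStartNanos = nowNanos;
    }

    // Record a request at time 'now' and return the smoothed requests-per-period.
    synchronized double onRequest(long nowNanos)
    {
        // Fold any completed periods into the moving average (empty periods decay it).
        while (nowNanos - _periodStartNanos >= _samplePeriodNanos)
        {
            _ratePerPeriod = _alpha * _requestsThisPeriod + (1.0 - _alpha) * _ratePerPeriod;
            _requestsThisPeriod = 0;
            _periodStartNanos += _samplePeriodNanos;
        }
        _requestsThisPeriod++;
        return _ratePerPeriod;
    }
}

The state is two longs and a double per client, rather than one array slot per request, which addresses the memory and cache concerns raised above.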

gregw marked this pull request as ready for review July 25, 2024 21:34

gregw (Contributor, Author) commented Jul 26, 2024

@sbordet Can you take this one on?

sbordet (Contributor) commented Jul 29, 2024

What I'd like:

  • Pluggable algorithm for rate exceeded -- this will reduce the number of parameters to the constructor, by separating the parameters for the algorithm from those just related to the DoSHandler like maxTrackers.
  • Now each tracker is a CyclicTimeout, but DosHandler should handle all the Trackers via CyclicTimeouts.
  • usePort seems weird, as using the ephemeral port from the client seems to go against the purpose of the DoS defense: the client will use many ephemeral ports.
  • Not sure I understand the current algorithm: if a client sends 11 requests, and the 11th exceeds the rate, it is queued, but I'd say it's simpler to reject it immediately. Right now there is a hard-coded 2 s timeout that, when it expires, rejects the queued request. Rejecting immediately would simplify things (no queue, no queue parameters), and I see no point in waiting 2 seconds to reject.

I would keep this Handler as simple as possible: if rate is exceeded, reject.

gregw (Contributor, Author) commented Jul 31, 2024

@sbordet

What I'd like:

  • Pluggable algorithm for rate exceeded -- this will reduce the number of parameters to the constructor, by separating the parameters for the algorithm from those just related to the DoSHandler like maxTrackers.

OK, but I don't see replacing two algorithm args (samplePeriodMs and alpha) with one new ExponentialMovingAverage(samplePeriodMs, alpha) as much of a saving in parameters. But I'm OK with having the algorithm pluggable.

But then perhaps we should do the same for the ID extraction, which is currently a protected method and two constructor arguments. I'll have a play and see...

  • Now each tracker is a CyclicTimeout, but DosHandler should handle all the Trackers via CyclicTimeouts.

OK.

  • usePort seems weird, as using the ephemeral port from the client seems to go against the purpose of the DoS defense: the client will use many ephemeral ports.

It is for the use case where the server is behind some intermediary, so the remote IP is that of the intermediary and not the client. Sometimes you cannot trust the Forwarded-For headers, because not all intermediaries are smart enough to police that they are not sent from the client itself. So using the source port on the intermediary is a proxy for identifying the connection, and thus the client. This is (was?) commonly used by Cisco smart routers. But if we make the ID algorithm pluggable, then this can be done at no cost.

  • Not sure I understand the current algorithm: if a client sends 11 requests, and the 11th exceeds the rate, it is queued, but I'd say it's simpler to reject it immediately. Right now there is a hard-coded 2 s timeout that, when it expires, rejects the queued request. Rejecting immediately would simplify things (no queue, no queue parameters), and I see no point in waiting 2 seconds to reject.

The idea of delaying rather than rejecting is to slow down additional requests on the same connection. An attacker can pipeline many HTTP/1 requests in a single TCP frame that is buffered. If you just reject the request, then the next one will already be there and you will need to do work to reject that one too. Closing the connection can avoid that, but then it tells the attacker that they need to open another connection to continue the attack. By delaying, the attacker does not know if they are causing a DOS or not, and they have to hold resources open to keep the connection alive.

I would keep this Handler as simple as possible: if rate is exceeded, reject.

Reject is not good enough. We'd have to close the connection to avoid the issue of many pipelined h1 requests. But then we don't have that semantic for H2 and H3 connections, i.e. we can't send a response to a single request that will also close all the other streams on the same h2 connection.

Delay is expensive for the server, so perhaps we should come up with a way of closing h2 and h3 connections?
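
(For illustration only: a rough sketch of the delay-then-reject idea using plain JDK scheduling. The actual DoSHandler schedules its delay via Jetty's own infrastructure, so the class and method names below are assumptions for the example.)

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch: an over-limit request is not rejected immediately. Holding it for a
// delay keeps the attacker's connection (and any pipelined requests) parked,
// and hides whether the DoS protection has actually triggered.
class DelayedRejector
{
    private final ScheduledExecutorService _scheduler = Executors.newSingleThreadScheduledExecutor();
    private final long _delayMs;

    DelayedRejector(long delayMs)
    {
        _delayMs = delayMs;
    }

    // Run the given rejection action (e.g. write a 429 response and complete
    // the request callback) only after the configured delay has elapsed.
    void rejectLater(Runnable rejectAction)
    {
        _scheduler.schedule(rejectAction, _delayMs, TimeUnit.MILLISECONDS);
    }
}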

gregw (Contributor, Author) commented Jul 31, 2024

@sbordet I've pushed a change to make the ID a pluggable function rather than fixed. It is OK, but it might take a bit of effort to make it configurable from a module ini file.

I'll try the same approach for the rate algorithm...

sbordet (Contributor) commented Jul 31, 2024

@gregw are you planning on integrating request handling with connection acceptance? I.e. stop accepting connections from the suspicious IP?

gregw (Contributor, Author) commented Jul 31, 2024

@gregw are you planning on integrating request handling with connection acceptance? I.e. stop accepting connections from the suspicious IP?

I wasn't... but that could be a good idea. It would need new semantics in our connectors, but we need to add some new semantics anyway if we are to be able to close a connection for h2/h3.

gregw (Contributor, Author) commented Jul 31, 2024

@sbordet I've made the Rate pluggable now as well. It is all a bit messy and lacks javadoc, but give me some feedback on the direction before I commit any more effort. I might work a little bit tomorrow as well.

sbordet (Contributor) left a comment


@gregw I think Rate could be renamed to RateControl (which we already have in jetty-http2) or similar, but rather than having a getRate() and having to pass maxRequestsPerSecond as an extra parameter, I'd prefer a RateControl.isExceeded(Request), so all parameters end up in the specific RateControl.Factory implementation.

Also, given that reject+block-connections and wait+reject are both valid strategies, perhaps we need to abstract that too, introducing a RejectHandler that can also be implementation specific.
One of the implementations could be linked to the Connector to block connections (which perhaps requires more changes -- I think we can suspend accepting for all remote clients, but not from specific IPs).

DoSHandler(Handler, PeerMapper, RateControl.Factory, RejectHandler) { ... }

where PeerMapper is your Function<Request, String> to map the remote peer.
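
(For illustration only: a rough sketch of the shape being proposed here. The interface names come from this comment, but the method signatures are assumptions and not the merged API.)

import org.eclipse.jetty.server.Request;
import org.eclipse.jetty.server.Response;
import org.eclipse.jetty.util.Callback;

// Sketch of the pluggable pieces; signatures are illustrative assumptions.
interface RateControl
{
    // true if this request pushes the tracked client over its configured rate
    boolean isExceeded(Request request);

    interface Factory
    {
        // one RateControl instance per tracked client id
        RateControl newRateControl(String clientId);
    }
}

interface RejectHandler
{
    // decide what to do with an over-rate request: reject, delay, or close the connection
    boolean reject(Request request, Response response, Callback callback);
}

// PeerMapper would be the Function<Request, String> mentioned above, and a
// DoSHandler built from these pieces might be constructed as:
//   new DoSHandler(nextHandler, peerMapper, rateControlFactory, rejectHandler);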

gregw (Contributor, Author) commented Jul 31, 2024

@sbordet A wise programmer once said:

I would keep this Handler as simple as possible

By making this handler entirely pluggable, I think perhaps we are adding complexity where none is needed. Writing an entirely new custom handler is not that difficult and is better documented than writing three implementations of unknown interfaces to plug into this Handler.
This pluggability will also make XML configuration really difficult.

I suggest we consider going back to a simple handler, with the algorithms in protected methods where possible, and keep it simple. If anybody wants something different, they can extend or replace.
I'd prefer reject+block as the default algorithm, but that needs wider changes and sacrifices simplicity. So I think the delay+reject approach is fine for a simple DosHandler.

gregw (Contributor, Author) commented Aug 1, 2024

@sbordet I've added in pluggable rejection. It is not too ugly. I'll try the XML configuration soon. If you have time for some more feedback/review, it would be appreciated before I commit too much to the XML.

gregw (Contributor, Author) commented Aug 1, 2024

@sbordet I've added XmlConfiguration and filled out the DosHandler a bit more. It's a little more complex than I'd like, but it is not too bad.

gregw requested review from sbordet and lorban August 1, 2024 08:16
gregw (Contributor, Author) commented Aug 23, 2024

@sbordet @lachlan-roberts @lorban nudge!!! 3 weeks is too long to wait for feedback!

sbordet (Contributor) left a comment


I think the implementation can be largely simplified.

onRequest() is able to calculate the nanoTime at which the tracker will become idle if there are no more events (now + one or two periods), so CyclicTimeouts can take care of expiring idle trackers.

The state should only need the nanoTime of the start of the period, and how many requests have been seen in that period: just 2 longs that can be guarded by a lock (it's per-client anyway, and it's only contended if the client bombards the server).
Using a hard-coded 1 second period would give much simpler code, but a generic period is just as simple.

Let's discuss this in person.
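
(For illustration only: a minimal sketch of the two-longs-under-a-lock state described above. The names are invented for the example; this is not the PR's Tracker implementation.)

class FixedWindowTracker
{
    private final long _periodNanos;
    private final int _maxRequestsPerPeriod;
    private long _periodStartNanos;   // nanoTime at which the current period began
    private long _requestsThisPeriod; // requests seen in the current period

    FixedWindowTracker(long periodNanos, int maxRequestsPerPeriod, long nowNanos)
    {
        _periodNanos = periodNanos;
        _maxRequestsPerPeriod = maxRequestsPerPeriod;
        _periodStartNanos = nowNanos;
    }

    // Returns true if the request is allowed within the current period's budget.
    synchronized boolean onRequest(long nowNanos)
    {
        if (nowNanos - _periodStartNanos >= _periodNanos)
        {
            // Start a new period; the tracker's idle expiry (handled externally,
            // e.g. via CyclicTimeouts) would be now plus one or two periods.
            _periodStartNanos = nowNanos;
            _requestsThisPeriod = 0;
        }
        return ++_requestsThisPeriod <= _maxRequestsPerPeriod;
    }
}

A DoSHandler would keep one such tracker per client id and expire idle ones via CyclicTimeouts.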

jetty-core/jetty-server/src/main/config/modules/dos.mod (outdated review comment, resolved)
Comment on lines 99 to 110
public DoSHandler()
{
this(null, 100, -1);
}

/**
* @param maxRequestsPerSecond The maximum requests per second allowed per ID.
*/
public DoSHandler(@Name("maxRequestsPerSecond") int maxRequestsPerSecond)
{
this(null, maxRequestsPerSecond, -1);
}
Reviewer (Contributor) commented:


I would remove these constructors, as they are accepting a parameter that should really be specific to the Tracker implementation only.

Further outdated review comments on jetty-core/jetty-server/src/main/config/modules/dos.mod and jetty-core/jetty-server/src/main/config/etc/jetty-dos.xml were marked resolved.
return _rejectUntracked ? _rejectHandler.handle(request, response, callback) : nextHandler(request, response, callback);

// IS the request allowed by the tracker?
boolean allowed = tracker.onRequest(now);
Reviewer (Contributor) commented:


I think onRequest(long) should be changed to onRequest(Request rq), so that implementations can examine the request and possibly allow some unconditionally.

gregw (Contributor, Author) replied:


That will make trackers impossible to test, as they will be taking the nanoTime internally rather than having it passed in from a unit test. Every time we do something like that, we create flaky tests.

Besides, we are already in the onConditionsMet method, so the request has already been filtered, and unconditional requests can be set up in the conditional handler. There is no need for: a) two mechanisms for the same thing; b) being overly generic based on zero use-cases.
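
(For illustration only: a hypothetical JUnit 5 test, reusing the FixedWindowTracker sketch above, showing the testability point: passing 'now' in makes time fully deterministic, with no real clock and no flakiness from scheduling jitter.)

import static org.junit.jupiter.api.Assertions.assertFalse;
import static org.junit.jupiter.api.Assertions.assertTrue;
import java.util.concurrent.TimeUnit;
import org.junit.jupiter.api.Test;

public class TrackerTimeInjectionTest
{
    @Test
    public void testRateResetsAfterPeriod()
    {
        long now = 0;
        FixedWindowTracker tracker = new FixedWindowTracker(TimeUnit.SECONDS.toNanos(1), 2, now);

        assertTrue(tracker.onRequest(now));   // 1st request allowed
        assertTrue(tracker.onRequest(now));   // 2nd request allowed
        assertFalse(tracker.onRequest(now));  // 3rd request in the same period rejected

        now += TimeUnit.SECONDS.toNanos(1);   // advance the injected clock by one period
        assertTrue(tracker.onRequest(now));   // budget resets deterministically
    }
}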

return elapsed > _samplePeriod.toNanos();
// Advance _lastDrip by a whole number of drips, preserving any partial-drip remainder.
long elapsedSinceLastDrip = NanoTime.elapsed(_lastDrip, now);
long drips = elapsedSinceLastDrip / _nanosPerDrip;
_lastDrip = _lastDrip + drips * _nanosPerDrip;
Reviewer (Contributor) commented:


This line could just be _lastDrip = now, rather than all the math.

gregw (Contributor, Author) replied:


That would make the algorithm numerically inaccurate because of rounding errors. If we were 1 nanosecond short of another drip, then setting _lastDrip = now would effectively lose that almost-complete drip, and the effective drip rate would be slower than configured.
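
(For illustration, a worked example of the rounding concern, with hypothetical numbers:)

// Hypothetical numbers: one drip every 1_000_000 ns (1 ms).
long nanosPerDrip = 1_000_000;
long lastDrip = 0;
long now = 1_999_999;                          // 1 ns short of two whole drips

long drips = (now - lastDrip) / nanosPerDrip;  // = 1 whole drip
lastDrip = lastDrip + drips * nanosPerDrip;    // = 1_000_000: the 999_999 ns remainder is kept
// Setting lastDrip = now would discard that remainder, so over time the
// tracker would drip slightly slower than the configured rate.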

{
private final int _maxRequestsPerSecond;
private final int _bucketSize;
Reviewer (Contributor) commented:


This is a "max", so I'd prefer _maxBucketSize

gregw (Contributor, Author) replied:


That doesn't make sense to me, as a bucket's size is independent of how full it is. "Max bucket size" implies that we can have lots of buckets of different sizes, or that buckets can change their size.

gregw (Contributor, Author) added:


If you want "max" in the name, then I think we should just drop the "bucket" analogy. Using an analogy is a shorthand: we all know what a bucket is, that it has a fixed size, and that it can be empty, full or somewhere in between. If we start trying to over-explain that, then we start doubting whether the analogy is meaningful, and then we may as well not have it. We should call this _threshold or similar.

gregw requested a review from sbordet October 18, 2024 22:22
gregw (Contributor, Author) commented Oct 21, 2024

@lachlan-roberts nudge

gregw dismissed sbordet's stale review October 21, 2024 20:00 ("away at conference")

gregw merged commit 94ff3c4 into jetty-12.1.x on Oct 22, 2024
10 checks passed
gregw deleted the experiment/jetty-12.0.x/DosHandler branch October 22, 2024 02:09
Successfully merging this pull request may close these issues: Introduce DoSHandler
4 participants