HTTPtables migration to eBPF #102

Open
krizhanovsky opened this issue May 6, 2015 · 4 comments
Comments

@krizhanovsky
Contributor

krizhanovsky commented May 6, 2015

Tempesta Language (TL) is a DSL for L4-L7 network data processing. While L3 data is visible to TL programs, TL isn't meant to operate at L3 because of its higher overhead in comparison with eBPF and nftables. TL programs run in softirq context, so they cannot sleep or block.

A JIT-compiled language must be implemented for dynamic filtering and classification rules, traffic transformation, and whatever else anyone wants. The language must be able to implement Frang, sticky cookies, load balancing, and a few other current features in a more robust way.

Consider the following extract from the access.log of a real-world DDoS attack:

    :7.5.2.1 - - [15/Apr/2009:13:23:54 +0400] 403 "GET http://example.org/forum/indexer.php HTTP/1.1" 219 "-" "Mozilla/5.0 (Slurp/cat; vaginamook@inktomi.com; http://www.supercocklol.com/slurp.html)" "http_x_forwarded_for"
    :8.2.8.2 - - [15/Apr/2009:13:23:54 +0400] 403 "GET http://example.org/forum/indexer.php HTTP/1.1" 219 "-" "Mozilla/4.0 compatible ZyBorg/1.0 (wn.zyborg@looksmart.net; http://www.lolyousuck.com)" "http_x_forwarded_for"
    :2.4.9.1 - - [15/Apr/2009:13:23:54 +0400] 499 "GET / HTTP/1.1" 0 "-" "Mozilla/5.0 (Windows; U; Windows NT 5.1; ru; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1" "http_x_forwarded_for"
    :2.1.4.7 - - [15/Apr/2009:13:23:54 +0400] 499 "GET / HTTP/1.1" 0 "-" "Opera/9.02 (Windows NT 5.1; U; ru)" "http_x_forwarded_for"

In this case the following rules might be helpful:

  1. a regular expression for '.(supercocklol|lolyousuck).' against User-Agent (however, a true regular expression engine is the subject of Multi-pattern regular expressions #496);
  2. absence of several HTTP headers usual for normal HTTP requests;
  3. IP subnets.
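
As a rough illustration of how these three rules could combine, here is a minimal Python sketch. It is not TL and not Tempesta code: the header list, the "two or more headers missing" threshold, the example subnet, and the function name `classify` are all assumptions made for the example.

```python
import ipaddress
import re

# Hypothetical rule parameters, chosen only to mirror the three rules above.
BAD_UA = re.compile(r"(supercocklol|lolyousuck)")
EXPECTED_HEADERS = ("accept", "accept-language", "accept-encoding")
BLOCKED_NETS = [ipaddress.ip_network("203.0.113.0/24")]  # documentation subnet


def classify(client_ip, headers):
    """Return the name of the first rule that matches, or None."""
    hdrs = {k.lower(): v for k, v in headers.items()}

    # Rule 1: suspicious substring in User-Agent.
    if BAD_UA.search(hdrs.get("user-agent", "")):
        return "bad_user_agent"

    # Rule 2: several headers that normal browsers send are absent.
    missing = [h for h in EXPECTED_HEADERS if h not in hdrs]
    if len(missing) >= 2:  # threshold is an assumption
        return "missing_headers"

    # Rule 3: the client address belongs to a blocked subnet.
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETS):
        return "blocked_subnet"

    return None
```

The rules are checked from cheapest to interpret to most specific; a real engine would likely order them by cost and selectivity instead.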

Here are a couple of examples of the assumed implementation to give a sense of the language. SSL Heartbleed can be filtered out by the following expression:

    if (client.tcp.src == 443
        && client.tcp.data[52] == 0x18
        && client.tcp.data[53] == 0x03
        && client.tcp.data[54] >= 0 && client.tcp.data[54] <= 0xFF
        && client.tcp.data[55] >= 0 && client.tcp.data[55] <= 0xFF)
            # Block the attacker forever.
            tdb.insert("ip_filter", client.addr, evict=PERSISTENT);

The problem with IPTables and eBPF is that these tools work on separate skbs, so their rules are easy to evade by splitting TCP segments into multiple IP packets. Thus, TL must work on the TCP stream and higher layers. Internally, packet boundaries must be handled by storing the current matching state of an FSM, i.e. we need a Turing-complete language, which eBPF is not. Also, eBPF uses a restricted instruction set and its programs are limited to 4K instructions. These restrictions aren't good. Thus, the better way is a SystemTap-like one: compile TL into a C kernel module and run it with user-space interfaces (like eBPF maps) using Kernel-User Space Transport.
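
To make the "store the current matching state of an FSM" idea concrete, here is a small Python sketch of a matcher that survives segment boundaries. It uses the Knuth-Morris-Pratt failure table as the FSM, which is a choice made for this example only; the class name `StreamMatcher` and its methods are hypothetical and say nothing about how TL or Tempesta FW would implement it.

```python
class StreamMatcher:
    """Find a byte pattern in a stream delivered as arbitrary chunks.

    The only state kept between chunks is `self.state`: the length of the
    longest pattern prefix that is a suffix of the data seen so far.
    """

    def __init__(self, pattern: bytes):
        if not pattern:
            raise ValueError("pattern must be non-empty")
        self.pattern = pattern
        self.fail = self._failure_table(pattern)
        self.state = 0

    @staticmethod
    def _failure_table(p: bytes):
        # Classic KMP prefix function.
        fail = [0] * len(p)
        k = 0
        for i in range(1, len(p)):
            while k and p[i] != p[k]:
                k = fail[k - 1]
            if p[i] == p[k]:
                k += 1
            fail[i] = k
        return fail

    def feed(self, chunk: bytes) -> bool:
        """Consume one segment; return True if the pattern completed in it."""
        found = False
        for b in chunk:
            while self.state and b != self.pattern[self.state]:
                self.state = self.fail[self.state - 1]
            if b == self.pattern[self.state]:
                self.state += 1
            if self.state == len(self.pattern):
                found = True
                # Keep going so overlapping matches are also seen.
                self.state = self.fail[self.state - 1]
        return found
```

A per-packet substring check misses a pattern split as `b"...loly"` and `b"ousuck..."`, while feeding the same two chunks to one `StreamMatcher` reports a match on the second chunk.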

Another example, from CloudFlare, is:

    if (req.user_agent =~ ":80$")
            # Block for 10 seconds.
            tdb.insert("ip_filter", client.addr, evict=10000);

The rule matches only the User-Agent value, anchored at its end, instead of scanning the whole packet as the IPtables string module or eBPF do.
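
The semantics of this rule, a suffix match on one parsed field plus a time-limited block entry, can be sketched in Python as follows. `BlockTable` is an in-memory stand-in invented for this example; it is not the Tempesta DB API, and the millisecond unit for the eviction time is inferred from the "Block for 10 seconds" comment next to `evict=10000`.

```python
import time


class BlockTable:
    """In-memory stand-in for the "ip_filter" table (hypothetical)."""

    PERSISTENT = None  # sentinel: the entry never expires

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._expiry = {}  # addr -> deadline in seconds, or None

    def insert(self, addr, evict_ms=PERSISTENT):
        if evict_ms is None:
            self._expiry[addr] = None
        else:
            self._expiry[addr] = self._clock() + evict_ms / 1000.0

    def is_blocked(self, addr):
        if addr not in self._expiry:
            return False
        deadline = self._expiry[addr]
        if deadline is None or self._clock() < deadline:
            return True
        del self._expiry[addr]  # lazily evict the expired entry
        return False


def check_request(table, client_addr, user_agent):
    """Block the client for 10 s if its User-Agent ends with ":80"."""
    if user_agent.endswith(":80"):  # anchored at the end, no full scan
        table.insert(client_addr, evict_ms=10000)
    return table.is_blocked(client_addr)
```

Passing the clock in as a parameter keeps the sketch testable without sleeping.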

While the current Frang rule set duplicates IPtables functionality (e.g. connection limiting), we still need to account for such low-level information to be able to specify complex multi-layer rules, e.g. "block a client with more than 10 connections for the last minute and without User-Agent header".
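
The quoted multi-layer rule could look roughly like the following Python sketch, which joins an L4 fact (connections in the last minute) with an L7 fact (no User-Agent header). The class `ConnTracker` and its methods are hypothetical names for this example, not Frang code.

```python
from collections import defaultdict, deque


class ConnTracker:
    """Sliding-window connection counter per client address (hypothetical)."""

    def __init__(self, window_s=60.0, max_conns=10):
        self.window_s = window_s
        self.max_conns = max_conns
        self._conns = defaultdict(deque)  # addr -> connection timestamps

    def on_connect(self, addr, now):
        self._conns[addr].append(now)

    def recent(self, addr, now):
        """Number of connections from addr within the last window."""
        q = self._conns[addr]
        while q and q[0] <= now - self.window_s:
            q.popleft()  # drop timestamps that fell out of the window
        return len(q)

    def should_block(self, addr, headers, now):
        has_ua = any(k.lower() == "user-agent" for k in headers)
        return self.recent(addr, now) > self.max_conns and not has_ua
```

Neither condition alone triggers a block: a busy client with a User-Agent passes, and so does a quiet client without one.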

Basically, TL must provide filtering abilities very close to Suricata's. However, since Tempesta FW is a TCP endpoint, it isn't vulnerable to IDS evasion techniques. Also, the overall system processes HTTP only once (instead of processing it at both the IDS and a Web accelerator), and there is no need to place an SSL terminator before the IDS, which would introduce multiple data copies and context switches.

The engine must also provide rewrite logic, at least for HTTP headers, but also for arbitrary HTTP message parts (e.g. to implement SSI/ESI extensions).

Since relatively complex logic is expected to be implemented in TL, TL programs must be implemented as GFSM subroutines, explicitly or implicitly, using high-level language constructs like a yield operator.
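
To show what an implicit state machine written with a yield operator might feel like, here is a Python generator that suspends whenever it runs out of data and resumes when the next segment arrives. This only illustrates the programming model; it is not GFSM, and the names `request_line_reader` and `run` are made up for the example.

```python
def request_line_reader():
    """Coroutine: receives chunks via send() and yields None until a full
    request line (terminated by CRLF) has arrived, then yields the line."""
    buf = b""
    while b"\r\n" not in buf:
        chunk = yield None  # suspend until the next segment arrives
        buf += chunk
    line, _, _rest = buf.partition(b"\r\n")
    yield line.decode("ascii")


def run(chunks):
    """Drive the coroutine with a sequence of segments."""
    co = request_line_reader()
    next(co)  # prime: run up to the first yield
    for c in chunks:
        result = co.send(c)
        if result is not None:
            return result
    return None  # stream ended before the line was complete
```

The matching state (the buffer and the position in the loop) lives in the suspended generator frame, so the rule author never writes explicit state numbers.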

Also consider WASM extensions, see Envoy as an example. There is an ABI specification.

@krizhanovsky krizhanovsky self-assigned this May 6, 2015
@krizhanovsky krizhanovsky added this to the 1.0 Release milestone May 6, 2015
@krizhanovsky krizhanovsky mentioned this issue May 6, 2015
This was referenced May 22, 2016
@krizhanovsky krizhanovsky modified the milestones: backlog, 0.11 Tempesta Language Jan 14, 2018
krizhanovsky added a commit that referenced this issue Nov 14, 2021
low level networking layers.

GFSM was designed to build graphs of network protocols FSMs (this
design was inspired by FreeBSD netgraph). However, during the years
neither we nor external users have any requirements to introduce
any modules which use GFSM to hook TLS or HTTP entry code. There
are only 2 users of the mechanism for TLS and HTTP for now:
1. TLS -> HTTP protocols handling
2. HTTP limits (the frang module)

This patch replaces GFSM calls with direct calls to
tfw_http_req_process(), tfw_tls_msg_process() and frang_tls_handler()
in following paths:
1. sync sockets -> TLS
2. sync sockets -> HTTP
3. TLS -> HTTP
4. TLS -> Frang

As the result the function tfw_connection_recv() was eliminated.
Now the code is simpler and has lower overhead.

We still might need GFSM for the user-space requests handling (#77)
and Tempesta Language (#102).
@krizhanovsky krizhanovsky modified the milestones: 0.9 - TDB, 1.2 TBD Jan 3, 2022
@krizhanovsky krizhanovsky changed the title Tempesta Language (TL) Generic configuration language (Tempesta Language) Jan 26, 2022
ttaym added a commit to ttaym/tempesta that referenced this issue Feb 21, 2022
Almost literaly follow ak patch from 2eae1da

Replace GFSM calls with direct calls to TLS and HTTP handlers
 on low level networking layers.

GFSM was designed to build graphs of network protocols FSMs (this
design was inspired by FreeBSD netgraph). However, during the years
neither we nor external users have any requirements to introduce
any modules which use GFSM to hook TLS or HTTP entry code. There
are only 2 users of the mechanism for TLS and HTTP for now:
1. TLS -> HTTP protocols handling
2. HTTP limits (the frang module)

This patch replaces GFSM calls with direct calls to
tfw_http_req_process(), tfw_tls_msg_process() and frang_tls_handler()
in following paths:
1. sync sockets -> TLS
2. sync sockets -> HTTP
3. TLS -> HTTP
4. TLS -> Frang

As the result the function tfw_connection_recv() was eliminated.
Now the code is simpler and has lower overhead.

We still might need GFSM for the user-space requests handling (tempesta-tech#77)
and Tempesta Language (tempesta-tech#102).

Contributes to tempesta-tech#755

Based-on-patch-by: Alexander K <ak@tempesta-tech.com>
Signed-off-by: Aleksey Mikhaylov <aym@tempesta-tech.com>
ttaym added a commit that referenced this issue Feb 24, 2022
@krizhanovsky
Contributor Author

krizhanovsky commented Apr 12, 2022

HTTPtables implements 2 logical operators:

  • AND by consecutive rules in the same chain
  • OR by a chain calling another chain

tfw_http_tbl_scan() spins in a loop as long as the current action points to another chain. This is just one example of the overhead introduced by executing simple operations through C data structures and loops.

These operators can be implemented more efficiently: just compile the rules into binary code and exit from the function when we're done. This is what eBPF actually does. With the new implementation we need to implement the basic operations, e.g. string matching and functions for the actions, and glue it all together with eBPF.
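
The difference between scanning rule tables on every request and resolving them once can be sketched in Python. Python cannot show native code generation, so the "compiled" variant below only moves the chain lookups to build time; the data layout (a dict of chains, each a list of `(predicate, action)` pairs) is an assumption for this example and does not mirror the real `tfw_http_tbl_scan()` structures. Chains are assumed to be acyclic.

```python
def scan(chains, start, req):
    """Interpreter: walk rule tables in a loop, following chain jumps."""
    name = start
    while name is not None:
        jump = None
        for pred, (kind, arg) in chains[name]:
            if pred(req):
                if kind == "verdict":
                    return arg
                jump = arg  # kind == "chain": continue in another chain
                break
        name = jump
    return "pass"


def compile_chain(chains, name):
    """Resolve chain jumps once and return a single callable."""
    steps = []
    for pred, (kind, arg) in chains[name]:
        if kind == "chain":
            target = compile_chain(chains, arg)  # resolved at build time
        else:
            target = lambda req, v=arg: v  # constant verdict
        steps.append((pred, target))

    def run(req):
        for pred, target in steps:
            if pred(req):
                return target(req)
        return "pass"

    return run
```

Both variants are meant to return the same verdicts; the second one no longer looks up chains by name while a request is being processed.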

The language should use an approach similar to bpftrace: compile a C-like language with some Python-like syntactic sugar into a C program, which is then compiled into BPF.
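
The source-to-source step can be illustrated with a toy translator. The sketch below turns a small rule list into Python source text and compiles it with the built-in `compile()` and `exec()`, standing in for the "DSL to C to BPF" pipeline; the rule tuple format, the `OPS` table, and the function names are all hypothetical, and a real implementation would emit C rather than Python.

```python
# Supported operations and the code each one expands to (hypothetical set).
OPS = {
    "eq": "{lhs} == {rhs}",
    "suffix": "{lhs}.endswith({rhs})",
    "contains": "{rhs} in {lhs}",
}


def generate_source(rules):
    """Translate (field, op, value, verdict) tuples into function source."""
    lines = ["def match(req):"]
    for field, op, value, verdict in rules:
        lhs = f"req.get({field!r}, '')"
        # repr() embeds the value as a safely quoted literal.
        cond = OPS[op].format(lhs=lhs, rhs=repr(value))
        lines.append(f"    if {cond}:")
        lines.append(f"        return {verdict!r}")
    lines.append("    return 'pass'")
    return "\n".join(lines) + "\n"


def compile_rules(rules):
    """Compile the generated source and return the resulting function."""
    namespace = {}
    exec(compile(generate_source(rules), "<rules>", "exec"), namespace)
    return namespace["match"]
```

After `compile_rules()` returns, the rule list is no longer consulted: the rules have become straight-line `if` statements in one function, which is the property the compiled approach is after.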

The language scripts must be able to operate on general-purpose Tempesta DB hash tables. We should not use BPF hashes because we also need to be able to operate on the tables using tdbq and/or a REST API (e.g. see the HAproxy bot protection examples that use stick tables).

@krizhanovsky krizhanovsky changed the title Generic configuration language (Tempesta Language) [HTTPtables] Generic language (Tempesta Language) Nov 14, 2022
@krizhanovsky
Contributor Author

HTTPtables keeps gaining functionality. There is no need to work with the TCP and IP layers, since we have integration with Netfilter, which already does this perfectly.

Once the HTTPtables architecture is reworked for better performance, we can close the issue.

@kolinfluence

kolinfluence commented Jan 2, 2023

  1. eBPF at the XDP layer should not be touched because, as mentioned, tempesta "should be a bit too slow". However, any implementation that uses XDP should be an optional plugin to go with tempesta if need be. That means: look at xdp-filter under xdp-tutorial; just adding an IP to xdp-filter to block it is better.

I'm not sure how fast tempesta is (I haven't tested it), but

  2. if you need mass adoption, you should look at using AF_XDP in the future. WASM is not really a good way to do this; something more like a C servlet is (check ulib for their C servlet implementation), or something along the lines of these integrations.

If you really need WASM, you should still have an AF_XDP adoption layer too, and make it Rust, I guess.

You should look at bytedance/monoio and stay compatible with their implementation if possible, because they are going for the ultimate performance for a web server.

  3. tempesta is better suited as a tail call addition before or after the TC layer in eBPF.

Everything I've mentioned should take QUIC into future consideration, and all of the projects I have mentioned do so in their own way.
You can also look at pantuza equic on GitHub for "future" compatibility issues etc.

Everything mentioned is vague, but what I am emphasizing for tempesta to move forward faster is:

  1. wider adoption through compatibility with existing toolchains (or at least more compatibility, so that less modification is needed)

  2. better standardization, to leverage the development of future toolchains.

Doing so can make tempesta development faster and migration to QUIC etc. easier in the future.

@krizhanovsky krizhanovsky modified the milestones: 1.x: TBD, 1.0 - GA May 24, 2023
@krizhanovsky krizhanovsky changed the title [HTTPtables] Generic language (Tempesta Language) HTTPtables migration to eBPF Nov 12, 2023
@krizhanovsky
Contributor Author

This relates to Tempesta xFW, the enterprise volumetric DDoS mitigation module.
