Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your net-next
tree. This batch comes with more input sanitization for xtables to
address bug reports from fuzzers, preparation works to the flowtable
infrastructure and assorted updates. In no particular order, they are:

1) Make sure userspace provides a valid standard target verdict, from
   Florian Westphal.

2) Sanitize error target size, also from Florian.

3) Validate that last rule in basechain matches underflow/policy since
   userspace assumes this when decoding the ruleset blob that comes
   from the kernel, from Florian.

4) Consolidate hook entry checks through xt_check_table_hooks(),
   patch from Florian.

5) Cap ruleset allocations at 512 mbytes, 134217728 rules and reject
   very large compat offset arrays, so we have a reasonable upper limit
   and fuzzers don't exercise the oom-killer. Patches from Florian.

6) Several WARN_ON checks on xtables mutex helper, from Florian.

7) xt_rateest now has a hashtable per net, from Cong Wang.

8) Consolidate counter allocation in xt_counters_alloc(), from Florian.

9) Earlier xt_table_unlock() call in {ip,ip6,arp,eb}tables, patch
   from Xin Long.

10) Set FLOW_OFFLOAD_DIR_* to IP_CT_DIR_* definitions, patch from
    Felix Fietkau.

11) Consolidate code through flow_offload_fill_dir(), also from Felix.

12) Inline ip6_dst_mtu_forward() just like ip_dst_mtu_maybe_forward()
    to remove a dependency between flowtable and ipv6.ko, from Felix.

13) Cache mtu size in flow_offload_tuple object, this is safe for
    forwarding as f87c10a describes, from Felix.

14) Rename nf_flow_table.c to nf_flow_table_core.c, to simplify the
    overly modular infrastructure, from Felix.

15) Add rt0, rt2 and rt4 IPv6 routing extension support, patch from
    Ahmed Abdelsalam.

16) Remove unused parameter in nf_conncount_count(), from Yi-Hung Wei.

17) Support for counting only to nf_conncount infrastructure, patch
    from Yi-Hung Wei.

18) Add strict NFT_CT_{SRC_IP,DST_IP,SRC_IP6,DST_IP6} key datatypes
    to nft_ct.

19) Use boolean as return value from ipt_ah and from IPVS too, patch
    from Gustavo A. R. Silva.

20) Remove useless parameters in nfnl_acct_overquota() and
    nf_conntrack_broadcast_help(), from Taehee Yoo.

21) Use ipv6_addr_is_multicast() from xt_cluster, also from Taehee Yoo.

22) Statify nf_tables_obj_lookup_byhandle, patch from Fengguang Wu.

23) Fix typo in xt_limit, from Geert Uytterhoeven.

24) Do not use VLAs in Netfilter code, again from Gustavo.

25) Use ADD_COUNTER from ebtables, from Taehee Yoo.

26) Bitshift support for CONNMARK and MARK targets, from Jack Ma.

27) Use pr_*() and add pr_fmt(), from Arushi Singhal.

28) Add synproxy support to ctnetlink.

29) ICMP type and IGMP matching support for ebtables, patches from
    Matthias Schiffer.

30) Support for the revision infrastructure to ebtables, from
    Bernie Harris.

31) String match support for ebtables, also from Bernie.

32) Documentation for the new flowtable infrastructure.

33) Use generic comparison functions in ebt_stp, from Joe Perches.

34) Demodularize filter chains in nftables.

35) Register conntrack hooks in case nftables NAT chain is added.

36) Merge assignments with return in a couple of spots in the
    Netfilter codebase, also from Arushi.

37) Document that xtables percpu counters are stored in the same
    memory area, from Ben Hutchings.

38) Revert mark_source_chains() sanity checks that break existing
    rulesets, from Florian Westphal.

39) Use is_zero_ether_addr() in the ipset codebase, from Joe Perches.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
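The two caps in item 5 appear to be linked: reading "512 mbytes" as 512 * 1024 * 1024 bytes and dividing by a 4-byte unsigned int gives exactly 134217728, which suggests the rule cap bounds an array of per-rule offsets such as the one returned by xt_alloc_entry_offsets(). The sketch below encodes that arithmetic under this assumption only; MAX_TABLE_SIZE, MAX_RULES, table_size_ok() and offsets_count_ok() are hypothetical names, not the kernel's.

```c
#include <stdbool.h>
#include <stddef.h>

/* Assumed cap on a ruleset blob: 512 MiB. */
#define MAX_TABLE_SIZE ((size_t)512 * 1024 * 1024)

/* Assumed cap on per-rule offsets: one unsigned int per rule within 512 MiB. */
#define MAX_RULES (MAX_TABLE_SIZE / sizeof(unsigned int))

/* Reject empty blobs and blobs at or above the cap before allocating. */
bool table_size_ok(size_t size)
{
	return size > 0 && size < MAX_TABLE_SIZE;
}

/* Reject offset arrays with more entries than the cap allows. */
bool offsets_count_ok(size_t count)
{
	return count <= MAX_RULES;
}
```

On platforms where unsigned int is 4 bytes, as on Linux, MAX_RULES evaluates to 134217728, the figure quoted in item 5.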
davem330 committed Mar 30, 2018
2 parents b9a1260 + 26c97c5 commit d162190
Showing 75 changed files with 1,383 additions and 858 deletions.
112 changes: 112 additions & 0 deletions Documentation/networking/nf_flowtable.txt
@@ -0,0 +1,112 @@
Netfilter's flowtable infrastructure
====================================

This documentation describes the software flowtable infrastructure available in
Netfilter since Linux kernel 4.16.

Overview
--------

Initial packets follow the classic forwarding path. Once the flow enters the
established state according to the conntrack semantics (i.e. traffic has been
seen in both directions), you can decide to offload the flow to the flowtable
from the forward chain via the 'flow offload' action available in nftables.

Packets that find an entry in the flowtable (i.e. a flowtable hit) are sent to
the output netdevice via neigh_xmit(); hence, they bypass the classic
forwarding path (the visible effect is that you do not see these packets from
any of the Netfilter hooks coming after ingress). On a flowtable miss, the
packet follows the classic forwarding path.

The flowtable uses a resizable hashtable. Lookups are based on the following
7-tuple selectors: source and destination addresses, layer 3 and layer 4
protocols, source and destination ports, and the input interface (useful in
case there are several conntrack zones in place).
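To make the lookup key concrete, the sketch below models the 7-tuple as a C structure with a hash and an equality test. It is illustrative only: struct flow_key, flow_key_hash() and flow_key_equal() are hypothetical names, not the kernel's flow_offload_tuple API, and FNV-1a merely stands in for whatever hash function the resizable hashtable actually uses.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 7-tuple flow key; field names are illustrative only. */
struct flow_key {
	uint8_t  src_addr[16]; /* IPv4 uses the first 4 bytes, rest zeroed */
	uint8_t  dst_addr[16];
	uint8_t  l3proto;      /* e.g. AF_INET or AF_INET6 */
	uint8_t  l4proto;      /* e.g. IPPROTO_TCP */
	uint16_t src_port;
	uint16_t dst_port;
	int      iifindex;     /* input interface */
};

/* FNV-1a over a byte range, chained through 'h'. */
static uint32_t fnv1a(uint32_t h, const void *data, size_t len)
{
	const uint8_t *p = data;

	while (len--) {
		h ^= *p++;
		h *= 16777619u;
	}
	return h;
}

/* Hash each selector field separately so struct padding never matters. */
uint32_t flow_key_hash(const struct flow_key *k)
{
	uint32_t h = 2166136261u;

	h = fnv1a(h, k->src_addr, sizeof(k->src_addr));
	h = fnv1a(h, k->dst_addr, sizeof(k->dst_addr));
	h = fnv1a(h, &k->l3proto, sizeof(k->l3proto));
	h = fnv1a(h, &k->l4proto, sizeof(k->l4proto));
	h = fnv1a(h, &k->src_port, sizeof(k->src_port));
	h = fnv1a(h, &k->dst_port, sizeof(k->dst_port));
	h = fnv1a(h, &k->iifindex, sizeof(k->iifindex));
	return h;
}

/* Two packets belong to the same flow only if all seven selectors match. */
bool flow_key_equal(const struct flow_key *a, const struct flow_key *b)
{
	return memcmp(a->src_addr, b->src_addr, sizeof(a->src_addr)) == 0 &&
	       memcmp(a->dst_addr, b->dst_addr, sizeof(a->dst_addr)) == 0 &&
	       a->l3proto == b->l3proto && a->l4proto == b->l4proto &&
	       a->src_port == b->src_port && a->dst_port == b->dst_port &&
	       a->iifindex == b->iifindex;
}
```

Because the input interface is part of the key in this model, the same addresses and ports arriving on two different devices map to two distinct entries, in line with the note above about conntrack zones.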

Flowtables are populated via the 'flow offload' nftables action, so the user
can selectively specify which flows are placed into the flowtable. Hence,
packets follow the classic forwarding path unless the user explicitly instructs
packets to use this new alternative forwarding path via nftables policy.

This is represented in Fig.1, which describes the classic forwarding path
including the Netfilter hooks and the flowtable fastpath bypass.

                                       userspace process
                                        ^              |
                                        |              |
                                   _____|____      ____\/___
                                  /          \    /         \
                                  |  input   |    | output  |
                                  \__________/    \_________/
                                        ^              |
                                        |              |
     _________     __________      ---------     ______\/_____
    /         \   /          \     |Routing |   /             \
  --> ingress ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
    \_________/   \__________/     ----------   \_____________/         ^
      |      ^                          |                ^              |
  flowtable  |                      ____\/___            |              |
      |      |                     /         \           |              |
   __\/___   |                     | forward |------------              |
   |-----|   |                     \_________/                          |
   |-----|   |                 'flow offload' rule                      |
   |-----|   |                    adds entry to                         |
   |_____|   |                      flowtable                           |
      |      |                                                          |
     / \     |                                                          |
    /hit\_no_|                                                          |
    \ ? /                                                               |
     \ /                                                                |
      |__yes_________________fastpath bypass ___________________________|

Fig.1 Netfilter hooks and flowtable interactions

The flowtable entry also stores the NAT configuration, so all packets are
mangled according to the NAT policy that matched the initial packets that went
through the classic forwarding path. The TTL is decremented before calling
neigh_xmit(). Fragmented traffic is passed up to follow the classic forwarding
path, since the transport selectors are missing and a flowtable lookup is
therefore not possible.
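The per-packet decision described above can be sketched as a small function: fragments skip the lookup, misses fall back to the classic path, and hits have their TTL decremented before transmission. This is a hypothetical model (struct pkt, enum verdict and fastpath_decide() are made-up names); NAT mangling and the actual neigh_xmit() call are left out, and the lookup result is passed in as a flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, heavily simplified view of a forwarded packet. */
struct pkt {
	bool    is_fragment; /* no transport header, hence no ports to match */
	uint8_t ttl;
	/* addresses, ports and input interface omitted for brevity */
};

enum verdict {
	CLASSIC_PATH,  /* continue through prerouting, forward, postrouting */
	FASTPATH_XMIT, /* bypass: hand the packet to the output device */
};

/*
 * 'hit' is the result of the flowtable lookup; it is ignored for
 * fragments, which cannot be looked up in the first place.
 */
enum verdict fastpath_decide(struct pkt *p, bool hit)
{
	/* Fragments lack the transport selectors: pass them up. */
	if (p->is_fragment)
		return CLASSIC_PATH;

	/* Flowtable miss: follow the classic forwarding path. */
	if (!hit)
		return CLASSIC_PATH;

	/* Flowtable hit: decrement the TTL before transmission. */
	p->ttl--;
	return FASTPATH_XMIT;
}
```

This sketch does not model TTL expiry.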

Example configuration
---------------------

Enabling the flowtable bypass is relatively easy: you only need to create a
flowtable and add one rule to your forward chain.

	table inet x {
		flowtable f {
			hook ingress priority 0 devices = { eth0, eth1 };
		}
		chain y {
			type filter hook forward priority 0; policy accept;
			ip protocol tcp flow offload @f
			counter packets 0 bytes 0
		}
	}

This example adds the flowtable 'f' to the ingress hook of the eth0 and eth1
netdevices. You can create as many flowtables as you want in case you need to
perform resource partitioning. The flowtable priority defines the order in
which hooks are run in the pipeline. This is convenient if you already have an
nftables ingress chain: make sure the flowtable priority is smaller than that of
the nftables ingress chain, so that the flowtable runs earlier in the pipeline.
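Priority ordering can be pictured as a sort on the priority value, with smaller values running first. The sketch below is a generic model rather than the kernel's hook registration code; struct hook_entry and sort_hooks() are hypothetical names.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical hook descriptor. */
struct hook_entry {
	const char *name;
	int priority;
};

/* Ascending priority: smaller values run earlier. */
static int cmp_priority(const void *a, const void *b)
{
	const struct hook_entry *x = a, *y = b;

	return (x->priority > y->priority) - (x->priority < y->priority);
}

/* Sort hooks into the order in which they would see a packet. */
void sort_hooks(struct hook_entry *hooks, size_t n)
{
	qsort(hooks, n, sizeof(*hooks), cmp_priority);
}
```

For example, an ingress chain at priority 0 and a flowtable at priority -10 come out with the flowtable first. Note that qsort() is not stable, so equal priorities leave the relative order unspecified in this model.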

The 'flow offload' action from the forward chain 'y' adds an entry to the
flowtable for the TCP syn-ack packet coming in the reply direction. Once the
flow is offloaded, you will observe that the counter rule in the example above
does not get updated for the packets that are being forwarded through the
forwarding bypass.

More reading
------------

This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also
made a very complete and comprehensive summary called "A state of network
acceleration" that describes how things were before this infrastructure was
mainlined [3], and also gave a rough summary of this work [4].

[1] https://lwn.net/Articles/738214/
[2] https://lwn.net/Articles/742164/
[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html
3 changes: 1 addition & 2 deletions include/linux/netfilter/nfnetlink_acct.h
@@ -16,6 +16,5 @@ struct nf_acct;
 struct nf_acct *nfnl_acct_find_get(struct net *net, const char *filter_name);
 void nfnl_acct_put(struct nf_acct *acct);
 void nfnl_acct_update(const struct sk_buff *skb, struct nf_acct *nfacct);
-int nfnl_acct_overquota(struct net *net, const struct sk_buff *skb,
-	struct nf_acct *nfacct);
+int nfnl_acct_overquota(struct net *net, struct nf_acct *nfacct);
 #endif /* _NFNL_ACCT_H */
5 changes: 4 additions & 1 deletion include/linux/netfilter/x_tables.h
@@ -281,6 +281,8 @@ int xt_check_entry_offsets(const void *base, const char *elems,
 	unsigned int target_offset,
 	unsigned int next_offset);
 
+int xt_check_table_hooks(const struct xt_table_info *info, unsigned int valid_hooks);
+
 unsigned int *xt_alloc_entry_offsets(unsigned int size);
 bool xt_find_jump_offset(const unsigned int *offsets,
 	unsigned int target, unsigned int size);
@@ -301,6 +303,7 @@ int xt_data_to_user(void __user *dst, const void *src,
 
 void *xt_copy_counters_from_user(const void __user *user, unsigned int len,
 	struct xt_counters_info *info, bool compat);
+struct xt_counters *xt_counters_alloc(unsigned int counters);
 
 struct xt_table *xt_register_table(struct net *net,
 	const struct xt_table *table,
@@ -509,7 +512,7 @@ void xt_compat_unlock(u_int8_t af);
 
 int xt_compat_add_offset(u_int8_t af, unsigned int offset, int delta);
 void xt_compat_flush_offsets(u_int8_t af);
-void xt_compat_init_offsets(u_int8_t af, unsigned int number);
+int xt_compat_init_offsets(u8 af, unsigned int number);
 int xt_compat_calc_jump(u_int8_t af, unsigned int offset);
 
 int xt_compat_match_offset(const struct xt_match *match);
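xt_check_table_hooks() consolidates the hook entry checks (item 4 above). The sketch below is only a guess at the kind of validation such a helper might perform: for each hook enabled in valid_hooks, both the entry and the underflow offsets must be set, and they must strictly increase from one enabled hook to the next. struct table_info_sketch, check_table_hooks_sketch(), NUM_HOOKS and OFFSET_UNSET are hypothetical names, and the kernel's actual rules and error codes may differ.

```c
#include <stdint.h>

#define NUM_HOOKS    5           /* assumed: one slot per netfilter hook */
#define OFFSET_UNSET 0xFFFFFFFFu /* assumed marker for "not set" */

/* Hypothetical subset of a table's layout information. */
struct table_info_sketch {
	uint32_t hook_entry[NUM_HOOKS]; /* offset of each hook's first rule */
	uint32_t underflow[NUM_HOOKS];  /* offset of each hook's policy rule */
};

/* Returns 0 if the enabled hooks are consistent, -1 otherwise. */
int check_table_hooks_sketch(const struct table_info_sketch *info,
			     unsigned int valid_hooks)
{
	uint32_t prev_entry = 0, prev_uflow = 0;
	int seen = 0;

	for (unsigned int i = 0; i < NUM_HOOKS; i++) {
		if (!(valid_hooks & (1u << i)))
			continue;

		/* Every enabled hook needs both offsets filled in. */
		if (info->hook_entry[i] == OFFSET_UNSET ||
		    info->underflow[i] == OFFSET_UNSET)
			return -1;

		/* Strictly increasing: rejects duplicates and reordering. */
		if (seen && (info->hook_entry[i] <= prev_entry ||
			     info->underflow[i] <= prev_uflow))
			return -1;

		prev_entry = info->hook_entry[i];
		prev_uflow = info->underflow[i];
		seen = 1;
	}
	return 0;
}
```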
1 change: 0 additions & 1 deletion include/net/netfilter/nf_conntrack_count.h
@@ -11,7 +11,6 @@ void nf_conncount_destroy(struct net *net, unsigned int family,
 unsigned int nf_conncount_count(struct net *net,
 	struct nf_conncount_data *data,
 	const u32 *key,
-	unsigned int family,
 	const struct nf_conntrack_tuple *tuple,
 	const struct nf_conntrack_zone *zone);
 #endif
3 changes: 1 addition & 2 deletions include/net/netfilter/nf_conntrack_helper.h
@@ -132,8 +132,7 @@ void nf_conntrack_helper_pernet_fini(struct net *net);
 int nf_conntrack_helper_init(void);
 void nf_conntrack_helper_fini(void);
 
-int nf_conntrack_broadcast_help(struct sk_buff *skb, unsigned int protoff,
-	struct nf_conn *ct,
+int nf_conntrack_broadcast_help(struct sk_buff *skb, struct nf_conn *ct,
 	enum ip_conntrack_info ctinfo,
 	unsigned int timeout);
 
33 changes: 20 additions & 13 deletions include/net/netfilter/nf_tables.h
@@ -434,11 +434,11 @@ static inline struct nft_set *nft_set_container_of(const void *priv)
 	return (void *)priv - offsetof(struct nft_set, data);
 }
 
-struct nft_set *nft_set_lookup(const struct net *net,
-	const struct nft_table *table,
-	const struct nlattr *nla_set_name,
-	const struct nlattr *nla_set_id,
-	u8 genmask);
+struct nft_set *nft_set_lookup_global(const struct net *net,
+	const struct nft_table *table,
+	const struct nlattr *nla_set_name,
+	const struct nlattr *nla_set_id,
+	u8 genmask);
 
 static inline unsigned long nft_set_gc_interval(const struct nft_set *set)
 {
@@ -868,34 +868,38 @@ struct nft_chain {
 	char *name;
 };
 
-enum nft_chain_type {
+enum nft_chain_types {
 	NFT_CHAIN_T_DEFAULT = 0,
 	NFT_CHAIN_T_ROUTE,
 	NFT_CHAIN_T_NAT,
 	NFT_CHAIN_T_MAX
 };
 
 /**
- * struct nf_chain_type - nf_tables chain type info
+ * struct nft_chain_type - nf_tables chain type info
  *
  * @name: name of the type
  * @type: numeric identifier
  * @family: address family
  * @owner: module owner
  * @hook_mask: mask of valid hooks
  * @hooks: array of hook functions
+ * @init: chain initialization function
+ * @free: chain release function
  */
-struct nf_chain_type {
+struct nft_chain_type {
 	const char *name;
-	enum nft_chain_type type;
+	enum nft_chain_types type;
 	int family;
 	struct module *owner;
 	unsigned int hook_mask;
 	nf_hookfn *hooks[NF_MAX_HOOKS];
+	int (*init)(struct nft_ctx *ctx);
+	void (*free)(struct nft_ctx *ctx);
 };
 
 int nft_chain_validate_dependency(const struct nft_chain *chain,
-	enum nft_chain_type type);
+	enum nft_chain_types type);
 int nft_chain_validate_hooks(const struct nft_chain *chain,
 	unsigned int hook_flags);
 
@@ -917,7 +921,7 @@ struct nft_stats {
  */
 struct nft_base_chain {
 	struct nf_hook_ops ops;
-	const struct nf_chain_type *type;
+	const struct nft_chain_type *type;
 	u8 policy;
 	u8 flags;
 	struct nft_stats __percpu *stats;
@@ -970,8 +974,8 @@ struct nft_table {
 	char *name;
 };
 
-int nft_register_chain_type(const struct nf_chain_type *);
-void nft_unregister_chain_type(const struct nf_chain_type *);
+void nft_register_chain_type(const struct nft_chain_type *);
+void nft_unregister_chain_type(const struct nft_chain_type *);
 
 int nft_register_expr(struct nft_expr_type *);
 void nft_unregister_expr(struct nft_expr_type *);
@@ -1345,4 +1349,7 @@ struct nft_trans_flowtable {
 #define nft_trans_flowtable(trans) \
 	(((struct nft_trans_flowtable *)trans->data)->flowtable)
 
+int __init nft_chain_filter_init(void);
+void __exit nft_chain_filter_fini(void);
+
 #endif /* _NET_NF_TABLES_H */
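The new init/free members of struct nft_chain_type line up with item 35 ("Register conntrack hooks in case nftables NAT chain is added"). The sketch below models that pattern in plain C: a chain type may supply optional callbacks that run when a chain of that type is created or destroyed, and the NAT type uses them to take and drop a reference. struct chain_type_sketch, chain_create(), chain_destroy() and the conntrack_users counter are hypothetical stand-ins, not nf_tables code.

```c
#include <stddef.h>

struct chain_ctx; /* opaque, standing in for struct nft_ctx */

/* Hypothetical chain type with optional lifecycle callbacks. */
struct chain_type_sketch {
	const char *name;
	int  (*init)(struct chain_ctx *ctx); /* may be NULL */
	void (*free)(struct chain_ctx *ctx); /* may be NULL */
};

/* Stand-in for "conntrack hooks are registered": a user count. */
int conntrack_users;

static int nat_init(struct chain_ctx *ctx)
{
	(void)ctx;
	conntrack_users++; /* a NAT chain needs conntrack: take a reference */
	return 0;
}

static void nat_free(struct chain_ctx *ctx)
{
	(void)ctx;
	conntrack_users--; /* NAT chain removed: drop its reference */
}

const struct chain_type_sketch filter_type = { "filter", NULL, NULL };
const struct chain_type_sketch nat_type = { "nat", nat_init, nat_free };

/* Run the type's init callback, if any, when a chain is created. */
int chain_create(const struct chain_type_sketch *type, struct chain_ctx *ctx)
{
	return type->init ? type->init(ctx) : 0;
}

/* Run the type's free callback, if any, when a chain is destroyed. */
void chain_destroy(const struct chain_type_sketch *type, struct chain_ctx *ctx)
{
	if (type->free)
		type->free(ctx);
}
```

In this model, filter chains leave the counter untouched, while each NAT chain holds one reference for as long as it exists.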
4 changes: 2 additions & 2 deletions include/net/netfilter/xt_rateest.h
@@ -21,7 +21,7 @@ struct xt_rateest {
 	struct net_rate_estimator __rcu *rate_est;
 };
 
-struct xt_rateest *xt_rateest_lookup(const char *name);
-void xt_rateest_put(struct xt_rateest *est);
+struct xt_rateest *xt_rateest_lookup(struct net *net, const char *name);
+void xt_rateest_put(struct net *net, struct xt_rateest *est);
 
 #endif /* _XT_RATEEST_H */
1 change: 1 addition & 0 deletions include/uapi/linux/netfilter/nf_conntrack_common.h
@@ -129,6 +129,7 @@ enum ip_conntrack_events {
 	IPCT_NATSEQADJ = IPCT_SEQADJ,
 	IPCT_SECMARK, /* new security mark has been set */
 	IPCT_LABEL, /* new connlabel has been set */
+	IPCT_SYNPROXY, /* synproxy has been set */
 #ifdef __KERNEL__
 	__IPCT_MAX
 #endif
12 changes: 10 additions & 2 deletions include/uapi/linux/netfilter/nf_tables.h
@@ -909,8 +909,8 @@ enum nft_rt_attributes {
  * @NFT_CT_EXPIRATION: relative conntrack expiration time in ms
  * @NFT_CT_HELPER: connection tracking helper assigned to conntrack
  * @NFT_CT_L3PROTOCOL: conntrack layer 3 protocol
- * @NFT_CT_SRC: conntrack layer 3 protocol source (IPv4/IPv6 address)
- * @NFT_CT_DST: conntrack layer 3 protocol destination (IPv4/IPv6 address)
+ * @NFT_CT_SRC: conntrack layer 3 protocol source (IPv4/IPv6 address, deprecated)
+ * @NFT_CT_DST: conntrack layer 3 protocol destination (IPv4/IPv6 address, deprecated)
  * @NFT_CT_PROTOCOL: conntrack layer 4 protocol
  * @NFT_CT_PROTO_SRC: conntrack layer 4 protocol source
  * @NFT_CT_PROTO_DST: conntrack layer 4 protocol destination
@@ -920,6 +920,10 @@ enum nft_rt_attributes {
  * @NFT_CT_AVGPKT: conntrack average bytes per packet
  * @NFT_CT_ZONE: conntrack zone
  * @NFT_CT_EVENTMASK: ctnetlink events to be generated for this conntrack
+ * @NFT_CT_SRC_IP: conntrack layer 3 protocol source (IPv4 address)
+ * @NFT_CT_DST_IP: conntrack layer 3 protocol destination (IPv4 address)
+ * @NFT_CT_SRC_IP6: conntrack layer 3 protocol source (IPv6 address)
+ * @NFT_CT_DST_IP6: conntrack layer 3 protocol destination (IPv6 address)
  */
 enum nft_ct_keys {
 	NFT_CT_STATE,
@@ -941,6 +945,10 @@ enum nft_ct_keys {
 	NFT_CT_AVGPKT,
 	NFT_CT_ZONE,
 	NFT_CT_EVENTMASK,
+	NFT_CT_SRC_IP,
+	NFT_CT_DST_IP,
+	NFT_CT_SRC_IP6,
+	NFT_CT_DST_IP6,
 };
 
 /**
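The strict keys give each address a fixed length that no longer depends on the table family, in contrast with NFT_CT_SRC/NFT_CT_DST, whose documentation reads "IPv4/IPv6 address". A minimal sketch of that mapping follows; enum ct_addr_key and ct_addr_key_len() are hypothetical, and their values are not the uapi NFT_CT_* numbers.

```c
#include <stddef.h>

/* Hypothetical mirror of the strict keys; values are NOT the uapi ones. */
enum ct_addr_key {
	CT_SRC_IP,  /* like NFT_CT_SRC_IP */
	CT_DST_IP,  /* like NFT_CT_DST_IP */
	CT_SRC_IP6, /* like NFT_CT_SRC_IP6 */
	CT_DST_IP6, /* like NFT_CT_DST_IP6 */
};

/* Each strict key has a fixed address length, independent of family. */
size_t ct_addr_key_len(enum ct_addr_key key)
{
	switch (key) {
	case CT_SRC_IP:
	case CT_DST_IP:
		return 4;  /* IPv4 address */
	case CT_SRC_IP6:
	case CT_DST_IP6:
		return 16; /* IPv6 address */
	}
	return 0; /* not reached for valid keys */
}
```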
10 changes: 10 additions & 0 deletions include/uapi/linux/netfilter/nfnetlink_conntrack.h
@@ -54,6 +54,7 @@ enum ctattr_type {
 	CTA_MARK_MASK,
 	CTA_LABELS,
 	CTA_LABELS_MASK,
+	CTA_SYNPROXY,
 	__CTA_MAX
 };
 #define CTA_MAX (__CTA_MAX - 1)
@@ -190,6 +191,15 @@ enum ctattr_natseq {
 };
 #define CTA_NAT_SEQ_MAX (__CTA_NAT_SEQ_MAX - 1)
 
+enum ctattr_synproxy {
+	CTA_SYNPROXY_UNSPEC,
+	CTA_SYNPROXY_ISN,
+	CTA_SYNPROXY_ITS,
+	CTA_SYNPROXY_TSOFF,
+	__CTA_SYNPROXY_MAX,
+};
+#define CTA_SYNPROXY_MAX (__CTA_SYNPROXY_MAX - 1)
+
 enum ctattr_expect {
 	CTA_EXPECT_UNSPEC,
 	CTA_EXPECT_MASTER,
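CTA_SYNPROXY is a new nested attribute holding the three 32-bit values listed in enum ctattr_synproxy. The sketch below builds such a nest by hand using the standard netlink attribute layout: a 4-byte header (length and type, in host byte order), payloads padded to 4 bytes, and NLA_F_NESTED set on the container. Assumptions: the ISN, ITS and TSOFF payloads are carried in network byte order, and struct nla_hdr, put_u32_attr() and build_synproxy_nest() are hypothetical helpers; real code would normally use libmnl or the kernel's nla_put_*() functions. The caller passes the outer attribute type, so no numeric value for CTA_SYNPROXY is assumed.

```c
#include <arpa/inet.h> /* htonl() */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same shape as struct nlattr: 16-bit length, then 16-bit type (host order). */
struct nla_hdr {
	uint16_t nla_len; /* header + payload, without trailing padding */
	uint16_t nla_type;
};

#define NLA_F_NESTED_FLAG (1 << 15)            /* value of NLA_F_NESTED */
#define NLA_ALIGN4(len)   (((len) + 3u) & ~3u) /* netlink 4-byte alignment */

/* Child attribute types, as numbered in enum ctattr_synproxy above. */
enum { SYN_ISN = 1, SYN_ITS = 2, SYN_TSOFF = 3 };

/* Append a u32 attribute at buf + off; returns the offset after padding. */
static size_t put_u32_attr(uint8_t *buf, size_t off, uint16_t type, uint32_t val)
{
	struct nla_hdr h = {
		.nla_len = (uint16_t)(sizeof(struct nla_hdr) + sizeof(val)),
		.nla_type = type,
	};
	uint32_t be = htonl(val); /* assumed: payload in network byte order */

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + sizeof(h), &be, sizeof(be));
	return off + NLA_ALIGN4(h.nla_len);
}

/*
 * Write a nest of type nest_type holding ISN, ITS and TSOFF into buf
 * (at least 28 bytes); returns the total length of the nest.
 */
size_t build_synproxy_nest(uint8_t *buf, uint16_t nest_type,
			   uint32_t isn, uint32_t its, uint32_t tsoff)
{
	struct nla_hdr nest = {
		.nla_type = (uint16_t)(nest_type | NLA_F_NESTED_FLAG),
	};
	size_t off = sizeof(nest);

	off = put_u32_attr(buf, off, SYN_ISN, isn);
	off = put_u32_attr(buf, off, SYN_ITS, its);
	off = put_u32_attr(buf, off, SYN_TSOFF, tsoff);

	nest.nla_len = (uint16_t)off; /* the container covers all children */
	memcpy(buf, &nest, sizeof(nest));
	return off;
}
```

Three 8-byte children plus the 4-byte container header give a 28-byte attribute.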
10 changes: 10 additions & 0 deletions include/uapi/linux/netfilter/xt_connmark.h
@@ -19,11 +19,21 @@ enum {
 	XT_CONNMARK_RESTORE
 };
 
+enum {
+	D_SHIFT_LEFT = 0,
+	D_SHIFT_RIGHT,
+};
+
 struct xt_connmark_tginfo1 {
 	__u32 ctmark, ctmask, nfmask;
 	__u8 mode;
 };
 
+struct xt_connmark_tginfo2 {
+	__u32 ctmark, ctmask, nfmask;
+	__u8 shift_dir, shift_bits, mode;
+};
+
 struct xt_connmark_mtinfo1 {
 	__u32 mark, mask;
 	__u8 invert;
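The new xt_connmark_tginfo2 adds shift_dir and shift_bits to the existing masks. The sketch below shows one plausible reading of the save and restore arithmetic, in which the masked bits are shifted before being merged; struct connmark_tginfo2, connmark_save_shift() and connmark_restore_shift() are hypothetical names, and the kernel's xt_connmark code may order the operations differently.

```c
#include <stdint.h>

/* Copy of xt_connmark_tginfo2 using <stdint.h> types. */
struct connmark_tginfo2 {
	uint32_t ctmark, ctmask, nfmask;
	uint8_t shift_dir, shift_bits, mode;
};

enum { SHIFT_LEFT = 0, SHIFT_RIGHT }; /* mirrors D_SHIFT_LEFT / D_SHIFT_RIGHT */

/* Shift v in the configured direction. */
static uint32_t apply_shift(uint32_t v, const struct connmark_tginfo2 *info)
{
	if (info->shift_bits >= 32)
		return 0; /* avoid undefined behaviour for oversized shifts */
	return info->shift_dir == SHIFT_RIGHT ? v >> info->shift_bits
					      : v << info->shift_bits;
}

/* Save: clear the ctmask bits of the ct mark, then XOR in the shifted
 * nfmask bits of the packet mark. */
uint32_t connmark_save_shift(uint32_t ct_mark, uint32_t skb_mark,
			     const struct connmark_tginfo2 *info)
{
	uint32_t bits = apply_shift(skb_mark & info->nfmask, info);

	return (ct_mark & ~info->ctmask) ^ bits;
}

/* Restore: clear the nfmask bits of the packet mark, then XOR in the
 * shifted ctmask bits of the ct mark. */
uint32_t connmark_restore_shift(uint32_t ct_mark, uint32_t skb_mark,
				const struct connmark_tginfo2 *info)
{
	uint32_t bits = apply_shift(ct_mark & info->ctmask, info);

	return (skb_mark & ~info->nfmask) ^ bits;
}
```

For example, with ctmask 0xff00, nfmask 0xff and a right shift of 8, restoring a conntrack mark of 0xab00 into a packet mark of 0x12340000 yields 0x123400ab in this model.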
15 changes: 12 additions & 3 deletions include/uapi/linux/netfilter_bridge/ebt_ip.h
@@ -24,8 +24,10 @@
 #define EBT_IP_PROTO 0x08
 #define EBT_IP_SPORT 0x10
 #define EBT_IP_DPORT 0x20
+#define EBT_IP_ICMP 0x40
+#define EBT_IP_IGMP 0x80
 #define EBT_IP_MASK (EBT_IP_SOURCE | EBT_IP_DEST | EBT_IP_TOS | EBT_IP_PROTO |\
-	EBT_IP_SPORT | EBT_IP_DPORT )
+	EBT_IP_SPORT | EBT_IP_DPORT | EBT_IP_ICMP | EBT_IP_IGMP)
 #define EBT_IP_MATCH "ip"
 
 /* the same values are used for the invflags */
@@ -38,8 +40,15 @@ struct ebt_ip_info {
 	__u8 protocol;
 	__u8 bitmask;
 	__u8 invflags;
-	__u16 sport[2];
-	__u16 dport[2];
+	union {
+		__u16 sport[2];
+		__u8 icmp_type[2];
+		__u8 igmp_type[2];
+	};
+	union {
+		__u16 dport[2];
+		__u8 icmp_code[2];
+	};
 };
 
 #endif
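The new EBT_IP_ICMP bit reuses the storage of sport/dport for ICMP type and code values. The sketch below checks a packet's ICMP type and code against such fields, assuming index 0 is the lower and index 1 the upper bound, by analogy with the port ranges; inversion through invflags is not modelled. struct ip_info_sketch (a trimmed copy of the layout above), BM_ICMP and icmp_in_range() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define BM_ICMP 0x40 /* value of EBT_IP_ICMP above */

/* Trimmed copy of the new ebt_ip_info tail (address fields omitted). */
struct ip_info_sketch {
	uint8_t protocol;
	uint8_t bitmask;
	uint8_t invflags;
	union {
		uint16_t sport[2];
		uint8_t  icmp_type[2];
		uint8_t  igmp_type[2];
	};
	union {
		uint16_t dport[2];
		uint8_t  icmp_code[2];
	};
};

/* Assumed semantics: [0] is the lower bound, [1] the upper bound. */
bool icmp_in_range(const struct ip_info_sketch *info, uint8_t type, uint8_t code)
{
	if (!(info->bitmask & BM_ICMP))
		return true; /* ICMP matching not requested: no constraint */

	return type >= info->icmp_type[0] && type <= info->icmp_type[1] &&
	       code >= info->icmp_code[0] && code <= info->icmp_code[1];
}
```

Because icmp_type overlays sport in a union, a single ebt_ip_info cannot carry both a port range and an ICMP type range in this layout.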