Add support for interpreting NetworkPolicies #36
Conversation
Open questions

Policy merging

From the documentation:
So I came up with the following design:
Under the assumption that no two nets of the same size can overlap (without being identical), this should result in the expected behavior of the union of all policies. Did I miss anything?

ct mark & mask

Previously we had a mark and a mask in nftables. Both were

Matching policies to port forwards

The current design is:
I'm not sure that correctly implements the

By extension of the logic that "if you have a policy which allows all ranges, you cannot lock it down further", the correct implementation would allow everything in that scenario, which I don't think your proposal does. I think that to correctly implement this in nftables, you'd need a chain for each
You would generate the following ruleset:
This effectively allows 128.0.0.0/2 (including all of 192.168.0.0/16!), 0.0.0.0/1, and 172.1.0.0/16, except 172.1.1.0/24 (which is not included by 128.0.0.0/2 or 0.0.0.0/1). This is all inferred from the implied statement that "if you allow 0.0.0.0/0, it cannot be locked down further".
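For illustration, taking only the 0.0.0.0/1 block and the 172.1.0.0/16 block with its 172.1.1.0/24 exception from above, a chain-per-block layout of the kind described might look roughly like the following sketch; the chain names are invented and the exact rules are only a guess at the shape:

    table inet filter {
        chain pod-policies {
            # the block with an except list gets its own chain
            ip saddr 172.1.0.0/16 jump block-172-1-0-0-16
            ip saddr 0.0.0.0/1 accept
            drop
        }
        chain block-172-1-0-0-16 {
            # the excluded subnet falls back to the remaining blocks instead
            # of being accepted here; since 0.0.0.0/1 does not cover it
            # either, it ends up rejected
            ip saddr 172.1.1.0/24 return
            accept
        }
    }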
The purpose of the mark is to have a simpler and more accurate rule in the forward chain, similar to your new use. As we only ever had one rule which was needed in the forward chain, we only needed one value. Without the mark, we would have to allow arbitrary traffic from the internet into the cluster network, because we have no way of detecting valid DNAT traffic, and the forward chain happens after the DNAT translation. In short: because the forward chain sees the translated addresses, we would have required a blanket accept.

Your new use seems adequate and sensible, and good enough for this iteration, even though it unnecessarily expands the ruleset a little (if one used the least significant bit to denote "this is a valid DNAT'd packet" and shifted the service index to the upper bits, the postrouting chain could be simplified to a single rule again). This obviously reserves all bits of the mark for ch-k8s-lbaas, which may or may not be a good thing (it probably isn't). AFAICT, we're not yet using any marks in yaook/k8s, but this definitely needs to be documented here and over there.
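A minimal sketch of that suggested layout, with invented mark values and an invented DNAT target: the least significant bit flags a DNAT'd packet, the service index sits in the upper bits, and postrouting only has to test the one bit.

    table ip nat {
        chain prerouting {
            type nat hook prerouting priority -100;
            # hypothetical: service index 3 encoded as (3 << 1) | 1 = 0x07;
            # bit 0 says "this packet was DNAT'd by the load balancer"
            ip daddr 185.187.1.1 tcp dport 80 meta mark set 0x00000007 dnat to 10.9.8.7:8080
        }
        chain postrouting {
            type nat hook postrouting priority 100;
            # a single rule suffices: masquerade everything carrying the DNAT bit
            meta mark & 0x00000001 == 0x00000001 masquerade
        }
    }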
I don't think so.
Yes, definitely. It is entirely legitimate to have a network policy which locks down an entire namespace, without further refining by pod selectors, and then have multiple Services which expose things there to the subset of IPs which can still access the workload. The other way around (having a Service and only letting a subset of pods access it) probably makes less sense, but it's still possible.

By the way, did you consider doing the source address filtering in the prerouting chain already? Doing it that way would allow you to map the scenarios above and re-simplify the forward chain and the mark use. It would extend the prerouting chain significantly, but in the NAT table, that's AFAIK only run for untracked packets anyway, so it would likely be a better place to stuff complexity.
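A minimal sketch of that idea, assuming the DNAT rules live in a nat prerouting chain and using invented addresses and rules:

    table ip nat {
        chain prerouting {
            type nat hook prerouting priority -100;
            # hypothetical: reject disallowed sources before the DNAT happens,
            # so the forward chain never needs to reason about pod addresses
            ip daddr 185.187.1.1 tcp dport 80 ip saddr != 185.187.0.0/16 drop
            ip daddr 185.187.1.1 tcp dport 80 dnat to 10.9.8.7:8080
        }
    }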
Thanks for your feedback! So the new design considers the fact that "Pod is matched by the Service" and "Pod is matched by the NetworkPolicy" are orthogonal. It doesn't touch the current aggregation of the Forwards in the controller, nor the prerouting or postrouting chains in the agent. In short, the controller:
And the agent:
At the end of each policy chain, we must return because a different policy could still accept the packet. That's why every packet that has passed through a policy chain is marked with an additional bit and, if nothing accepted it, dropped at the end. What is still missing now:
I was actually asking about the mask. We do
The purpose is to allow it to co-exist with potential other use of marks, as we're injecting these rules into an existing framework of nftables rules.
Could you share a generated nft file? That would help in understanding the current end result. I think that the marks you added there are not necessary, but I'd have to see it in full.
load-balancer-config:
  ingress:
    - address: 185.187.1.1
      ports:
        - protocol: TCP
          inbound-port: 80
          destination-addresses:
            - 10.9.8.7
            - 10.9.8.6
          destination-port: 8080
  policy-assignments:
    - address: 10.9.8.7
      network-policies:
        - pol1
    - address: 10.10.2.1
      network-policies:
        - pol1
        - pol2
  network-policies:
    - name: pol1
      ports:
        - protocol: TCP
          port: 80
        - protocol: TCP
          port: 8080
          end-port: 8090
        - protocol: UDP
      allowed-ip-blocks:
        - cidr: 185.187.0.0/16
          except:
            - 185.187.13.0/24
        - cidr: 185.187.19.0/24
    - name: pol2
      allowed-ip-blocks:
        - cidr: 185.187.19.0/24
          except:
            - 185.187.19.0/25
        - cidr: 185.187.19.0/23
          except:
            - 185.187.19.100/32
            - 185.187.19.102/32
            - 185.187.19.109/32
            - 185.187.19.110/32

will result in
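Roughly, for pod 10.9.8.7 with pol1 assigned, a ruleset of the described shape looks something like the sketch below; the chain names, the mark value and the exact rule order are invented, and 10.10.2.1 with its two policies is left out:

    table inet filter {
        chain forward {
            type filter hook forward priority 0; policy accept;
            # mark traffic to a policy-managed pod, then try each assigned
            # policy; the real rules reportedly also copy the mark to ct mark
            ip daddr 10.9.8.7 meta mark set 0x01 jump policy-pol1
            # whatever is still marked here was accepted by no policy
            meta mark 0x01 drop
        }
        chain policy-pol1 {
            # one rule per PortFilter x IPBlockFilter combination; the block
            # with an except list jumps to its own chain instead of accepting
            ip saddr 185.187.0.0/16 tcp dport 80 jump pol1-except0
            ip saddr 185.187.0.0/16 tcp dport 8080-8090 jump pol1-except0
            ip saddr 185.187.0.0/16 meta l4proto udp jump pol1-except0
            ip saddr 185.187.19.0/24 tcp dport 80 accept
            ip saddr 185.187.19.0/24 tcp dport 8080-8090 accept
            ip saddr 185.187.19.0/24 meta l4proto udp accept
            # return, so that another assigned policy could still accept the packet
            return
        }
        chain pol1-except0 {
            ip saddr 185.187.13.0/24 drop
            accept
        }
    }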
If I didn't use the mark, then packets that were not accepted inside any of the policy chains could not be dropped at the end of the forward chain.
How about a scheme like this:
This has the following advantages:
The downside is that this won't interrupt already ongoing connections on application of a different network policy, though I'm not sure if this can be expected anyway.
This approach does not account for the orthogonality of "Pod is matched by the Service" and "Pod is matched by the NetworkPolicy", because it makes the (reasonable?) assumption that policies apply to all pods of a service, which contradicts your statement from above:
Because k8s does allow a policy to apply to only a single pod of a service, implementing it like this would mean that the controller would have to iterate over all pods of a service and then merge (and deduplicate) the policies before serving them to the agent. This would result in more complexity and in behavior that is not consistent with how policies work in k8s (adding a policy to one pod of the service locks down the whole service).
I cannot assess the performance impact of these points:

Do you think the performance benefit of your proposal outweighs the added complexity and the nonconformity?
That's a very good point, I had missed that. So we should stay with your rough layout.

In any case, in your current approach you don't need to actually transfer the mark you set in the forward chain to the conntrack mark, because AIUI the mark will stick to the packet until it leaves the system. Hence, there's no need to move it to conntrack, as the rules will be re-evaluated for each packet anyway. And I still wonder if you can get around the marking using
Example NetworkPolicy -> nftables.conf

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: nginx
  namespace: default
spec:
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              project: myproject
      ports:
        - port: 80
          protocol: TCP
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
    - Ingress
---
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-network-policy
  namespace: default
spec:
  egress:
    - ports:
        - port: 5978
          protocol: TCP
      to:
        - ipBlock:
            cidr: 10.0.0.0/24
  ingress:
    - from:
        - ipBlock:
            cidr: 172.17.0.0/16
            except:
              - 172.17.1.0/24
        - namespaceSelector:
            matchLabels:
              project: myproject
        - podSelector:
            matchLabels:
              role: frontend
      ports:
        - port: 6379
          protocol: TCP
  podSelector:
    matchLabels:
      role: db
  policyTypes:
    - Ingress
    - Egress
If I'm not mistaken, at least one unit test needs to be added (against 158c8cd) and there either needs to be some commenting or some refactoring (or both!)
This commit adds two new elements to the second hierarchy level of the model: NetworkPolicies and PolicyAssignments. Each NetworkPolicy entry can contain zero or more AllowedIngresses which in turn can contain zero or more IPBlockFilters and zero or more PortFilters. namespaceSelectors and podSelectors are not part of the model. PolicyAssignments contains a mapping from one podIP to zero or more NetworkPolicy names. The new entries in the model do not interfere with the existing Ingress element. This accounts for the fact that "Pod is matched by a Service" and "Pod is matched by a NetworkPolicy" are orthogonal.
I tested this in my cluster and it mostly works as expected. I have, however, found a way to make it fail spectacularly, which you should probably fix :-).
This commit adds support for converting NetworkPolicies into nftables rules:
* For each NetworkPolicy entry in the model, a new chain is created in the inet filter table which contains jump rules to the corresponding IngressRule chains.
* For each AllowedIngress inside each NetworkPolicy, a new IngressRuleChain is created.
* Inside the IngressRuleChain, for each combination of PortFilter and IPBlockFilter, an 'accept' rule is created, except for those IPBlockFilters that contain a 'Block' list. Those jump to the corresponding chain (see below) instead. If there are no PortFilters, the rules only contain source addresses. If there are no IPBlockFilters, the rules only contain ports. If there are neither IPBlockFilters nor PortFilters, the rule breaks down to 'accept' everything.
* For each IPBlockFilter that contains an 'Except' list, a new chain is created. Inside this chain, for each entry in the 'Block' list, a 'drop' rule is created. The chain's default verdict is 'accept'.
The NetworkPolicy chains are not yet reachable at this point. This will be added in a separate commit.
For each podIP for which a PolicyAssignment exists in the model,
* a new chain in the filter table with default 'drop' is created. For each NetworkPolicy assigned to this podIP, a jump rule is added that points to the chain of the respective NetworkPolicy.
* a 'goto' rule is added to the forward chain that points to the podIP-specific chain.
The existing code to generate the nat table remains untouched. NetworkPolicies are handled after the DNAT because only then is the podIP known. This accounts for the fact that "Pod is matched by the Service" and "Pod is matched by the NetworkPolicy" are orthogonal.
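A rough sketch of the chain layout these two commits describe, with invented chain names and only pod 10.9.8.7, pol1 and a single ingress rule shown:

    table inet filter {
        chain forward {
            type filter hook forward priority 0; policy accept;
            # 'goto' does not return, so the trailing drop in the pod chain
            # acts as that chain's default verdict
            ip daddr 10.9.8.7 goto pod-10-9-8-7
        }
        chain pod-10-9-8-7 {
            # one jump per NetworkPolicy assigned to this podIP
            jump policy-pol1
            drop
        }
        chain policy-pol1 {
            # jump rules to the corresponding IngressRule chains
            jump policy-pol1-ingress0
            return
        }
        chain policy-pol1-ingress0 {
            # PortFilter x IPBlockFilter combinations; the block with an
            # except list jumps to its own chain instead of accepting directly
            ip saddr 185.187.19.0/24 tcp dport 80 accept
            ip saddr 185.187.0.0/16 tcp dport 80 jump policy-pol1-ingress0-except0
            return
        }
        chain policy-pol1-ingress0-except0 {
            # drop rules for each 'Except' entry; default verdict 'accept'
            ip saddr 185.187.13.0/24 drop
            accept
        }
    }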
This commit adds a python script to hack/debug-agent which can be used to debug a (locally running) agent by manually sending requests to it. The requests are read from request.yaml.
This commit adds support to the controller for creating NetworkPolicy and PolicyAssignment entries in the model. For each NetworkPolicy in the cluster which applies to Ingress (spec.PolicyTypes contains "Ingress"), a NetworkPolicy entry in the model is created. All IngressRules that contain an ipBlock are added to the NetworkPolicy entry as AllowedIngress. Policies that only contain namespaceSelectors or podSelectors result in a NetworkPolicy entry without AllowedIngress. Also, for each NetworkPolicy in the cluster which applies to Ingress, and for each pod to which it applies, an entry in PolicyAssignment is created.
This PR provides a working POC for interpreting NetworkPolicies when the Pod backend is used.
What needs to be fixed before I would consider this done:
Reminder: After lbaas has forwarded the traffic into the cluster, the NetworkPolicy is applied again by the CNI, only this time the packets appear to originate from the gateway node. So make sure to always add your cluster's internal CIDR to the ingress ipBlocks as well.
Closes #34