Packet: improve bootstrap iptables rules #202
Conversation
Force-pushed 9ca8bf1 to 549a850
@invidian regarding the management CIDR, I'm not sure what to do. Both options (using it or leaving port 22 open to the world) seem wrong, and the only reason we need to do that is because of bootkube :-/.
I'd propose to do something safe: move forward with the other iptables changes (i.e. not the ones using management_cidr), and leave the management CIDR changes for another PR.
What do you think?
assets/lokomotive-kubernetes/packet/flatcar-linux/kubernetes/cl/controller.yaml.tmpl
assets/lokomotive-kubernetes/packet/flatcar-linux/kubernetes/controllers.tf
assets/lokomotive-kubernetes/packet/flatcar-linux/kubernetes/calico-host-protection.yaml.tmpl
assets/lokomotive-kubernetes/packet/flatcar-linux/kubernetes/workers/workers.tf
-A INPUT -p tcp --dport 10256 -j ACCEPT
%{ for management_cidr in management_cidrs ~}
-A INPUT -s ${management_cidr} -p tcp --dport 22 -j ACCEPT
%{~ endfor }
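The `%{ for }` directive above is Terraform's template repetition syntax. As a hypothetical sketch of how it expands (the file name and CIDR values below are invented for illustration, not taken from this PR):

```hcl
# Hypothetical sketch: rendering the SSH allow-list loop with templatefile().
# "bootstrap-rules.tmpl" and the CIDR values are illustrative only.
output "ssh_rules" {
  value = templatefile("${path.module}/bootstrap-rules.tmpl", {
    management_cidrs = ["203.0.113.0/24", "198.51.100.7/32"]
  })
}

# With bootstrap-rules.tmpl containing the loop shown above, this renders
# one ACCEPT rule per management CIDR:
#   -A INPUT -s 203.0.113.0/24 -p tcp --dport 22 -j ACCEPT
#   -A INPUT -s 198.51.100.7/32 -p tcp --dport 22 -j ACCEPT
```

The `~` markers in `%{ for ... ~}` and `%{~ endfor }` trim the newlines around the directives, so only the rule lines themselves appear in the rendered output.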
I don't think SSH is needed for workers before they are running Calico (the out-of-band console can be used if needed). What do you think?
It seems reasonable, though opinionated, so it should be documented. If calico fails to start for some reason, it makes it more difficult to debug. What do you think @kinvolk/team-cloud?
Hm, maybe port 22 should also be allowed from node_private_cidr?
- IMO we want to be able to SSH into workers at all times, and especially during failures.
- I don't think we want to allow SSH from node_private_cidr. This increases the nodes' attack surface (very theoretically), but mainly I don't find it useful. My guess is you're suggesting this to allow using a bastion host (right?), in which case I would include the bastion's /32 address in management_cidrs, because it is a management CIDR.
assets/lokomotive-kubernetes/packet/flatcar-linux/kubernetes/cl/controller.yaml.tmpl
I've added some inline comments.
Regarding management CIDRs:
Like @rata, I'm ambivalent regarding restricting "fallback rules" to the management CIDRs. Here is why:
These rules are "fallback rules" whose purpose is to allow the k8s control plane to function during bootstrap and failure scenarios without being completely exposed to the outside world. The policy which would be enforced most of the time is the one enforced by Calico.
Management CIDRs change frequently, and this is why I think changing them should entail a k8s API operation only. Putting management CIDRs in Ignition means one of the following:
- We have to replace all the nodes to update management CIDRs.
- We create a mismatch between the GNP-based policy and the fallback policy assuming CIDRs change over time.
While I do see value in not exposing SSH to the entire internet, I don't see a serious security threat there, since brute-forcing private SSH keys is impractical. The main threat in exposing SSH without filtering sources, AFAICT, is a new zero-day CVE. Given that the fallback policy should only apply during very limited time windows, I lean towards not filtering by management CIDRs.
One last thought: maybe we can achieve a state where the iptables rules injected by Calico stay in place when Calico dies: rather than falling back to the bootstrap policy, we "freeze" the Calico policy until Calico converges again. Once Calico has converged, the policy may be updated by Calico if GNP objects were modified in the meantime. Have we checked that Calico-injected rules indeed disappear during failure scenarios?
assets/lokomotive-kubernetes/packet/flatcar-linux/kubernetes/calico-host-protection.yaml.tmpl
This commit changes the template method we use in Packet Terraform files where possible, from the template_file data source coming from a 3rd-party Terraform provider to the built-in 'templatefile' function, which is available from Terraform 0.12. It provides the exact same functionality but does not require downloading a 3rd-party provider. Also, the 'template' provider recommends using this function: https://www.terraform.io/docs/providers/template/d/file.html. Part of #196 Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
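The migration this commit describes can be sketched roughly as follows (resource names and the ssh_keys variable are invented for illustration, not the PR's exact diff):

```hcl
# Before: template_file data source from the 3rd-party "template" provider,
# which had to be downloaded separately.
data "template_file" "controller_config" {
  template = file("${path.module}/cl/controller.yaml.tmpl")
  vars = {
    ssh_keys = jsonencode(var.ssh_keys)
  }
}

# After: the built-in templatefile() function (Terraform >= 0.12); same
# rendering behavior, no extra provider required.
locals {
  controller_config = templatefile("${path.module}/cl/controller.yaml.tmpl", {
    ssh_keys = jsonencode(var.ssh_keys)
  })
}
```

References to `data.template_file.controller_config.rendered` would then become `local.controller_config`.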
If there are no management_cidrs specified, the template produces an empty line after the 'managementCIDRs:' line, which shouldn't be there. Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
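One way such a fix can look (illustrative only, not necessarily the PR's exact change) is to emit the whole CIDR list in a single interpolation, so an empty list cannot leave a dangling blank line:

```hcl
# Illustrative sketch: rendering the Calico host-protection manifest so that
# an empty management_cidrs list produces "managementCIDRs: []" with no
# trailing blank line. The variable wiring here is an assumption.
locals {
  host_protection = templatefile("${path.module}/calico-host-protection.yaml.tmpl", {
    management_cidrs = var.management_cidrs
  })
}

# calico-host-protection.yaml.tmpl fragment:
#   managementCIDRs: [${join(",", management_cidrs)}]
```

`join()` over an empty list yields an empty string, so the rendered line stays well-formed YAML either way.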
Force-pushed 549a850 to 009b169
Addressed all review feedback, please have a look again.
LGTM
LGTM. However, I'd add a comment saying why we use 10.0.0.0/8, as that is a trick not obvious to most readers who aren't very involved with Packet networking internals, or to any newcomer.
Other than the simple comment, LGTM :)
This commit adds extra filtering for cluster-internal ports, so even if we listen on all interfaces, those ports won't be accessible from the internet, but only from the Packet private CIDR. Later on, Calico should further tighten those rules. SSH port 22 stays accessible from all addresses on purpose, to allow eventual debugging if provisioning fails. We use 10.0.0.0/8 as this is the Packet private network CIDR. Signed-off-by: Mateusz Gozdek <mateusz@kinvolk.io>
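As an illustrative sketch of the resulting fallback policy (the port numbers below are typical Kubernetes control-plane examples, not the PR's exact rule list):

```hcl
locals {
  # Illustrative fallback rules matching the commit's description:
  # cluster-internal ports (etcd 2379-2380, kubelet 10250 used as examples)
  # are restricted to the Packet private CIDR 10.0.0.0/8, while SSH stays
  # reachable from anywhere for debugging failed provisioning.
  bootstrap_rules = <<-EOT
    -A INPUT -s 10.0.0.0/8 -p tcp --dport 2379:2380 -j ACCEPT
    -A INPUT -s 10.0.0.0/8 -p tcp --dport 10250 -j ACCEPT
    -A INPUT -p tcp --dport 22 -j ACCEPT
  EOT
}
```

Once Calico is up, its GlobalNetworkPolicy-derived rules are expected to replace this coarse bootstrap policy.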
Force-pushed 009b169 to 257ed36
Makes sense, done.
LGTM. Thanks again! :)
LGTM
This PR further locks down bootstrap iptables rules. One disadvantage we need to consider: if a user updates some of those CIDRs and relies on them, and the Calico rules are then flushed for whatever reason, the original rules (from node creation time) will be applied, unless the node is replaced in the process. We need to make sure this is the behavior we want.
Alternatively, changing CIDRs would require node upgrade rollout we plan to implement.
Refs #137 #8