dns: Add configurable-dns-pod-placement enhancement #663
Conversation
// nodePlacement enables explicit control over the scheduling of DNS pods.
//
// If unset, defaults are used. See nodePlacement for more details.
Suggested change:
- // If unset, defaults are used. See nodePlacement for more details.
+ // If unset, defaults are used. See dnsNodePlacement for more details.
Is this what you meant?
I think `nodePlacement` is right; the godoc should reference the field by its name, which is `nodePlacement`.
Would it make sense to say "refer to x for more info on x" though?
Maybe I'm overthinking this. https://github.com/openshift/api/blob/master/operator/v1/types_ingress.go#L134 uses the type rather than the field name (ah, well, actually that's ambiguous!)
Either way works for me. Not worth bike shedding over this.
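For context, a rough sketch of how the field and its godoc could look in the operator API; the type and field names here are illustrative (drawn from the discussion above), not the enhancement's final definition:

```go
package v1

import corev1 "k8s.io/api/core/v1"

// DNSSpec is the specification of the desired behavior of the DNS operator.
type DNSSpec struct {
	// nodePlacement enables explicit control over the scheduling of DNS pods.
	//
	// If unset, defaults are used. See nodePlacement for more details.
	//
	// +optional
	NodePlacement DNSNodePlacement `json:"nodePlacement,omitempty"`
}

// DNSNodePlacement describes the node scheduling configuration of DNS pods.
type DNSNodePlacement struct {
	// nodeSelector is the node selector applied to DNS pods.
	// +optional
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`

	// tolerations is a list of tolerations applied to DNS pods.
	// +optional
	Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
}
```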
By default, DNS Pods run on untainted Linux nodes. The `NodePlacement` field
enables cluster administrators to specify alternative parameters. For example,
the following DNS specifies that DNS Pods should run only on "infra" nodes
(i.e., nodes that have the "node-role.kubernetes.io/infra" label):
Might be worth mentioning that this example assumes infra nodes have no infra node taints.
Thanks! I amended the example to include a toleration for any "node-role.kubernetes.io/infra" `NoSchedule` taint. I don't know how prevalent the use of this taint is, but I gather that it is something that cluster administrators who configure infra nodes often use, and adding the toleration shouldn't cause any problems if the taint doesn't exist, so the example might as well include it.
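In Go terms, the placement the amended example describes would look roughly like the following; it reuses the illustrative `DNSNodePlacement` type sketched earlier, so the names are assumptions rather than the enhancement's actual example:

```go
// InfraPlacement is illustrative only: it restricts DNS pods to "infra" nodes
// and tolerates the NoSchedule taint that some administrators place on those
// nodes. Tolerating a taint that does not exist is harmless, so the toleration
// is safe to include even on clusters whose infra nodes are untainted.
var InfraPlacement = DNSNodePlacement{
	NodeSelector: map[string]string{
		"node-role.kubernetes.io/infra": "",
	},
	Tolerations: []corev1.Toleration{{
		Key:      "node-role.kubernetes.io/infra",
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectNoSchedule,
	}},
}
```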
/retest

### User Stories

#### As a cluster administrator, I must comply with a security policy that prohibits communication among worker nodes
Why is solving this use case with explicit placement more appropriate than using service topology? This is a pretty specific constraint (and doesn't cover other node-to-node traffic like monitoring, SDN, control-plane traffic, etc.). While constraining DNS to particular nodes is reasonable in its own right for administrator topology choices, this specific example is fairly vague / not particularly complete.
Service topology is alpha and deprecated, to be superseded by internal traffic policy, which is alpha (and not yet implemented AFAIK). Users have been strenuously complaining about this for a while—some security policies require prohibiting traffic among worker nodes, so DNS needs to run on non-worker nodes only.
such a way that DNS Pods could not be scheduled to any node, rendering the DNS
service unavailable. Because the DNS service is critical to other cluster
components including OAuth, fixing misconfigured DNS Pod placement parameters
could be impossible for the cluster administrator to do.
Is the user being unable to log in the only scenario that could block fixing this? I'm skeptical that a one-time check for placement is the right approach, vs (for example) always ensuring DNS runs on the control plane nodes if there are no valid worker nodes in use (or separately as a fallback). Emulating the scheduler is probably not a great idea (vs detecting zero pods at runtime, always running the previous topology until the new topology is in place, or "failing open" if no pods are detected). Generally, doing any sort of static detection of schedulability adds fragility to a system.
Yeah, if DNS isn't available to resolve `kubernetes.default.svc.cluster.local`, then presumably that could break OAuth. If the administrator didn't already have an OAuth token handy, the problem might then be impossible to resolve. If the user did have an OAuth token, it should be possible to patch the CRD to revert the problematic node-placement parameters. I agree that trying to infer what the scheduler will do is an imperfect mitigation. We could add some logic to the reconcile loop to detect if the operator has been reporting degraded for >n minutes and has 0 DNS pods; would it be valid for the operator to revert `spec.nodePlacement`, or does the operator need to maintain some state to track that the current value in `spec.nodePlacement` resulted in having 0 pods and needs to be ignored?
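A rough sketch of the kind of reconcile-loop check floated above; the names, threshold, and structure are hypothetical and are not part of the operator or this enhancement:

```go
package dns

import "time"

// degradedThreshold is a hypothetical window after which persistent
// degradation with zero DNS pods would trigger a fallback.
const degradedThreshold = 15 * time.Minute

// placementStatus is an illustrative snapshot of what the reconcile loop
// could observe; a real operator would derive this from conditions on the
// DNS custom resource and the state of the DNS DaemonSet.
type placementStatus struct {
	degradedSince time.Time // zero value means not currently degraded
	availablePods int
}

// shouldIgnoreNodePlacement captures the idea discussed above: if the
// operator has been degraded for longer than the threshold and no DNS pods
// are running, the current spec.nodePlacement is presumed unschedulable and
// could be ignored (or reverted to defaults) until it changes.
func shouldIgnoreNodePlacement(s placementStatus, now time.Time) bool {
	if s.availablePods > 0 || s.degradedSince.IsZero() {
		return false
	}
	return now.Sub(s.degradedSince) >= degradedThreshold
}
```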
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: knobunc.
/lgtm
/hold cancel
We can address any other concerns in follow-up PRs.
This enhancement enables cluster administrators to configure the placement of the CoreDNS Pods that provide cluster DNS service.