dns: Add configurable-dns-pod-placement enhancement #663
Conversation
// nodePlacement enables explicit control over the scheduling of DNS pods.
//
// If unset, defaults are used. See nodePlacement for more details.
Suggested change:
- // If unset, defaults are used. See nodePlacement for more details.
+ // If unset, defaults are used. See dnsNodePlacement for more details.
Is this what you meant?
I think `nodePlacement` is right; the godoc should reference the field by its name, which is `nodePlacement`.
Would it make sense to say "refer to x for more info on x" though?
Maybe I'm overthinking this. https://github.com/openshift/api/blob/master/operator/v1/types_ingress.go#L134 uses the type rather than the field name (ah, well, actually that's ambiguous!)
Either way works for me. Not worth bike shedding over this.
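For context, a rough sketch of how the field and its godoc could look in the operator API; the type and field names here are illustrative (drawn from the discussion above), not the enhancement's final definition:

```go
package v1

import corev1 "k8s.io/api/core/v1"

// DNSSpec is the specification of the desired behavior of the DNS operator.
type DNSSpec struct {
	// nodePlacement enables explicit control over the scheduling of DNS pods.
	//
	// If unset, defaults are used. See nodePlacement for more details.
	//
	// +optional
	NodePlacement DNSNodePlacement `json:"nodePlacement,omitempty"`
}

// DNSNodePlacement describes the node scheduling configuration of DNS pods.
type DNSNodePlacement struct {
	// nodeSelector is the node selector applied to DNS pods.
	// +optional
	NodeSelector map[string]string `json:"nodeSelector,omitempty"`

	// tolerations is a list of tolerations applied to DNS pods.
	// +optional
	Tolerations []corev1.Toleration `json:"tolerations,omitempty"`
}
```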
By default, DNS Pods run on untainted Linux nodes. The `NodePlacement` field
enables cluster administrators to specify alternative parameters. For example,
the following DNS specifies that DNS Pods should run only on "infra" nodes
(i.e., nodes that have the "node-role.kubernetes.io/infra" label):
Might be worth mentioning that this example assumes infra nodes have no infra node taints.
Thanks! I amended the example to include a toleration for any "node-role.kubernetes.io/infra" `NoSchedule` taint. I don't know how prevalent the use of this taint is, but I gather that it is something that cluster administrators who configure infra nodes often use, and adding the toleration shouldn't cause any problems if the taint doesn't exist, so the example might as well include it.
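In Go terms, the placement the amended example describes would look roughly like the following; it reuses the illustrative `DNSNodePlacement` type sketched earlier, so the names are assumptions rather than the enhancement's actual example:

```go
// InfraPlacement is illustrative only: it restricts DNS pods to "infra" nodes
// and tolerates the NoSchedule taint that some administrators place on those
// nodes. Tolerating a taint that does not exist is harmless, so the toleration
// is safe to include even on clusters whose infra nodes are untainted.
var InfraPlacement = DNSNodePlacement{
	NodeSelector: map[string]string{
		"node-role.kubernetes.io/infra": "",
	},
	Tolerations: []corev1.Toleration{{
		Key:      "node-role.kubernetes.io/infra",
		Operator: corev1.TolerationOpExists,
		Effect:   corev1.TaintEffectNoSchedule,
	}},
}
```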
/retest

### User Stories

#### As a cluster administrator, I must comply with a security policy that prohibits communication among worker nodes
Why is solving this use case with explicit placement more appropriate than using service topology? This is a pretty specific constraint (and doesn't cover other node-to-node traffic like monitoring, SDN, control-plane traffic, etc.). While constraining DNS to particular nodes is reasonable in its own right for administrator topology choices, this specific example is fairly vague / not particularly complete.
Service topology is alpha and deprecated, to be superseded by internal traffic policy, which is alpha (and not yet implemented AFAIK). Users have been strenuously complaining about this for a while—some security policies require prohibiting traffic among worker nodes, so DNS needs to run on non-worker nodes only.
such a way that DNS Pods could not be scheduled to any node, rendering the DNS
service unavailable. Because the DNS service is critical to other cluster
components including OAuth, fixing misconfigured DNS Pod placement parameters
could be impossible for the cluster administrator to do.
Is the user being unable to log in the only scenario that could block fixing this? I'm skeptical that a one-time check for placement is the right approach, vs (for example) always ensuring DNS runs on the control plane nodes if there are no valid worker nodes in use (or separately as a fallback). Emulating the scheduler is probably not a great idea (vs detecting zero pods at runtime, always running the previous topology until the new topology is in place, or "failing open" if no pods are detected). Generally, doing any sort of static detection of schedulability adds fragility to a system.
Yeah, if DNS isn't available to resolve `kubernetes.default.svc.cluster.local`, then presumably that could break OAuth. If the administrator didn't already have an OAuth token handy, the problem might then be impossible to resolve. If the user did have an OAuth token, it should be possible to patch the CRD to revert the problematic node-placement parameters. I agree that trying to infer what the scheduler will do is an imperfect mitigation. We could add some logic to the reconcile loop to detect if the operator has been reporting degraded for >n minutes and has 0 DNS pods; would it be valid for the operator to revert `spec.nodePlacement`, or does the operator need to maintain some state to track that the current value in `spec.nodePlacement` resulted in having 0 pods and needs to be ignored?
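A rough sketch of the kind of reconcile-loop check floated above; the names, threshold, and structure are hypothetical and are not part of the operator or this enhancement:

```go
package dns

import "time"

// degradedThreshold is a hypothetical window after which persistent
// degradation with zero DNS pods would trigger a fallback.
const degradedThreshold = 15 * time.Minute

// placementStatus is an illustrative snapshot of what the reconcile loop
// could observe; a real operator would derive this from conditions on the
// DNS custom resource and the state of the DNS DaemonSet.
type placementStatus struct {
	degradedSince time.Time // zero value means not currently degraded
	availablePods int
}

// shouldIgnoreNodePlacement captures the idea discussed above: if the
// operator has been degraded for longer than the threshold and no DNS pods
// are running, the current spec.nodePlacement is presumed unschedulable and
// could be ignored (or reverted to defaults) until it changes.
func shouldIgnoreNodePlacement(s placementStatus, now time.Time) bool {
	if s.availablePods > 0 || s.degradedSince.IsZero() {
		return false
	}
	return now.Sub(s.degradedSince) >= degradedThreshold
}
```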
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: knobunc.
/lgtm
/hold cancel
We can address any other concerns in follow-up PRs.
This enhancement enables cluster administrators to configure the placement of the CoreDNS Pods that provide cluster DNS service.