-
Notifications
You must be signed in to change notification settings - Fork 463
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NE-367: Add logLevel and operatorLogLevel APIs for DNS #931
Conversation
fbfa319
to
f6e231e
Compare
@Miciah PTAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have some comments regarding markup issues, copy edits, and points of clarification, but overall this looks great.
Supporting a trivial way to raise the verbosity of the DNS Operator and it's Operands (CoreDNS) would make debugging | ||
cluster dns operator and CoreDNS issues easier for cluster administrators and OpenShift developers. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Supporting a trivial way to raise the verbosity of the DNS Operator and it's Operands (CoreDNS) would make debugging | |
cluster dns operator and CoreDNS issues easier for cluster administrators and OpenShift developers. | |
Supporting a trivial way to raise the verbosity of the DNS Operator and its Operands (CoreDNS) would make debugging | |
the Operator and CoreDNS issues easier for cluster administrators and OpenShift developers. |
Supporting a trivial way to raise the verbosity of the DNS Operator and it's Operands (CoreDNS) would make debugging | ||
cluster dns operator and CoreDNS issues easier for cluster administrators and OpenShift developers. | ||
|
||
To have the CoreDNS enable the following classes such as error, denial and all. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This isn't a complete sentence.
To have the CoreDNS enable the following classes such as error, denial and all. | |
For logging purposes, CoreDNS defines several classes of responses, such as error, denial and all. |
To have the CoreDNS enable the following classes such as error, denial and all. | ||
denial: either NXDOMAIN or nodata responses (Name exists, type does not). A nodata response sets the return code to NOERROR. | ||
error: SERVFAIL, NOTIMP, REFUSED, etc. Anything that indicates the remote server is not willing to resolve the request. | ||
all: the default - nothing is specified. Using of this class means that all messages will be logged whatever we mix together with "all". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by "nothing is specified"? I'm not sure what you mean by "whatever we mix together with 'all'"; do you mean that "all" takes precedence when CoreDNS is configured to log a list of classes that includes "all"? Is that important to mention here? Remember, this is the motivation section; implementation details belong elsewhere.
all: the default - nothing is specified. Using of this class means that all messages will be logged whatever we mix together with "all". | |
all: all responses, including successful responses, errors, and denials. |
desire more in-depth logging statements when working on the operator's controllers. | ||
|
||
Additionally, a logging level API for CoreDNS logs would assist cluster administrators who wish to have more control | ||
over the output of their DNSController's CoreDNS logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
over the output of their DNSController's CoreDNS logs. | |
over CoreDNS logs. |
|
||
### Non-Goals | ||
|
||
* Change the default logging verbosity of the DNS Operator or the CoreDNS in production OCP clusters. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Change the default logging verbosity of the DNS Operator or the CoreDNS in production OCP clusters. | |
* Change the default logging verbosity of the DNS Operator or CoreDNS in production OCP clusters. |
|
||
### Risks and Mitigations | ||
|
||
Raising the logging verbosity for any component typically results in larger log files that grow quickly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If possible, each risk should have a mitigation. For example, the godoc for the new API field could include a warning that setting logLevel: Trace
will produce extremely verbose logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Miciah Do you mean add a comment in the API PR and recreate the go docs
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean the point of having a "Risks and Mitigations" section is that you'll describe the mitigation for each risk.
### Upgrade / Downgrade Strategy | ||
|
||
On downgrade, any logging options are ignored by the DNS Operator and CoreDNS. | ||
A harmless logging level configmap in the `openshift-dns` namespace may be left behind. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would there be an extra configmap?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I need to remove this. the downgraded operator will update the configmap and delete the log stanzas.
Also, a logging level API for the DNS Operator would assist OpenShift developers working on the DNS Operator who may | ||
desire more in-depth logging statements when working on the operator's controllers. | ||
|
||
Additionally, a logging level API for CoreDNS logs would assist cluster administrators who wish to have more control | ||
over the output of their DNSController's CoreDNS logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section would flow better if you swapped these two paragraphs (adjusting transitions as necessary).
|
||
### Goals | ||
|
||
* Add a user-facing API for controlling the run-time verbosity of the [OpenShift DNS Operator and CoreDNS](https://github.com/openshift/cluster-dns-operator) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* Add a user-facing API for controlling the run-time verbosity of the [OpenShift DNS Operator and CoreDNS](https://github.com/openshift/cluster-dns-operator) | |
* Add a user-facing API for controlling the run-time verbosity of the [OpenShift DNS Operator and CoreDNS](https://github.com/openshift/cluster-dns-operator). |
|
||
Valid values for logLevel are: "Normal", "Debug", "Trace", "TraceAll" as per [Operator Api](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62) | ||
|
||
We will be enabling the CLASSES(https://github.com/coredns/coredns/tree/master/plugin/log#syntax) of coredns w.r.t to the logLevel we have defined in openshift api. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will be enabling the CLASSES(https://github.com/coredns/coredns/tree/master/plugin/log#syntax) of coredns w.r.t to the logLevel we have defined in openshift api. | |
We will enable logging of CoreDNS's [classes of responses](https://github.com/coredns/coredns/tree/master/plugin/log#syntax) that correspond to the log level specified in the API. |
// See LogLevel for more information about each available logging level. | ||
// | ||
// +optional | ||
OperatorLogLevel LogLevel `json:"operatorLogLevel"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add defaulting to the standard loglevel.
// logLevel describes the logging verbosity of the DNSController for CoreDNS. | ||
// | ||
// +optional | ||
LogLevel LogLevel `json:"logLevel"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add defaulting to the standard loglevel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry @sttts forgot to update here as had added it the api openshift/api#1031
@sttts just a request if would be able to review openshift/api#1031 then that will be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sttts PTAL just added the changes you requested. Thanks in advance!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented on the api
095c433
to
4d713d6
Compare
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
@Miciah @alebedev87 @arjunrn PTAL |
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten |
/reopen |
@miheer: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
For logging purposes, CoreDNS defines several classes of responses, such as error, denial and all. | ||
denial: either NXDOMAIN or nodata responses (Name exists, type does not). A nodata response sets the return code to NOERROR. | ||
error: SERVFAIL, NOTIMP, REFUSED, etc. Anything that indicates the remote server is not willing to resolve the request. | ||
all: all responses, including successful responses, errors, and denials. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please fix the formatting.
For logging purposes, CoreDNS defines several classes of responses, such as error, denial and all. | |
denial: either NXDOMAIN or nodata responses (Name exists, type does not). A nodata response sets the return code to NOERROR. | |
error: SERVFAIL, NOTIMP, REFUSED, etc. Anything that indicates the remote server is not willing to resolve the request. | |
all: all responses, including successful responses, errors, and denials. | |
For logging purposes, CoreDNS defines several classes of responses: | |
* denial: NXDOMAIN and nodata responses (name exists, record type does not). A nodata response sets the return code to NOERROR. | |
* error: SERVFAIL, NOTIMP, REFUSED, etc. Anything that indicates the remote server is not willing to resolve the request. | |
* all: all responses, including successful responses, errors, and denials. |
## Proposal | ||
|
||
### DNS Operator Log Level API | ||
We will be creating an API for **field** `operatorLogLevel` in DNSSpec in accordance with `DNSLogLevel` type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the emphasis? You ended up defining a new type, right?
We will be creating an API for **field** `operatorLogLevel` in DNSSpec in accordance with `DNSLogLevel` type. | |
We will be defining a new API field `operatorLogLevel` in `DNSSpec` with newly defined type `DNSLogLevel`. | |
This type is similar to the existing `LogLevel` type except that the values of `DNSLogLevel` are a subset of the values of `LogLevel`. |
|
||
### DNS Operator Log Level API | ||
We will be creating an API for **field** `operatorLogLevel` in DNSSpec in accordance with `DNSLogLevel` type. | ||
Valid values will be : "Normal", "Debug", "Trace". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid values will be : "Normal", "Debug", "Trace". | |
Valid values will be the following: "Normal", "Debug", "Trace". |
The CoreDNS reloads its configuration without requiring a restart, so the operator can adjust CoreDNS's log level just by updating the Corefile configmap without need to restart the pod. | ||
|
||
|
||
We will be adding an API field `operatorlogLevel` in DNSSpec for the type DNSLogLevel |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We will be adding an API field `operatorlogLevel` in DNSSpec for the type DNSLogLevel | |
We will be adding an API field `operatorlogLevel` in `DNSSpec` with the type `DNSLogLevel`: |
``` | ||
This new field would allow a cluster administrator to specify the desired logging level specifically for the DNS Operator. | ||
|
||
Additionally, a new `LogLevel` of type `DNSLogLevel` will be added for CoredDNS logging : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additionally, a new `LogLevel` of type `DNSLogLevel` will be added for CoredDNS logging : | |
Additionally, a new API field `LogLevel` of type `DNSLogLevel` will be added to specify the log level for CoreDNS: |
Adding this prometheus alert is nice, but it would be more useful we can see which request are getting SERVFAIL response. | ||
So we would to enable the log plugin for CoreDNS to log queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adding this prometheus alert is nice, but it would be more useful we can see which request are getting SERVFAIL response. | |
So we would to enable the log plugin for CoreDNS to log queries. | |
Adding this Prometheus alert is useful, but it would be more useful if we could see which requests were getting SERVFAIL responses. | |
So we would like to configure the log plugin for CoreDNS to log queries. |
* Some users recently added the new alert 'CoreDNS is returning SERVFAIL for X% of requests alert' to the recent updates of OCP. | ||
Adding this prometheus alert is nice, but it would be more useful we can see which request are getting SERVFAIL response. | ||
So we would to enable the log plugin for CoreDNS to log queries. | ||
|
||
* Some user want to avoid use of tcpdump to see the queries and want log plugin to be enabled to log queries in coredns. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe my previous comment here still applies.
|
||
### Risks and Mitigations | ||
|
||
Raising the logging verbosity for any component typically results in larger log files that grow quickly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I mean the point of having a "Risks and Mitigations" section is that you'll describe the mitigation for each risk.
* Don't provide any DNS logging level APIs for the operator and coredns (current behavior) | ||
* Raise current verbosity of the DNS Operator and coredns (not desirable) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A third alternative is tcpdump.
Normally we'd want a little more discussion as to why the alternative was not considered used.
### API Extensions | ||
|
||
### Operational Aspects of API Extensions | ||
|
||
#### Failure Modes | ||
|
||
#### Support Procedures |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be filled out.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle stale |
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting If this proposal is safe to close now please do so with /lifecycle rotten |
Rotten enhancement proposals close after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Reopen the proposal by commenting /close |
@openshift-bot: Closed this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/remove-lifecycle rotten |
/reopen |
@miheer: Reopened this PR. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! We can address remaining issues in a follow-up.
/lgtm
* all: all responses, including successful responses, errors, and denials. | ||
A logging level API for CoreDNS logs would assist cluster administrators who wish to have more control | ||
over CoreDNS logs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Needs a linebreak to prevent lines 53-54 from being formatted as part of the bullet on line 52.
* all: all responses, including successful responses, errors, and denials. | |
A logging level API for CoreDNS logs would assist cluster administrators who wish to have more control | |
over CoreDNS logs. | |
* all: all responses, including successful responses, errors, and denials. | |
A logging level API for CoreDNS logs would assist cluster administrators who wish to have more control | |
over CoreDNS logs. |
* Some users want to add new prometheus alert 'CoreDNS is returning SERVFAIL for X% of requests alert' to the recent updates of OCP. | ||
Adding this Prometheus alert is useful, but it would be more useful if we could see which requests were getting SERVFAIL responses. | ||
So we would like to configure the log plugin for CoreDNS to log queries. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still confused as to what the use-case is here, or how this relates to the existing alert: https://github.com/openshift/cluster-dns-operator/blob/531e38a2f640bfbefb7126d6e701afad8db6f911/manifests/0000_90_dns-operator_03_prometheusrules.yaml#L32-L43
``` | ||
After the logging level on a Logger is set, log entries with that severity or anything above it will be logged. | ||
For example, `log.SetLevel(log.InfoLevel)` will log anything that is info or above (warn, error, fatal, panic). This is the default log level. | ||
So, we will be reading `operatorLogLevel` in a separate controller to watch dnses and setting log level. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't end up using a separate controller, so this text should be updated when you get the chance.
```go | ||
// operatorLogLevel controls the logging level of the DNS Operator. | ||
// Valid values are: "Normal", "Debug", "Trace". | ||
// Defaults to "Normal". | ||
// setting operatorLogLevel: Trace will produce extremely verbose logs. | ||
// +optional | ||
// +kubebuilder:default=Normal | ||
OperatorLogLevel DNSLogLevel `json:"operatorLogLevel,omitempty"` | ||
``` | ||
This new field would allow a cluster administrator to specify the desired logging level specifically for the DNS Operator. | ||
|
||
Additionally, a new API field `LogLevel` of type `DNSLogLevel` will be added to specify the log level for CoreDNS: | ||
```go | ||
// logLevel describes the desired logging verbosity for CoreDNS. | ||
// Any one of the following values may be specified: | ||
// * Normal logs errors from upstream resolvers. | ||
// * Debug logs errors, NXDOMAIN responses, and NODATA responses. | ||
// * Trace logs errors and all responses. | ||
// Setting logLevel: Trace will produce extremely verbose logs. | ||
// Valid values are: "Normal", "Debug", "Trace". | ||
// Defaults to "Normal". | ||
// +optional | ||
// +kubebuilder:default=Normal | ||
LogLevel DNSLogLevel `json:"logLevel,omitempty"` | ||
``` | ||
|
||
Both of these new API fields use the aforementioned `DNSLogLevel` type, which is defined as follows: | ||
```go | ||
|
||
// +kubebuilder:validation:Enum:=Normal;Debug;Trace | ||
type DNSLogLevel string | ||
|
||
var ( | ||
// Normal is the default. Normal, working log information, everything is fine, but helpful notices for auditing or common operations. In kube, this is probably glog=2. | ||
DNSLogLevelNormal DNSLogLevel = "Normal" | ||
|
||
// Debug is used when something went wrong. Even common operations may be logged, and less helpful but more quantity of notices. In kube, this is probably glog=4. | ||
DNSLogLevelDebug DNSLogLevel = "Debug" | ||
|
||
// Trace is used when something went really badly and even more verbose logs are needed. Logging every function call as part of a common operation, to tracing execution of a query. In kube, this is probably glog=6. | ||
DNSLogLevelTrace DNSLogLevel = "Trace" | ||
) | ||
|
||
``` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definitions we ended up with in openshift/api#1031 look a bit different; the enhancement text should be updated eventually.
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Miciah The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
@miheer: all tests passed! Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
Add logLevel and operatorLogLevel APIs for DNS https://issues.redhat.com/browse/NE-367