diff --git a/enhancements/dns/dns-operator-operand-logging-level.md b/enhancements/dns/dns-operator-operand-logging-level.md
new file mode 100644
index 00000000000..63bb52566e4
--- /dev/null
+++ b/enhancements/dns/dns-operator-operand-logging-level.md
@@ -0,0 +1,228 @@
+---
+title: dns-operator-operand-logging-level
+authors:
+  - "@miheer"
+reviewers:
+  - "@alebedev87"
+  - "@candita"
+  - "@frobware"
+  - "@knobunc"
+  - "@Miciah"
+  - "@rfredette"
+approvers:
+  - "@frobware"
+  - "@knobunc"
+  - "@Miciah"
+  - "@alebedev87"
+  - "@rfredette"
+  - "@candita"
+creation-date: 2021-10-14
+last-updated: 2021-10-14
+status: implementable
+---
+
+# DNS Log Level API
+
+## Release Signoff Checklist
+
+- [X] Enhancement is `implementable`
+- [ ] Design details are appropriately documented from clear requirements
+- [ ] Test plan is defined
+- [ ] Operational readiness criteria is defined
+- [ ] Graduation criteria for dev preview, tech preview, GA
+- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
+
+## Summary
+
+This enhancement describes the API and code changes necessary to expose
+a means to change the DNS Operator and CoreDNS Logging Levels to
+cluster administrators.
+
+## Motivation
+
+Supporting a trivial way to raise the verbosity of the DNS Operator and its Operands (CoreDNS) would make debugging
+the Operator and CoreDNS issues easier for cluster administrators and OpenShift developers.
+
+For logging purposes, CoreDNS defines several classes of responses, such as error, denial and all.
+- denial: either NXDOMAIN or nodata responses (Name exists, type does not). A nodata response sets the return code to NOERROR.
+- error: SERVFAIL, NOTIMP, REFUSED, etc. Anything that indicates the remote server is not willing to resolve the request.
+- all: all responses, including successful responses, errors, and denials.
+
+A logging level API for CoreDNS logs would assist cluster administrators who wish to have more control
+over CoreDNS logs.
+
+Also, a logging level API for the DNS Operator would assist OpenShift developers working on the DNS Operator who may
+desire more in-depth logging statements when working on the operator's controllers.
+
+
+### Goals
+
+Add a user-facing API for controlling the run-time verbosity of the [OpenShift DNS Operator and CoreDNS](https://github.com/openshift/cluster-dns-operator).
+
+### Non-Goals
+
+* Change the default logging verbosity of the DNS Operator or CoreDNS in production OCP clusters.
+
+## Proposal
+
+### DNS Operator Log Level API
+We will add an `operatorLogLevel` **field** to DNSSpec, of the [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L71) type.
+Valid values will be: "Normal", "Debug", "Trace".
+We will use [logrus](https://github.com/sirupsen/logrus#level-logging) to implement the log levels for operator logging.
+Logrus has seven logging levels: Trace, Debug, Info, Warning, Error, Fatal and Panic.
+
+* `log.Trace("Something very low level.")`
+* `log.Debug("Useful debugging information.")`
+* `log.Info("Something noteworthy happened!")`
+* `log.Warn("You should probably take a look at this.")`
+* `log.Error("Something failed but I'm not quitting.")`
+* `log.Fatal("Bye.")` calls `os.Exit(1)` after logging
+* `log.Panic("I'm bailing.")` calls `panic()` after logging
+
+After the logging level on a Logger is set, log entries with that severity or anything above it will be logged.
+For example, the default `log.SetLevel(log.InfoLevel)` will log anything that is info or above (warn, error, fatal, panic).
+So, a separate controller will watch dnses and set the operator's log level according to `operatorLogLevel`:
+* `operatorLogLevel: "Normal"` will set `logrus.SetLevel(logrus.InfoLevel)`
+* `operatorLogLevel: "Debug"` will set `logrus.SetLevel(logrus.DebugLevel)`
+* `operatorLogLevel: "Trace"` will set `logrus.SetLevel(logrus.TraceLevel)`
+
+
+### CoreDNS Log Level API
+
+Valid values for logLevel are: "Normal", "Debug", "Trace" as per the [Operator API](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62)
+
+We will enable logging of CoreDNS's [classes of responses](https://github.com/coredns/coredns/tree/master/plugin/log#syntax) that correspond to the log level specified in the API.
+So,
+
+`logLevel: "Normal"` will enable the `error` class, in keeping with the errors plugin that is enabled by default: `log . { class error }`
+
+`logLevel: "Debug"` will enable `log . { class denial error }`
+
+`logLevel: "Trace"` will enable `log . { class all }`
+
+
+
+We will create an API under DNSSpec with **field name** `operatorLogLevel` of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L71)
+
+```go
+type DNSSpec struct {
+	//
+
+	// operatorLogLevel controls the logging level of the DNS Operator.
+	// See LogLevel for more information about each available logging level.
+	//
+	// +optional
+	OperatorLogLevel LogLevel `json:"operatorLogLevel"`
+}
+```
+
+This new field would allow a cluster administrator to specify the desired logging level specifically for the DNS Operator.
+
+Additionally, a new `logLevel` field of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62) will be added for CoreDNS logging:
+
+```go
+// We will enable the CoreDNS log classes (https://github.com/coredns/coredns/tree/master/plugin/log#syntax)
+// that correspond to the LogLevel defined in the OpenShift API:
+//
+//   logLevel "Normal" enables the error class, in keeping with the errors
+//   plugin that is enabled by default: log . { class error }
+//   logLevel "Debug" enables: log . { class denial error }
+//   logLevel "Trace" enables: log . { class all }
+
+type DNSSpec struct {
+
+	// logLevel describes the logging verbosity of the DNSController for CoreDNS.
+	//
+	// +optional
+	LogLevel LogLevel `json:"logLevel"`
+}
+
+```
+
+Both of these new APIs would be accompanied by appropriate `LogLevel` definitions:
+
+```go
+
+// LogLevel describes several available logging verbosity levels.
+// +kubebuilder:validation:Enum=Normal;Debug;Trace;TraceAll
+type LogLevel string
+
+var (
+	// Normal is the default. Normal, working log information, everything is fine, but helpful notices for auditing or common operations. In kube, this is probably glog=2.
+	Normal LogLevel = "Normal"
+
+	// Debug is used when something went wrong. Even common operations may be logged, and less helpful but more quantity of notices. In kube, this is probably glog=4.
+	Debug LogLevel = "Debug"
+
+	// Trace is used when something went really badly and even more verbose logs are needed. Logging every function call as part of a common operation, to tracing execution of a query. In kube, this is probably glog=6.
+	Trace LogLevel = "Trace"
+
+	// TraceAll is used when something is broken at the level of API content/decoding. It will dump complete body content. If you turn this on in a production cluster
+	// prepare for serious performance issues and massive amounts of logs. In kube, this is probably glog=8.
+	TraceAll LogLevel = "TraceAll"
+)
+```
+
+### User Stories
+
+* As an OpenShift Cluster Administrator, I want to be able to raise the logging level of the DNS Operator and CoreDNS so that I can more quickly
+track down OpenShift DNS issues.
+
+* Some users recently added the new alert 'CoreDNS is returning SERVFAIL for X% of requests' to recent updates of OCP.
+  This Prometheus alert is nice, but it would be more useful if we could see which requests are getting a SERVFAIL response.
+  So we would like to enable the log plugin for CoreDNS to log queries.
+
+* Some users want to avoid using tcpdump to see the queries and want the log plugin to be enabled to log queries in CoreDNS.
+
+
+### Implementation Details/Notes/Constraints [optional]
+
+
+### Risks and Mitigations
+
+Raising the logging verbosity for any component typically results in larger log files that grow quickly.
+
+
+## Design Details
+
+### Open Questions [optional]
+N/A
+
+### Test Plan
+
+Unit tests will be written to test if setting logLevel sets the respective logging in CoreDNS.
+Unit tests will be written to test if setting operatorLogLevel sets the respective logging in the DNS Operator.
+
+### Graduation Criteria
+
+N/A
+
+#### Dev Preview -> Tech Preview
+
+N/A
+
+#### Tech Preview -> GA
+
+N/A
+
+#### Removing a deprecated feature
+
+N/A
+
+### Upgrade / Downgrade Strategy
+
+On downgrade, any logging options are ignored by the DNS Operator and CoreDNS.
+The downgraded operator will update the configmap and delete the log stanzas.
+
+
+### Version Skew Strategy
+
+N/A
+
+## Implementation History
+
+[Work in Progress](https://github.com/openshift/api/)
+
+## Drawbacks
+
+
+## Alternatives
+
+* Don't provide any DNS logging level APIs for the operator and CoreDNS (current behavior)
+* Raise current verbosity of the DNS Operator and CoreDNS (not desirable)
+
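As a reading aid for the Proposal section of the patch above, here is a minimal Go sketch of the two mappings it describes: `operatorLogLevel` to a logrus level, and `logLevel` to a CoreDNS `log` stanza. It is not code from the patch. The helper names `logrusLevelName` and `corednsLogStanza` are hypothetical, the logrus level is represented only by its name (the strings that `logrus.ParseLevel` accepts) so the sketch needs no third-party import, the layout of the rendered stanza is an assumption, and falling back to the "Normal" behavior for unrecognized values is also an assumption.

```go
package main

import "fmt"

// LogLevel mirrors the string type quoted in the proposal.
type LogLevel string

const (
	Normal LogLevel = "Normal"
	Debug  LogLevel = "Debug"
	Trace  LogLevel = "Trace"
)

// logrusLevelName is a hypothetical helper: it returns the name of the
// logrus level that the proposal associates with an operatorLogLevel.
// Assumption: empty or unrecognized values behave like Normal ("info").
func logrusLevelName(l LogLevel) string {
	switch l {
	case Debug:
		return "debug"
	case Trace:
		return "trace"
	default:
		return "info"
	}
}

// corednsLogStanza is a hypothetical helper: it renders the CoreDNS log
// plugin stanza that the proposal associates with a logLevel.
// Assumption: empty or unrecognized values behave like Normal (class error).
func corednsLogStanza(l LogLevel) string {
	classes := "error"
	switch l {
	case Debug:
		classes = "denial error"
	case Trace:
		classes = "all"
	}
	return fmt.Sprintf("log . {\n    class %s\n}", classes)
}

func main() {
	// Print both mappings for each value the proposal lists as valid.
	for _, l := range []LogLevel{Normal, Debug, Trace} {
		fmt.Printf("%s: logrus level %q\n%s\n\n", l, logrusLevelName(l), corednsLogStanza(l))
	}
}
```

In an operator, the level name would presumably be passed through `logrus.ParseLevel` and `logrus.SetLevel`, and the stanza would be templated into the Corefile held in the configmap. Both steps are left out here because the patch does not show them.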