Add logLevel and operatorLogLevel APIs for DNS https://issues.redhat.com/browse/NE-367
openshift · Oct 15, 2021 · 194f83d · 194f83d
1 parent e2f4d4b
commit 194f83d
Showing 1 changed file with 228 additions and 0 deletions.
diff --git a/enhancements/dns/dns-operator-operand-logging-level.md b/enhancements/dns/dns-operator-operand-logging-level.md
@@ -0,0 +1,228 @@
+---
+title: dns-operator-operand-logging-level
+authors:
+  - "@miheer"
+reviewers:
+  - "@alebedev87"
+  - "@candita"
+  - "@frobware"
+  - "@knobunc"
+  - "@Miciah"
+  - "@rfredette"
+approvers:
+  - "@frobware"
+  - "@knobunc"
+  - "@Miciah"
+  - "@alebedev87"
+  - "@rfredette"
+  - "@candita"
+creation-date: 2021-10-14
+last-updated: 2021-10-14
+status: implementable
+---
+
+# DNS Log Level API
+
+## Release Signoff Checklist
+
+- [X] Enhancement is `implementable`
+- [ ] Design details are appropriately documented from clear requirements
+- [ ] Test plan is defined
+- [ ] Operational readiness criteria is defined
+- [ ] Graduation criteria for dev preview, tech preview, GA
+- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)
+
+## Summary
+
+This enhancement describes the API and code changes necessary to give
+cluster administrators a means to change the logging levels of the
+DNS Operator and CoreDNS.
+
+## Motivation
+
+Supporting a simple way to raise the verbosity of the DNS Operator and its operand (CoreDNS) would make debugging
+DNS Operator and CoreDNS issues easier for cluster administrators and OpenShift developers.
+
+For logging purposes, CoreDNS defines several classes of responses, such as:
+
+* `denial`: either NXDOMAIN or nodata responses (the name exists, but the type does not). A nodata response sets the return code to NOERROR.
+* `error`: SERVFAIL, NOTIMP, REFUSED, etc. Anything that indicates the remote server is not willing to resolve the request.
+* `all`: all responses, including successful responses, errors, and denials.
+
+A logging level API for CoreDNS logs would assist cluster administrators who wish to have more control
+over CoreDNS logs.
+
+Also, a logging level API for the DNS Operator would assist OpenShift developers working on the DNS Operator who may
+desire more in-depth logging statements when working on the operator's controllers.
+
+
+### Goals
+
+Add a user-facing API for controlling the run-time verbosity of the [OpenShift DNS Operator and CoreDNS](https://github.com/openshift/cluster-dns-operator).
+
+### Non-Goals
+
+* Change the default logging verbosity of the DNS Operator or CoreDNS in production OCP clusters.
+
+## Proposal
+
+### DNS Operator Log Level API
+
+We will add an `operatorLogLevel` **field** to `DNSSpec`, using the [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L71) type.
+Valid values will be "Normal", "Debug", and "Trace".
+
+We will use [logrus](https://github.com/sirupsen/logrus#level-logging) to implement the log levels for operator-level logging.
+Logrus has seven logging levels: Trace, Debug, Info, Warning, Error, Fatal and Panic:
+
+* `log.Trace("Something very low level.")`
+* `log.Debug("Useful debugging information.")`
+* `log.Info("Something noteworthy happened!")`
+* `log.Warn("You should probably take a look at this.")`
+* `log.Error("Something failed but I'm not quitting.")`
+* `log.Fatal("Bye.")` (calls `os.Exit(1)` after logging)
+* `log.Panic("I'm bailing.")` (calls `panic()` after logging)
+
+After the logging level on a Logger is set, log entries with that severity or anything above it will be logged.
+For example, the default `log.SetLevel(log.InfoLevel)` logs anything that is info or above (warn, error, fatal, panic).
+
+A separate controller will watch `dnses` and set the operator's log level according to `operatorLogLevel`:
+
+* `operatorLogLevel: "Normal"` will call `logrus.SetLevel(logrus.InfoLevel)`
+* `operatorLogLevel: "Debug"` will call `logrus.SetLevel(logrus.DebugLevel)`
+* `operatorLogLevel: "Trace"` will call `logrus.SetLevel(logrus.TraceLevel)`
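
The mapping above could be sketched in Go roughly as follows. This is only an illustration under stated assumptions: `logrusLevelName` is a hypothetical helper, not part of the proposal. To stay self-contained it does not import logrus and instead returns the lowercase level name that `logrus.ParseLevel` accepts. Falling back to "info" for unknown values is also an assumption.

```go
package main

import "fmt"

// LogLevel mirrors the string type from openshift/api for this sketch.
type LogLevel string

const (
	Normal LogLevel = "Normal"
	Debug  LogLevel = "Debug"
	Trace  LogLevel = "Trace"
)

// logrusLevelName is a hypothetical helper that maps an operatorLogLevel
// value to a logrus level name. A real controller would pass the result
// to logrus.ParseLevel and then logrus.SetLevel.
func logrusLevelName(level LogLevel) string {
	switch level {
	case Debug:
		return "debug"
	case Trace:
		return "trace"
	default:
		// Assumption: "Normal", empty, and unknown values all use info.
		return "info"
	}
}

func main() {
	for _, l := range []LogLevel{Normal, Debug, Trace} {
		fmt.Printf("operatorLogLevel %q -> logrus level %q\n", l, logrusLevelName(l))
	}
}
```

Treating an empty value like "Normal" keeps the default verbosity unchanged when the field is unset, which matches the non-goal of not changing default logging.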
+
+
+### CoreDNS Log Level API
+
+Valid values for `logLevel` are "Normal", "Debug", and "Trace", as per the [Operator API](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62).
+
+We will enable logging of CoreDNS's [classes of responses](https://github.com/coredns/coredns/tree/master/plugin/log#syntax) that correspond to the log level specified in the API:
+
+* `logLevel: "Normal"` will enable the `error` class, `log . { class error }`, in line with the errors plugin that we already enable by default.
+* `logLevel: "Debug"` will enable `log . { class denial error }`.
+* `logLevel: "Trace"` will enable `log . { class all }`.
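
As a rough sketch of how the operator might render the Corefile `log` stanza from `logLevel`, consider the Go snippet below. The names `logClasses` and `logStanza` are hypothetical and not taken from the operator's code; the fallback to the `error` class for unset or unknown values is an assumption.

```go
package main

import "fmt"

// LogLevel mirrors the string type from openshift/api for this sketch.
type LogLevel string

const (
	Normal LogLevel = "Normal"
	Debug  LogLevel = "Debug"
	Trace  LogLevel = "Trace"
)

// logClasses is a hypothetical helper that returns the CoreDNS log
// plugin classes for a given logLevel, following the mapping above.
func logClasses(level LogLevel) string {
	switch level {
	case Debug:
		return "denial error"
	case Trace:
		return "all"
	default:
		// Assumption: "Normal", empty, and unknown values log errors only.
		return "error"
	}
}

// logStanza renders a log plugin block for the root zone (".").
func logStanza(level LogLevel) string {
	return fmt.Sprintf("log . {\n    class %s\n}", logClasses(level))
}

func main() {
	for _, l := range []LogLevel{Normal, Debug, Trace} {
		fmt.Printf("# logLevel: %s\n%s\n\n", l, logStanza(l))
	}
}
```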
+
+
+
+We will add a **field** named `operatorLogLevel`, of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L71), to `DNSSpec`:
+
+```go
+type DNSSpec struct {
+	// <snip>
+
+	// operatorLogLevel controls the logging level of the DNS Operator.
+	// See LogLevel for more information about each available logging level.
+	//
+	// +optional
+	OperatorLogLevel LogLevel `json:"operatorLogLevel"`
+}
+```
+
+This new field would allow a cluster administrator to specify the desired logging level specifically for the DNS Operator.
+
+Additionally, a new `logLevel` field of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62) will be added for CoreDNS logging:
+
+```go
+// We will enable the CoreDNS log classes
+// (https://github.com/coredns/coredns/tree/master/plugin/log#syntax)
+// that correspond to the LogLevel defined in the OpenShift API:
+//
+// logLevel "Normal" enables log . { class error }, in line with the
+// errors plugin that is enabled by default.
+// logLevel "Debug" enables log . { class denial error }
+// logLevel "Trace" enables log . { class all }
+type DNSSpec struct {
+
+	// logLevel describes the logging verbosity of the DNSController for CoreDNS.
+	//
+	// +optional
+	LogLevel LogLevel `json:"logLevel"`
+}
+
+```
+
+Both of these new APIs would be accompanied by appropriate `LogLevel` definitions:
+
+```go
+
+// LogLevel describes several available logging verbosity levels.
+// +kubebuilder:validation:Enum=Normal;Debug;Trace;TraceAll
+type LogLevel string
+
+var (
+	// Normal is the default.  Normal, working log information, everything is fine, but helpful notices for auditing or common operations.  In kube, this is probably glog=2.
+	Normal LogLevel = "Normal"
+
+	// Debug is used when something went wrong.  Even common operations may be logged, and less helpful but more quantity of notices.  In kube, this is probably glog=4.
+	Debug LogLevel = "Debug"
+
+	// Trace is used when something went really badly and even more verbose logs are needed.  Logging every function call as part of a common operation, to tracing execution of a query.  In kube, this is probably glog=6.
+	Trace LogLevel = "Trace"
+
+	// TraceAll is used when something is broken at the level of API content/decoding.  It will dump complete body content.  If you turn this on in a production cluster
+	// prepare for serious performance issues and massive amounts of logs.  In kube, this is probably glog=8.
+	TraceAll LogLevel = "TraceAll"
+)
+```
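
To illustrate how the two fields would appear together on the wire, here is a minimal, self-contained Go sketch. Combining both fields in one `DNSSpec` struct and the `specJSON` helper are assumptions made for illustration; the JSON tags are copied from the snippets above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LogLevel mirrors the string type from openshift/api for this sketch.
type LogLevel string

const (
	Normal LogLevel = "Normal"
	Debug  LogLevel = "Debug"
	Trace  LogLevel = "Trace"
)

// DNSSpec here contains only the two proposed fields.
type DNSSpec struct {
	OperatorLogLevel LogLevel `json:"operatorLogLevel"`
	LogLevel         LogLevel `json:"logLevel"`
}

// specJSON is a hypothetical helper that serializes a spec to JSON.
func specJSON(spec DNSSpec) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := specJSON(DNSSpec{OperatorLogLevel: Trace, LogLevel: Debug})
	if err != nil {
		fmt.Println("marshal failed:", err)
		return
	}
	fmt.Println(out)
}
```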
+
+### User Stories
+
+* As an OpenShift Cluster Administrator, I want to be able to raise the logging level of the DNS Operator and CoreDNS so that I can more quickly
+track down OpenShift DNS issues.
+
+* The alert 'CoreDNS is returning SERVFAIL for X% of requests' was recently added to OCP.
+  This Prometheus alert is helpful, but it would be more useful if we could see which requests are getting SERVFAIL responses.
+  We would therefore like to enable the log plugin so that CoreDNS logs queries.
+
+* Some users want to avoid using tcpdump to see queries and want the log plugin enabled so that CoreDNS logs them.
+
+
+### Implementation Details/Notes/Constraints [optional]
+
+
+### Risks and Mitigations
+
+Raising the logging verbosity for any component typically results in larger log files that grow quickly.
+
+
+## Design Details
+
+### Open Questions [optional]
+N/A
+
+### Test Plan
+
+* Unit tests will verify that setting `logLevel` configures the corresponding logging in CoreDNS.
+* Unit tests will verify that setting `operatorLogLevel` configures the corresponding logging in the DNS Operator.
+
+### Graduation Criteria
+
+N/A
+
+#### Dev Preview -> Tech Preview
+
+N/A
+
+#### Tech Preview -> GA
+
+N/A
+
+#### Removing a deprecated feature
+
+N/A
+
+### Upgrade / Downgrade Strategy
+
+On downgrade, any logging options are ignored by the DNS Operator and CoreDNS.
+The downgraded operator will update the configmap and delete the log stanzas.
+
+
+### Version Skew Strategy
+
+N/A
+
+## Implementation History
+
+[Work in Progress](https://github.com/openshift/api/)
+
+## Drawbacks
+
+
+## Alternatives
+
+* Don't provide any DNS logging level APIs for the operator and CoreDNS (current behavior)
+* Raise the current verbosity of the DNS Operator and CoreDNS (not desirable)
+