Add logLevel and operatorLogLevel APIs for DNS https://issues.redhat.…
Showing 1 changed file with 228 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,228 @@ | ||
--- | ||
title: dns-operator-operand-logging-level | ||
authors: | ||
- "@miheer" | ||
reviewers: | ||
- "@alebedev87" | ||
- "@candita" | ||
- "@frobware" | ||
- "@knobunc" | ||
- "@Miciah" | ||
- "@rfredette" | ||
approvers: | ||
- "@frobware" | ||
- "@knobunc" | ||
- "@Miciah" | ||
- "@alebedev87" | ||
- "@rfredette" | ||
- "@candita" | ||
creation-date: 2021-10-14 | ||
last-updated: 2021-10-14 | ||
status: implementable | ||
--- | ||
|
||
# DNS Log Level API | ||
|
||
## Release Signoff Checklist | ||
|
||
- [X] Enhancement is `implementable` | ||
- [ ] Design details are appropriately documented from clear requirements | ||
- [ ] Test plan is defined | ||
- [ ] Operational readiness criteria is defined | ||
- [ ] Graduation criteria for dev preview, tech preview, GA | ||
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/) | ||
|
||
## Summary | ||
|
||
This enhancement describes the API and code changes necessary to expose | ||
a means to change the DNS Operator and CoreDNS Logging Levels to | ||
cluster administrators. | ||
|
||
## Motivation | ||
|
||
Supporting a trivial way to raise the verbosity of the DNS Operator and its operand (CoreDNS) would make debugging
Operator and CoreDNS issues easier for cluster administrators and OpenShift developers.
|
||
For logging purposes, CoreDNS defines several classes of responses, such as `error`, `denial`, and `all`:

* `denial`: either NXDOMAIN or nodata responses (the name exists, the type does not). A nodata response sets the return code to NOERROR.
* `error`: SERVFAIL, NOTIMP, REFUSED, etc. Anything that indicates the remote server is not willing to resolve the request.
* `all`: all responses, including successful responses, errors, and denials.

A logging level API for CoreDNS logs would assist cluster administrators who wish to have more control
over CoreDNS logs.
|
||
Also, a logging level API for the DNS Operator would assist OpenShift developers working on the DNS Operator who may | ||
desire more in-depth logging statements when working on the operator's controllers. | ||
|
||
|
||
### Goals | ||
|
||
Add a user-facing API for controlling the run-time verbosity of the [OpenShift DNS Operator and CoreDNS](https://github.com/openshift/cluster-dns-operator).

### Non-Goals
|
||
* Change the default logging verbosity of the DNS Operator or CoreDNS in production OCP clusters. | ||
|
||
## Proposal | ||
|
||
### DNS Operator Log Level API | ||
We will add an `operatorLogLevel` **field** to DNSSpec, of the [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L71) type.
Valid values will be "Normal", "Debug", and "Trace".
We will use [logrus](https://github.com/sirupsen/logrus#level-logging) to implement the log levels for the operator's own logging.
Logrus has seven logging levels: Trace, Debug, Info, Warning, Error, Fatal and Panic.

* `log.Trace("Something very low level.")`
* `log.Debug("Useful debugging information.")`
* `log.Info("Something noteworthy happened!")`
* `log.Warn("You should probably take a look at this.")`
* `log.Error("Something failed but I'm not quitting.")`
* `log.Fatal("Bye.")` (calls `os.Exit(1)` after logging)
* `log.Panic("I'm bailing.")` (calls `panic()` after logging)

After the logging level on a Logger is set, log entries with that severity or anything above it will be logged.
For example, the default `log.SetLevel(log.InfoLevel)` logs anything that is info or above (warn, error, fatal, panic).

A separate controller will watch `dnses` resources and set the operator's log level according to `operatorLogLevel`:

* `operatorLogLevel: "Normal"` will call `logrus.SetLevel(logrus.InfoLevel)`
* `operatorLogLevel: "Debug"` will call `logrus.SetLevel(logrus.DebugLevel)`
* `operatorLogLevel: "Trace"` will call `logrus.SetLevel(logrus.TraceLevel)`
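
The mapping from `operatorLogLevel` to a logrus level can be sketched as follows. This is a minimal illustration, not the operator's actual code: `logrusLevelName` is a hypothetical helper, and falling back to "info" for an unset or unrecognized value is an assumption. It returns level names only, so the example runs with the standard library alone.

```go
package main

import "fmt"

// logrusLevelName maps an operatorLogLevel value to the name of the logrus
// level it should select. The helper name is hypothetical. Falling back to
// "info" for unset or unrecognized values is an assumption of this sketch.
func logrusLevelName(operatorLogLevel string) string {
	switch operatorLogLevel {
	case "Debug":
		return "debug"
	case "Trace":
		return "trace"
	default: // "Normal", "" (unset), or anything unrecognized
		return "info"
	}
}

func main() {
	for _, level := range []string{"Normal", "Debug", "Trace", ""} {
		fmt.Printf("operatorLogLevel=%q -> logrus level %q\n", level, logrusLevelName(level))
	}
}
```

In a controller that uses logrus, the returned name could be passed to `logrus.ParseLevel` and the resulting level to `logrus.SetLevel`.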
|
||
|
||
### CoreDNS Log Level API | ||
|
||
Valid values for `logLevel` are "Normal", "Debug", and "Trace", as per the [Operator API](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62).

We will enable logging of CoreDNS's [classes of responses](https://github.com/coredns/coredns/tree/master/plugin/log#syntax) that correspond to the log level specified in the API:

* `logLevel: "Normal"` will enable the `error` class, as we enable the errors plugin by default: `log . { class error }`
* `logLevel: "Debug"` will enable `log . { class denial error }`
* `logLevel: "Trace"` will enable `log . { class all }`
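
As an illustration of the mapping above, the following sketch renders the `log` stanza for a given `logLevel`. The function names `logClasses` and `logStanza` are hypothetical, and treating an unset or unrecognized value like "Normal" is an assumption. How the operator actually templates its Corefile may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// logClasses returns the CoreDNS log plugin classes for a logLevel value.
// Unset or unrecognized values are treated like "Normal" (an assumption).
func logClasses(logLevel string) []string {
	switch logLevel {
	case "Debug":
		return []string{"denial", "error"}
	case "Trace":
		return []string{"all"}
	default:
		return []string{"error"}
	}
}

// logStanza renders a log plugin block for the root zone "." using the
// classes selected by logLevel.
func logStanza(logLevel string) string {
	return fmt.Sprintf("log . {\n    class %s\n}\n", strings.Join(logClasses(logLevel), " "))
}

func main() {
	for _, level := range []string{"Normal", "Debug", "Trace"} {
		fmt.Printf("# logLevel: %s\n%s", level, logStanza(level))
	}
}
```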
|
||
|
||
|
||
We will create an API under DNSSpec with the **field name** `operatorLogLevel` of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L71):
|
||
```go
type DNSSpec struct {
	// <snip>

	// operatorLogLevel controls the logging level of the DNS Operator.
	// See LogLevel for more information about each available logging level.
	//
	// +optional
	OperatorLogLevel LogLevel `json:"operatorLogLevel"`
}
```
|
||
This new field would allow a cluster administrator to specify the desired logging level specifically for the DNS Operator. | ||
|
||
Additionally, a new `logLevel` field of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62) will be added for CoreDNS logging:
|
||
```go
// We will enable the CoreDNS log classes
// (https://github.com/coredns/coredns/tree/master/plugin/log#syntax)
// that correspond to the LogLevel defined in the OpenShift API:
//
// logLevel "Normal" will enable the error class, as we enable the errors plugin by default: log . { class error }
// logLevel "Debug" will enable log . { class denial error }
// logLevel "Trace" will enable log . { class all }
type DNSSpec struct {

	// logLevel describes the logging verbosity of the DNSController for CoreDNS.
	//
	// +optional
	LogLevel LogLevel `json:"logLevel"`
}
```
|
||
Both of these new APIs would be accompanied by appropriate `LogLevel` definitions: | ||
|
||
```go
// LogLevel describes several available logging verbosity levels.
// +kubebuilder:validation:Enum=Normal;Debug;Trace;TraceAll
type LogLevel string

var (
	// Normal is the default. Normal, working log information, everything is fine, but helpful notices for auditing or common operations. In kube, this is probably glog=2.
	Normal LogLevel = "Normal"

	// Debug is used when something went wrong. Even common operations may be logged, and less helpful but more quantity of notices. In kube, this is probably glog=4.
	Debug LogLevel = "Debug"

	// Trace is used when something went really badly and even more verbose logs are needed. Logging every function call as part of a common operation, to tracing execution of a query. In kube, this is probably glog=6.
	Trace LogLevel = "Trace"

	// TraceAll is used when something is broken at the level of API content/decoding. It will dump complete body content. If you turn this on in a production cluster
	// prepare for serious performance issues and massive amounts of logs. In kube, this is probably glog=8.
	TraceAll LogLevel = "TraceAll"
)
```
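
To show how the two fields and the enum fit together, here is a self-contained sketch that decodes a spec from JSON and checks both values against the enum listed in the kubebuilder marker. The `dnsSpec` type is a trimmed stand-in for DNSSpec, and `validLogLevel` and `parseSpec` are hypothetical helpers. In a real cluster this validation would be performed by the API server from the CRD schema, not by hand-written code.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LogLevel mirrors the type shown above.
type LogLevel string

const (
	Normal   LogLevel = "Normal"
	Debug    LogLevel = "Debug"
	Trace    LogLevel = "Trace"
	TraceAll LogLevel = "TraceAll"
)

// dnsSpec is a trimmed, illustrative stand-in for DNSSpec that carries only
// the two fields proposed here.
type dnsSpec struct {
	LogLevel         LogLevel `json:"logLevel"`
	OperatorLogLevel LogLevel `json:"operatorLogLevel"`
}

// validLogLevel accepts the enum values from the kubebuilder marker, plus
// the empty string because both fields are optional.
func validLogLevel(l LogLevel) bool {
	switch l {
	case "", Normal, Debug, Trace, TraceAll:
		return true
	}
	return false
}

// parseSpec decodes a JSON spec and rejects values outside the enum.
func parseSpec(data []byte) (dnsSpec, error) {
	var spec dnsSpec
	if err := json.Unmarshal(data, &spec); err != nil {
		return dnsSpec{}, err
	}
	if !validLogLevel(spec.LogLevel) {
		return dnsSpec{}, fmt.Errorf("invalid logLevel %q", spec.LogLevel)
	}
	if !validLogLevel(spec.OperatorLogLevel) {
		return dnsSpec{}, fmt.Errorf("invalid operatorLogLevel %q", spec.OperatorLogLevel)
	}
	return spec, nil
}

func main() {
	spec, err := parseSpec([]byte(`{"logLevel":"Debug","operatorLogLevel":"Trace"}`))
	fmt.Println(spec.LogLevel, spec.OperatorLogLevel, err)
}
```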
|
||
### User Stories | ||
|
||
* As an OpenShift Cluster Administrator, I want to be able to raise the logging level of the DNS Operator and CoreDNS so that I can more quickly | ||
track down OpenShift DNS issues. | ||
|
||
* The alert "CoreDNS is returning SERVFAIL for X% of requests" was added in recent updates of OCP.
  This Prometheus alert is helpful, but it would be more useful if we could see which requests are getting SERVFAIL responses.
  To do so, we would like to enable the log plugin for CoreDNS to log queries.

* Some users want to avoid using tcpdump to see the queries and want the log plugin to be enabled to log queries in CoreDNS.
|
||
|
||
### Implementation Details/Notes/Constraints [optional] | ||
|
||
|
||
### Risks and Mitigations | ||
|
||
Raising the logging verbosity for any component typically results in larger log files that grow quickly. | ||
|
||
|
||
## Design Details | ||
|
||
### Open Questions [optional] | ||
N/A | ||
|
||
### Test Plan | ||
|
||
Unit tests will be written to verify that setting `logLevel` configures the corresponding logging in CoreDNS.
Unit tests will be written to verify that setting `operatorLogLevel` configures the corresponding logging in the DNS Operator.
|
||
### Graduation Criteria | ||
|
||
N/A | ||
|
||
#### Dev Preview -> Tech Preview | ||
|
||
N/A | ||
|
||
#### Tech Preview -> GA | ||
|
||
N/A | ||
|
||
#### Removing a deprecated feature | ||
|
||
N/A | ||
|
||
### Upgrade / Downgrade Strategy | ||
|
||
On downgrade, any logging options are ignored by the DNS Operator and CoreDNS. | ||
The downgraded operator will update the configmap and delete the log stanzas. | ||
|
||
|
||
### Version Skew Strategy | ||
|
||
N/A | ||
|
||
## Implementation History | ||
|
||
[Work in Progress](https://github.com/openshift/api/) | ||
|
||
## Drawbacks | ||
|
||
|
||
## Alternatives | ||
|
||
* Don't provide any DNS logging level APIs for the operator and CoreDNS (current behavior)
* Raise the current verbosity of the DNS Operator and CoreDNS (not desirable)
|