Add logLevel and operatorLogLevel APIs for DNS https://issues.redhat.…
miheer committed Oct 15, 2021
1 parent e2f4d4b commit 194f83d
Showing 1 changed file with 228 additions and 0 deletions.
228 changes: 228 additions & 0 deletions enhancements/dns/dns-operator-operand-logging-level.md
@@ -0,0 +1,228 @@
---
title: dns-operator-operand-logging-level
authors:
- "@miheer"
reviewers:
- "@alebedev87"
- "@candita"
- "@frobware"
- "@knobunc"
- "@Miciah"
- "@rfredette"
approvers:
- "@frobware"
- "@knobunc"
- "@Miciah"
- "@alebedev87"
- "@rfredette"
- "@candita"
creation-date: 2021-10-14
last-updated: 2021-10-14
status: implementable
---

# DNS Log Level API

## Release Signoff Checklist

- [X] Enhancement is `implementable`
- [ ] Design details are appropriately documented from clear requirements
- [ ] Test plan is defined
- [ ] Operational readiness criteria is defined
- [ ] Graduation criteria for dev preview, tech preview, GA
- [ ] User-facing documentation is created in [openshift-docs](https://github.com/openshift/openshift-docs/)

## Summary

This enhancement describes the API and code changes necessary to let
cluster administrators change the logging levels of the DNS Operator
and CoreDNS.

## Motivation

A simple way to raise the verbosity of the DNS Operator and its operand (CoreDNS) would make it easier for
cluster administrators and OpenShift developers to debug issues in the Operator and in CoreDNS.

For logging purposes, CoreDNS defines several classes of responses, such as `error`, `denial`, and `all`:

* `denial`: either NXDOMAIN or nodata responses (name exists, type does not). A nodata response sets the return code to NOERROR.
* `error`: SERVFAIL, NOTIMP, REFUSED, etc. Anything that indicates the remote server is not willing to resolve the request.
* `all`: all responses, including successful responses, errors, and denials.

A logging level API for CoreDNS logs would assist cluster administrators who wish to have more control
over CoreDNS logs.

Also, a logging level API for the DNS Operator would assist OpenShift developers working on the DNS Operator who may
desire more in-depth logging statements when working on the operator's controllers.


### Goals

Add a user-facing API for controlling the run-time verbosity of the [OpenShift DNS Operator and CoreDNS](https://github.com/openshift/cluster-dns-operator).

### Non-Goals

* Change the default logging verbosity of the DNS Operator or CoreDNS in production OCP clusters.

## Proposal

### DNS Operator Log Level API

We will add an `operatorLogLevel` field to `DNSSpec`, of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L71).
Valid values will be "Normal", "Debug", and "Trace".

We will use [logrus](https://github.com/sirupsen/logrus#level-logging) to implement the log levels for operator-level logging.
Logrus has seven logging levels: Trace, Debug, Info, Warning, Error, Fatal and Panic:

* `log.Trace("Something very low level.")`
* `log.Debug("Useful debugging information.")`
* `log.Info("Something noteworthy happened!")`
* `log.Warn("You should probably take a look at this.")`
* `log.Error("Something failed but I'm not quitting.")`
* `log.Fatal("Bye.")` calls `os.Exit(1)` after logging.
* `log.Panic("I'm bailing.")` calls `panic()` after logging.

After the logging level on a Logger is set, log entries with that severity or anything above it will be logged.
For example, the default `log.SetLevel(log.InfoLevel)` logs anything at info level or above (warn, error, fatal, panic).

A separate controller will watch `dnses` and set the operator's log level according to `operatorLogLevel`:

* `operatorLogLevel: "Normal"` will call `logrus.SetLevel(logrus.InfoLevel)`
* `operatorLogLevel: "Debug"` will call `logrus.SetLevel(logrus.DebugLevel)`
* `operatorLogLevel: "Trace"` will call `logrus.SetLevel(logrus.TraceLevel)`
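
The mapping from `operatorLogLevel` to a logrus level could be sketched roughly as follows. This is a minimal illustration and not the operator's actual code: the helper name `logrusLevelName` is hypothetical, and falling back to "info" for empty or unrecognized values is an assumption, since this proposal only defines behavior for the three values above. The sketch returns level names rather than importing logrus so that it stays self-contained; in the operator, such a name could be parsed with `logrus.ParseLevel` and applied with `logrus.SetLevel`.

```go
package main

import "fmt"

// LogLevel mirrors the API type referenced in this proposal.
type LogLevel string

const (
	Normal LogLevel = "Normal"
	Debug  LogLevel = "Debug"
	Trace  LogLevel = "Trace"
)

// logrusLevelName is a hypothetical helper that returns the logrus level
// name corresponding to an operatorLogLevel value. Empty or unrecognized
// values fall back to "info" (an assumption, not part of the proposal).
func logrusLevelName(level LogLevel) string {
	switch level {
	case Debug:
		return "debug"
	case Trace:
		return "trace"
	default:
		return "info"
	}
}

func main() {
	for _, level := range []LogLevel{Normal, Debug, Trace} {
		// In the operator, this name would be parsed with logrus.ParseLevel
		// and the result passed to logrus.SetLevel.
		fmt.Printf("operatorLogLevel=%s -> logrus level %s\n", level, logrusLevelName(level))
	}
}
```

Keeping the mapping in one small function would let the controller that watches `dnses` stay a thin wrapper around it, which also makes the mapping easy to unit test.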


### CoreDNS Log Level API

Valid values for `logLevel` are "Normal", "Debug", and "Trace", as per the [Operator API](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62).

We will enable logging of CoreDNS's [classes of responses](https://github.com/coredns/coredns/tree/master/plugin/log#syntax) that correspond to the log level specified in the API:

* `logLevel: "Normal"` will enable the `error` class, because the errors plugin is already enabled by default: `log . { class error }`

* `logLevel: "Debug"` will enable the `denial` and `error` classes: `log . { class denial error }`

* `logLevel: "Trace"` will enable all classes: `log . { class all }`



We will add a field named `operatorLogLevel` of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L71) to `DNSSpec`:

```go
type DNSSpec struct {
	// <snip>

	// operatorLogLevel controls the logging level of the DNS Operator.
	// See LogLevel for more information about each available logging level.
	//
	// +optional
	OperatorLogLevel LogLevel `json:"operatorLogLevel,omitempty"`
}
```

This new field would allow a cluster administrator to specify the desired logging level specifically for the DNS Operator.

Additionally, a new `logLevel` field of type [LogLevel](https://github.com/openshift/api/blob/master/operator/v1/types.go#L62) will be added for CoreDNS logging:

```go
// The CoreDNS log plugin classes
// (https://github.com/coredns/coredns/tree/master/plugin/log#syntax)
// are enabled according to the LogLevel defined in the OpenShift API:
//
//   logLevel "Normal" enables the error class, as the errors plugin is
//   enabled by default: log . { class error }
//   logLevel "Debug" enables: log . { class denial error }
//   logLevel "Trace" enables: log . { class all }
type DNSSpec struct {
	// logLevel describes the logging verbosity of the DNSController for CoreDNS.
	//
	// +optional
	LogLevel LogLevel `json:"logLevel,omitempty"`
}
```

Both of these new APIs would be accompanied by appropriate `LogLevel` definitions:

```go
// LogLevel describes several available logging verbosity levels.
// +kubebuilder:validation:Enum=Normal;Debug;Trace;TraceAll
type LogLevel string

var (
	// Normal is the default. Normal, working log information, everything is fine, but helpful notices for auditing or common operations. In kube, this is probably glog=2.
	Normal LogLevel = "Normal"

	// Debug is used when something went wrong. Even common operations may be logged, and less helpful but more quantity of notices. In kube, this is probably glog=4.
	Debug LogLevel = "Debug"

	// Trace is used when something went really badly and even more verbose logs are needed. Logging every function call as part of a common operation, to tracing execution of a query. In kube, this is probably glog=6.
	Trace LogLevel = "Trace"

	// TraceAll is used when something is broken at the level of API content/decoding. It will dump complete body content. If you turn this on in a production cluster
	// prepare for serious performance issues and massive amounts of logs. In kube, this is probably glog=8.
	TraceAll LogLevel = "TraceAll"
)
```
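
To illustrate how the two proposed fields would appear in a serialized DNS spec, here is a minimal sketch. The `DNSSpec` below contains only the two fields proposed here (the real type has more), the `omitempty` tags are an assumption based on the fields being marked `+optional`, and `renderSpec` is a hypothetical helper that exists only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// LogLevel mirrors the API type referenced in this proposal.
type LogLevel string

const (
	Normal LogLevel = "Normal"
	Debug  LogLevel = "Debug"
	Trace  LogLevel = "Trace"
)

// DNSSpec is a reduced stand-in for the real DNSSpec; it holds only the
// two fields proposed in this enhancement. The omitempty tags are an
// assumption for optional fields.
type DNSSpec struct {
	LogLevel         LogLevel `json:"logLevel,omitempty"`
	OperatorLogLevel LogLevel `json:"operatorLogLevel,omitempty"`
}

// renderSpec is a hypothetical helper that serializes the spec to JSON.
func renderSpec(spec DNSSpec) (string, error) {
	b, err := json.Marshal(spec)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	out, err := renderSpec(DNSSpec{LogLevel: Debug, OperatorLogLevel: Trace})
	if err != nil {
		panic(err)
	}
	fmt.Println(out)
}
```

With `omitempty`, a spec that sets neither field serializes without them, so existing DNS resources would be unaffected until an administrator opts in.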

### User Stories

* As an OpenShift Cluster Administrator, I want to be able to raise the logging level of the DNS Operator and CoreDNS so that I can more quickly
track down OpenShift DNS issues.

* The new alert 'CoreDNS is returning SERVFAIL for X% of requests' was added in recent updates of OCP.
This Prometheus alert is useful, but it would be more useful if users could see which requests are getting SERVFAIL responses.
So users would like the log plugin for CoreDNS to be enabled to log queries.

* Some users want to avoid using tcpdump to see queries and want the log plugin to be enabled so that CoreDNS logs them.


### Implementation Details/Notes/Constraints [optional]


### Risks and Mitigations

Raising the logging verbosity for any component typically results in larger log files that grow quickly.


## Design Details

### Open Questions [optional]
N/A

### Test Plan

Unit tests will be written to verify that setting `logLevel` configures the corresponding logging in CoreDNS.
Unit tests will be written to verify that setting `operatorLogLevel` configures the corresponding logging in the DNS Operator.

### Graduation Criteria

N/A

#### Dev Preview -> Tech Preview

N/A

#### Tech Preview -> GA

N/A

#### Removing a deprecated feature

N/A

### Upgrade / Downgrade Strategy

On downgrade, any logging options are ignored by the DNS Operator and CoreDNS.
The downgraded operator will update the ConfigMap and remove the log stanzas.


### Version Skew Strategy

N/A

## Implementation History

[Work in Progress](https://github.com/openshift/api/)

## Drawbacks


## Alternatives

* Don't provide any DNS logging level APIs for the operator and CoreDNS (current behavior)
* Raise the current verbosity of the DNS Operator and CoreDNS (not desirable)
