KEP-3077: contextual logging #3078
Conversation
/sig instrumentation
Force-pushed from 4ce0b01 to cd448e3
/cc
Exciting!
> explicit logger parameter is suitable for functions which don’t need a context
> and never will.
>
> The rationale for not using both context and an explicit logger parameter is
is this going to be statically checked?
We could do that if we want to enforce it. It should fit into the existing logcheck tool.
My concern is that (as with any linter) there might be a legitimate reason why, in some part of the code, both a logger and a context need to be passed. Such a tool would then prevent that unless we also add a way to suppress the check per function.
I've added a comment saying that logcheck will check for this.

I have the check ready and have also integrated logcheck with golangci-lint, which addresses my concern about valid usages of the pattern: those can then use `nolint:logcheck`.
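For illustration, here is roughly what such a legitimate dual-parameter function and its suppression could look like (a sketch only; the function and its parameters are made up, and it assumes the logcheck/golangci-lint integration described above):

```go
import (
	"context"

	"k8s.io/klogr" // package layout as proposed in this KEP
)

// processItem is called in a hot path, so the caller resolves the logger
// once and passes it explicitly instead of paying for FromContext on
// every call; ctx is still needed for cancellation and deadlines.
//
//nolint:logcheck // passing both ctx and logger is intentional here
func processItem(ctx context.Context, logger klogr.Logger, item string) {
	logger.V(4).Info("processing", "item", item)
	_ = ctx
}
```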
> `logger.V(2)` as logger for less important ones. Then when the scheduler’s
> verbosity threshold is `-v=1`, a log message emitted with `V(1).InfoS` through
> the updated logger will be printed for important pods and skipped for less
> important ones.
Is the new feature here that the log level can be set for a log instance? Then, if I call `log.V(1)`, the actual level is `1 + log.level`?
Correct. It's part of the `logr.Logger` API and concept.
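For anyone following along, a minimal self-contained sketch of that additive behavior, using logr's funcr helper purely for demonstration (all names below are illustrative, not from the KEP):

```go
package main

import (
	"fmt"

	"github.com/go-logr/logr/funcr"
)

func main() {
	// Emit entries whose effective level is <= 1, mimicking -v=1.
	logger := funcr.New(func(prefix, args string) {
		fmt.Println(prefix, args)
	}, funcr.Options{Verbosity: 1})

	important := logger          // no level offset
	lessImportant := logger.V(2) // every later V(n) becomes n+2

	important.V(1).Info("scheduled")     // effective level 1: printed
	lessImportant.V(1).Info("scheduled") // effective level 3: skipped
}
```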
@thockin: this use case here would work better if `logger.V(-1)` (= raising verbosity of log messages) was allowed. As it stands, the code that gets called has to use different levels than it normally would. It would be more natural to use the same levels as in other parts and then, for important pods, use `logger.V(-1)` instead of `logger`.
making the log levels indicated in code non-deterministic seems pretty confusing... if I want to see a particular log line, how do I know what to set verbosity to now? if I set verbosity to a level matching a code call I see and don't see that line logged, how do I know if a parent call masked the log by lowering the verbosity?
> making the log levels indicated in code non-deterministic seems pretty confusing...

This is how logr is designed.
If we implement this use case the other way around (code uses "normal" log levels as described in the current Kubernetes conventions, and a caller enables additional log output for important contexts via `logger.V(-1)`), then we get the property that you are looking for: a `logger.V(5).Info` log entry is guaranteed to be emitted for `-v=5`.
> This is how logr is designed.

I understand logr lets a caller be the ultimate decider of how verbosely a library it calls may log. That seems useful for things like quieting info/debug logs from the etcd client.

That doesn't mean we should use that capability within Kubernetes components to vary log verbosity dynamically for the same log lines. That seems difficult to use and understand.
We don't need to rush to embrace all the capabilities of logr. As a hypothetical, I get this. Practically I am not sure this specific case is the best demonstration of this feature :)
It is specifically what the scheduler developers want to achieve, though. I don't have a strong opinion on whether it is something that should be done.

If it gets done, I would prefer allowing `logger.V(-1)`.
> [metrics-server](https://github.com/kubernetes-sigs/metrics-server/blob/4b20c2d43e338d5df7fb530dc960e5d0753f7ab1/Makefile#L252-L257),
> it is better hosted in a repo where `go install` is fast. It only
> depends on the standard Go runtime and therefore can be moved to
> `k8s.io/klogr/logcheck`. The tool will be updated to check not only klog
Are you planning any new checks? Otherwise, document in the KEP that no changes are planned for the tool.
One change is already mentioned at the end of the paragraph:

> The tool will be updated to check not only klog calls, but also calls through the `logr.Logger` interface.

If there's consensus about not passing ctx and logger as parameters, that would also need to be mentioned here.
FWIW, after integrating logcheck into golangci-lint, I am more comfortable with adding such a check, because in those cases where both need to be passed (for example, because of performance concerns), `nolint:logcheck` can be used to override the check.
A bit of feedback, all of it informal.
> #### Alpha
>
> - Common utility code available
> - At least kube-scheduler framework and some plugins converted
> - Initial e2e tests completed and enabled in Prow
>
> #### Beta
>
> - All of kube-scheduler converted
If we also start to convert at least one thing that typically runs outside the control plane, we'll get better feedback that could help inform the remaining work. This formal choice to pick one extra component outside the scheduler could be for either alpha or beta.
What could that other thing be? There's not much in k/k that isn't part of the control plane.

kubectl might be a candidate, but it typically doesn't log much at all (basically just warnings from client-go), so the effect of converting it wouldn't be very visible.

Perhaps we need to consider components outside of k/k? I'm open to suggestions.
> components outside of k/k
My (uninformed) suggestion is the cluster autoscaler. It's something people run both in-cluster and separately, and it ought to be able to write logs with context. I don't know what in-project, out-of-tree components have already made headway in adopting structured logging.
That's a good suggestion. It is based largely on the scheduler code and thus will automatically inherit all changes that we make there.
What about kube-proxy? It's typically a static pod on each node, or a DaemonSet.
I think kube-proxy falls into the same bucket as the "Kubernetes control plane": users who check the control plane are also likely to check kube-proxy, and vice versa.

My understanding of @sftim's proposal was to branch out and reach people who normally do not pay attention to the control-plane pods. Perhaps the autoscaler doesn't go far enough away from that, but I think it goes further than kube-proxy.

I know from SIG Scaling that users have problems configuring and using the autoscaler. It might be that improved logs could help with that problem.
I guess it's challenging in AWS. On GKE it's managed, so we have the same "control-plane" problem.
Another alternative would be the CSI sidecars. I already did a prototype for external-provisioner without using any of the APIs proposed here. That one could be adapted. The advantage is that there is interest in structured logging, we (= SIG-Storage) just decided to wait until there is also a decision about contextual logging.
Let me go with the external-provisioner as another component for beta.
/cc
Directionally I am still 100% in favor.

WRT perf, we'd probably do better to optimize what we log, how often we log, and the implementation of the logr-wrapper logic than to focus on sometimes not calling `WithValues()`.

There's some uncertainty about the end-of-life plans, but I am pretty confident we can figure that out as it comes into focus.

All my comments are relatively minor.
> ### Goals
>
> - Remove `k8s.io/klog` imports and usage from all packages.
> - Grant the caller of a function control over logging inside that function.
I would mentally de-emphasize the idea of per-caller control and instead focus on more-localized control.

I don't think it is a requirement that every function have the ability to modify every callee's logging. Having a subsystem-by-subsystem choice is the goal. I think it's safe to assume different subsystems have different needs.

E.g. I bet a client-go `SetGlobalLogger(logr.Logger)` would provide a whole lot of utility and could decouple client-go users from klog. But a global logger in the scheduler would be less obviously useful on its own.
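A rough sketch of what such a hook might look like (the SetGlobalLogger name comes from the comment above; the package name, default behavior, and everything else are assumptions):

```go
package clientgolog // placeholder package for the sketch

import (
	"sync/atomic"

	"github.com/go-logr/logr"
)

var global atomic.Value // always stores a logr.Logger

// SetGlobalLogger lets the embedding program route all of this
// subsystem's output wherever it likes, decoupling it from klog.
func SetGlobalLogger(l logr.Logger) { global.Store(l) }

// logger is what the subsystem's internals would call instead of klog.
func logger() logr.Logger {
	if l, ok := global.Load().(logr.Logger); ok {
		return l
	}
	return logr.Discard() // silent until configured
}
```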
> ## Proposal
>
> Log calls that were already converted to structured logging need to be updated
I think you should state right at the beginning that while this proposal is very expansive, it can be implemented incrementally over as many releases as we need, with little risk to users.
Added a paragraph along those lines.
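For instance, an already-structured call site would change roughly like this (a sketch based on the klogr.FromContext example elsewhere in this KEP; helper names may still change):

```go
// Before: structured logging through the klog singleton.
klog.InfoS("Pod scheduled", "pod", klog.KObj(pod))

// After: contextual logging; the logger, plus whatever names and values
// callers attached to it, travels through ctx.
logger := klogr.FromContext(ctx)
logger.Info("Pod scheduled", "pod", klog.KObj(pod))
```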
```go
func foo(ctx context.Context) {
	logger := klogr.FromContext(ctx).WithName("foo")
	ctx = klogr.NewContext(ctx, logger)
	// ...
}
```
links don't help when github hides the file by default :(

If there are a lot of discrete calls to `WithValues` it feels like someone is doing something wrong. E.g. a linter COULD catch:

```go
log = log.WithValues("foo", foo)
// ...
log = log.WithValues("bar", bar)
```

which can flatten to a single call and half as many allocations.
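That is, the linter's suggested replacement would be the single flattened call:

```go
log = log.WithValues("foo", foo, "bar", bar)
```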
```go
func foo(ctx context.Context) {
	logger := klogr.FromContext(ctx).WithName("foo")
	ctx = klogr.NewContext(ctx, logger)
	// ...
}
```
and a linter could catch:

```go
ctx = klogr.NewContext(ctx, logger.WithName("foo"))
```

and suggest breaking it into two statements.
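That is, suggest the two-statement form used elsewhere in this KEP, so the named logger is also available for direct calls:

```go
logger := klogr.FromContext(ctx).WithName("foo")
ctx = klogr.NewContext(ctx, logger)
```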
> A new package `k8s.io/klogr` will replace both `k8s.io/klog` and
> `k8s.io/klog/klogr`. It will be hosted under `staging` and thus get developed
I'm not sure I see the problem.

If k/k depends on foo and foo depends on klogr, and we vendor foo into k/k, klogr will resolve to ./staging/.../klogr, right?
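For reference, staging modules resolve through replace directives along these lines (a hypothetical go.mod fragment, assuming k8s.io/klogr is staged like the other k8s.io components):

```
require k8s.io/klogr v0.0.0

replace k8s.io/klogr => ./staging/src/k8s.io/klogr
```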
> A new package `k8s.io/klogr` will replace both `k8s.io/klog` and
> `k8s.io/klog/klogr`. It will be hosted under `staging` and thus get developed
Still, a distinct repo seems better to me, unless we think this will actually get a lot of churn (and we still have to maintain a sane API).
> The logcheck static code analysis tool will warn about code in Kubernetes which
hopefully when the gate goes away, we can undo this indirection?
Yes. That one would be a simple regex search/replace. Added.
```diff
 // GetPodVolumes returns a pod's PVCs separated into bound, unbound with delayed binding (including provisioning)
 // and unbound with immediate binding (including prebound)
-GetPodVolumes(pod *v1.Pod) (boundClaims, unboundClaimsDelayBinding, unboundClaimsImmediate []*v1.PersistentVolumeClaim, err error)
+GetPodVolumes(logger klogr.Logger, pod *v1.Pod) (boundClaims, unboundClaimsDelayBinding, unboundClaimsImmediate []*v1.PersistentVolumeClaim, err error)
```
I'd argue for putting the logger and such at the end of the parameter list, personally, but we can defer that discussion until the PRs.
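An illustrative call site for the extended signature above (the binder value and the error message are hypothetical):

```go
logger := klogr.FromContext(ctx)
bound, delayed, immediate, err := binder.GetPodVolumes(logger, pod)
if err != nil {
	logger.Error(err, "Listing pod volumes failed", "pod", klog.KObj(pod))
}
```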
> - Users of klog notified through a Kubernetes blog post and an email to
>   dev@kubernetes.io that a logger must be set with k8s.io/klogr.
I feel like there must be some sleight of hand we can pull later, like a new version of klog that initializes klogr and leaves legacy callers of klog intact. I'm OK to let this proceed with some amount of uncertainty here.
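One shape that sleight of hand could take, as a very rough sketch (the forwarding shim and the Background accessor are assumptions, not part of this KEP):

```go
package klog

import "k8s.io/klogr"

// InfoS keeps the legacy signature but forwards to the process-wide
// logger managed by the new package, so existing callers compile and
// log unchanged. klogr.Background() is an assumed accessor for that
// default logger.
func InfoS(msg string, keysAndValues ...interface{}) {
	klogr.Background().Info(msg, keysAndValues...)
}
```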
I'll /lgtm here; minor points could be fixed in a follow-up PR, I think.

/lgtm
Force-pushed from 3fa82eb to 947e98c
I think we have all the final signoffs in order to cautiously proceed with an alpha...

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ehashman, pohly

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
As discussed in kubernetes/enhancements#3078, enabling code that only imports klog for all logging makes life simpler for developers. This was also the reason why KObj was added to klog instead of a separate helper package.
One-line PR description: KEP-3077: initial draft of contextual logging KEP
Issue link: contextual logging #3077