Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP-3077: contextual logging #3078

Merged
merged 1 commit into from
Feb 2, 2022

Conversation

pohly
Copy link
Contributor

@pohly pohly commented Dec 6, 2021

@k8s-ci-robot k8s-ci-robot added cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 6, 2021
@k8s-ci-robot k8s-ci-robot added kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Dec 6, 2021
@pohly pohly mentioned this pull request Dec 6, 2021
12 tasks
@pohly
Copy link
Contributor Author

pohly commented Dec 6, 2021

/sig instrumentation
/wg structured-logging

@alculquicondor
Copy link
Member

/cc

Copy link
Member

@alculquicondor alculquicondor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exciting!

explicit logger parameter is suitable for functions which don’t need a context
and never will.

The rationale for not using both context and an explicit logger parameter is
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this going to be statically checked?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could do that if we want to enforce it. It should fit into the existing logcheck tool.

My concern is that (as with any linter) that perhaps there might be a legitimate reason why in some part of the code both a logger and a context need to be passed. Such a tool would then prevent that unless we also add a way to suppress the check per function.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a comment saying that logcheck will check for this.

I have the check ready and also integrated logcheck with golangci-lint, which adresses my concern about valid usages of the pattern because those then can use nolint:logcheck.

keps/sig-instrumentation/3077-contextual-logging/README.md Outdated Show resolved Hide resolved
keps/sig-instrumentation/3077-contextual-logging/README.md Outdated Show resolved Hide resolved
keps/sig-instrumentation/3077-contextual-logging/README.md Outdated Show resolved Hide resolved
`logger.V(2)` as logger for less important ones. Then when the scheduler’s
verbosity threshold is `-v=1`, a log message emitted with `V(1).InfoS` through
the updated logger will be printed for important pods and skipped for less
important ones.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the new feature here that the log level can be set for a log instance?

Then, if I call log.V(1), the actual level is 1 + log.level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correct. It's part of the logr.Logger API and concept.

@thockin: this use case here would work better if logger.V(-1) (= raising verbosity of log messages) was allowed. As it stands, the code that gets called has to use different levels than it normally would. It would be more natural to use the same levels as in other parts and then for important pods use logger.V(-1) instead of logger.

Copy link
Member

@liggitt liggitt Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making the log levels indicated in code non-deterministic seems pretty confusing... if I want to see a particular log line, how do I know what to set verbosity to now? if I set verbosity to a level matching a code call I see and don't see that line logged, how do I know if a parent call masked the log by lowering the verbosity?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

making the log levels indicated in code non-deterministic seems pretty confusing...

This is how logr is designed.

If we implement this use case the other way around (code uses "normal" log levels as described in the current Kubernetes conventions, a caller enables additional log output for important contexts via logger.V(-1)), then we get the property that you are looking for: a logger.V(5).Info log entry is guaranteed to be emitted for -v=5.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how logr is designed.

I understand logr supports a caller being the ultimate decider of how verbosely a library it calls gets to log at. That seems useful to be able to do things like quiet info/debug logs for the etcd client.

That doesn't mean we should use that capability within Kubernetes components to vary log verbosity dynamically for the same log lines. That seems difficult to use and understand.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to rush to embrace all the capabilities of logr. As a hypothetical, I get this. Practically I am not sure this specific case is the best demonstration of this feature :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is specifically what the scheduler developers want to achieve, though. I don't have a strong opinion on whether it is something that should be done.

If it gets done, I would prefer allowing logger.V(-1).

keps/sig-instrumentation/3077-contextual-logging/README.md Outdated Show resolved Hide resolved
@k8s-ci-robot k8s-ci-robot added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Dec 8, 2021
[metrics-server](https://github.com/kubernetes-sigs/metrics-server/blob/4b20c2d43e338d5df7fb530dc960e5d0753f7ab1/Makefile#L252-L257),
it should better be hosted in a repo where “go install” is fast. It only
depends on the standard Go runtime, therefore it can be moved to
`k8s.io/klogr/logcheck`. The tool will be updated to check not only klog
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you planning any new checks? Otherwise, document in the KEP that no changes are planned for the tool.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One change is already mentioned at the end of the paragraph:

The tool will be updated to check not only klog calls, but also calls through the logr.Logger interface.

If there's consensus about not passing ctx and logger as parameters, that would also need to be mentioned here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, after integrating logcheck into golangci-lint, I am more comfortable with adding such a check because in those cases where both needs to be passed (for example, because of performance concerns), nolint:logcheck can be used to override the check.

Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit of feedback, all of it informal.

Comment on lines 736 to 744
#### Alpha

- Common utility code available
- At least kube-scheduler framework and some plugins converted
- Initial e2e tests completed and enabled in Prow

#### Beta

- All of kube-scheduler converted
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we also start to convert at least one thing that typically runs outside the control plane, we'll get better feedback that could help inform the remaining work. This formal choice to pick one extra component outside the scheduler could be for either alpha or beta.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What could that other thing be? There's not much in k/k that isn't part of the control plane.

kubectl might be a candidate, but that typically doesn't log much at all (basically just warnings from client-go), so the effect of converting it wouldn't be very visible.

Perhaps we need to consider components outside of k/k? I'm open for suggestions.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

components outside of k/k

My (uninformed) suggestion is the cluster autoscaler. It's something people run both in-cluster and separately, and it ought to be able to write logs with context. I don't know what in-project, out-of-tree components have already made headway in adopting structured logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good suggestion. It is based largely on the scheduler code and thus will automatically inherit all changes that we make there.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about kube-proxy? It's typically a static pod in each node or a daemonset

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think kube-proxy falls into the same bucket as "Kubernetes control plane" - users who check the control plane are also likely to check kube-proxy and vice versa.

My understanding of @sftim's proposal was to branch out and reach people who normally do not pay attention to the control plane pods. Perhaps the autoscaler doesn't go far away enough from that, but I think it goes further than kube-proxy.

I know from SIG Scaling that users have problems configuring and using the autoscaler. It might be that improved logs could help with that problem.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess it's challenging in AWS. On GKE it's managed, so we have the same "control-plane" problem.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another alternative would be the CSI sidecars. I already did a prototype for external-provisioner without using any of the APIs proposed here. That one could be adapted. The advantage is that there is interest in structured logging, we (= SIG-Storage) just decided to wait until there is also a decision about contextual logging.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me go with the external-provisioner as another component for beta.

keps/sig-instrumentation/3077-contextual-logging/kep.yaml Outdated Show resolved Hide resolved
keps/sig-instrumentation/3077-contextual-logging/README.md Outdated Show resolved Hide resolved
@yuzhiquan
Copy link
Member

/cc

Copy link
Member

@thockin thockin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Directionally I am still 100% in favor.

WRT perf, we'd probabvly do better to optimize what we log and how often we log and the implementation of the logr-wrapper logic than to focus on sometimes not calling WithValues()

There's some uncertainty about the end-of-life plans, but I am pretty confident we can figure that out as it comes into focus.

All my comments are rleatively minor.

### Goals

- Remove `k8s.io/klog` imports and usage from all packages.
- Grant the caller of a function control over logging inside that function.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would mentally de-emphasize the idea of per-caller control and instead focus on more-localized control.

I don't think it is a requirement that every function have the ability to modify every callee's logging. Having a subsystem-by-subsystem choice is the goal. I think it's safe to assume different subsystems have different needs.

E.g. I bet a client-go SetGlobalLogger(logr.Logger) would provide a whole lot of utility and could decouple client-go users from klog. But a global logger in scheduler would be less obviously useful on its own.


## Proposal

Log calls that were already converted to structured logging need to be updated
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you should state right at the beginning that while this proposal is very expansive, it can be implemented incrementally over as many releases as we need, with little risk to users.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a paragraph along those lines.

`logger.V(2)` as logger for less important ones. Then when the scheduler’s
verbosity threshold is `-v=1`, a log message emitted with `V(1).InfoS` through
the updated logger will be printed for important pods and skipped for less
important ones.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to rush to embrace all the capabilities of logr. As a hypothetical, I get this. Practically I am not sure this specific case is the best demonstration of this feature :)

```go
func foo(ctx context.Context) {
logger := klogr.FromContext(ctx).WithName(“foo”)
ctx = klogr.NewContext(ctx, logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

links don't help when github hides the file by default :(

If ther are a lot of discrete calls to WithValues it feels like someone is doing something wrong. E.g. a linter COULD catch:

log = log.WithValues("foo", foo)
...
log = log.WithValues("bar", bar)

which can flatten to a single call and half as many allocations.

```go
func foo(ctx context.Context) {
logger := klogr.FromContext(ctx).WithName(“foo”)
ctx = klogr.NewContext(ctx, logger)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and a linter could catch:

ctx = klogr.NewContext(ctx, logger.WithName("foo"))

and suggest breaking it to 2 statements

Comment on lines 418 to 419
A new package `k8s.io/klogr` will replace both `k8s.io/klog` and
`k8s.io/klog/klogr`. It will be hosted under `staging` and thus get developed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I see the problem.

if k/k depends on foo and foo depends on klogr, and we vendor foo into k/k, klogr will resolve to ./staging/.../klogr - right?

Comment on lines 418 to 419
A new package `k8s.io/klogr` will replace both `k8s.io/klog` and
`k8s.io/klog/klogr`. It will be hosted under `staging` and thus get developed
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still a distinct repo seems better to me, unless we think this will actually get a lot of churn (and we still have to maintain a sane API)

}
```

The logcheck static code analysis tool will warn about code in Kubernetes which
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hopefully when the gate goes away, we can undo this indirection?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. That one would be a simple regex search/replace. Added.

// GetPodVolumes returns a pod's PVCs separated into bound, unbound with delayed binding (including provisioning)
// and unbound with immediate binding (including prebound)
- GetPodVolumes(pod *v1.Pod) (boundClaims, unboundClaimsDelayBinding, unboundClaimsImmediate []*v1.PersistentVolumeClaim, err error)
+ GetPodVolumes(logger klogr.Logger, pod *v1.Pod) (boundClaims, unboundClaimsDelayBinding, unboundClaimsImmediate []*v1.PersistentVolumeClaim, err error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd argue to put logger and things at the end of the list, personally, but we can defer that discussion until PRs

Comment on lines +767 to +906
- Users of klog notified through a Kubernetes blog post and an email to
dev@kubernetes.io that a logger must be set with k8s.io/klogr.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there must be some sleight of hand we can pull later, like a new version of klog that initializes klogr and leaves legacy callers of klog intact. I'm OK to let this proceed with some amount of uncertainty here.

@thockin
Copy link
Member

thockin commented Feb 2, 2022

I'll /lgtm here, minor points could be fixed as followup PR, I think

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 2, 2022
@pohly pohly force-pushed the contextual-logging branch from 3fa82eb to 947e98c Compare February 2, 2022 23:20
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 2, 2022
@ehashman
Copy link
Member

ehashman commented Feb 2, 2022

I think we have all the final signoffs in order to cautiously proceed with an alpha...

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 2, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ehashman, pohly

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Feb 2, 2022
@k8s-ci-robot k8s-ci-robot merged commit e139ba7 into kubernetes:master Feb 2, 2022
@k8s-ci-robot k8s-ci-robot added this to the v1.24 milestone Feb 2, 2022
pohly added a commit to pohly/klog that referenced this pull request Feb 7, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 7, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 7, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 7, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 9, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 10, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 10, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 11, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 11, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 11, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 14, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 15, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 16, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Feb 21, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
pohly added a commit to pohly/klog that referenced this pull request Mar 10, 2022
As discussed in kubernetes/enhancements#3078, enabling
code that only imports klog for all logging makes life simpler for developers.

This was also the reason why KObj was added to klog instead of a separate
helper package.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. wg/structured-logging Categorizes an issue or PR as relevant to WG Structured Logging.
Projects
None yet
Development

Successfully merging this pull request may close these issues.