KEP-3077: contextual logging #3078
Conversation
/sig instrumentation
Force-pushed from 4ce0b01 to cd448e3
/cc
Exciting!
> explicit logger parameter is suitable for functions which don’t need a context
> and never will.
>
> The rationale for not using both context and an explicit logger parameter is
is this going to be statically checked?
We could do that if we want to enforce it. It should fit into the existing logcheck tool.
My concern is that (as with any linter) there might be a legitimate reason why, in some part of the code, both a logger and a context need to be passed. Such a tool would then prevent that unless we also add a way to suppress the check per function.
I've added a comment saying that logcheck will check for this.

I have the check ready and have also integrated logcheck with golangci-lint, which addresses my concern about valid usages of the pattern: those can then use `nolint:logcheck`.
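For illustration, here is roughly what such a legitimate dual-parameter function and its suppression could look like (a sketch only; the function and its parameters are made up, and it assumes the logcheck/golangci-lint integration described above):

```go
import (
	"context"

	"k8s.io/klogr" // package layout as proposed in this KEP
)

// processItem is called in a hot path, so the caller resolves the logger
// once and passes it explicitly instead of paying for FromContext on
// every call; ctx is still needed for cancellation and deadlines.
//
//nolint:logcheck // passing both ctx and logger is intentional here
func processItem(ctx context.Context, logger klogr.Logger, item string) {
	logger.V(4).Info("processing", "item", item)
	_ = ctx
}
```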
> `logger.V(2)` as logger for less important ones. Then when the scheduler’s
> verbosity threshold is `-v=1`, a log message emitted with `V(1).InfoS` through
> the updated logger will be printed for important pods and skipped for less
> important ones.
Is the new feature here that the log level can be set for a log instance? Then, if I call `log.V(1)`, the actual level is `1 + log.level`?
Correct. It's part of the `logr.Logger` API and concept.
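For anyone following along, a minimal self-contained sketch of that additive behavior, using logr's funcr helper purely for demonstration (all names below are illustrative, not from the KEP):

```go
package main

import (
	"fmt"

	"github.com/go-logr/logr/funcr"
)

func main() {
	// Emit entries whose effective level is <= 1, mimicking -v=1.
	logger := funcr.New(func(prefix, args string) {
		fmt.Println(prefix, args)
	}, funcr.Options{Verbosity: 1})

	important := logger          // no level offset
	lessImportant := logger.V(2) // every later V(n) becomes n+2

	important.V(1).Info("scheduled")     // effective level 1: printed
	lessImportant.V(1).Info("scheduled") // effective level 3: skipped
}
```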
@thockin: this use case here would work better if `logger.V(-1)` (= raising verbosity of log messages) was allowed. As it stands, the code that gets called has to use different levels than it normally would. It would be more natural to use the same levels as in other parts and then, for important pods, use `logger.V(-1)` instead of `logger`.
making the log levels indicated in code non-deterministic seems pretty confusing... if I want to see a particular log line, how do I know what to set verbosity to now? if I set verbosity to a level matching a code call I see and don't see that line logged, how do I know if a parent call masked the log by lowering the verbosity?
> making the log levels indicated in code non-deterministic seems pretty confusing...

This is how logr is designed.
If we implement this use case the other way around (code uses "normal" log levels as described in the current Kubernetes conventions, and a caller enables additional log output for important contexts via `logger.V(-1)`), then we get the property that you are looking for: a `logger.V(5).Info` log entry is guaranteed to be emitted for `-v=5`.
> This is how logr is designed.

I understand logr lets a caller be the ultimate decider of how verbosely a library it calls may log. That seems useful for things like quieting info/debug logs from the etcd client.

That doesn't mean we should use that capability within Kubernetes components to vary log verbosity dynamically for the same log lines. That seems difficult to use and understand.
We don't need to rush to embrace all the capabilities of logr. As a hypothetical, I get this. Practically I am not sure this specific case is the best demonstration of this feature :)
It is specifically what the scheduler developers want to achieve, though. I don't have a strong opinion on whether it is something that should be done.

If it gets done, I would prefer allowing `logger.V(-1)`.
> [metrics-server](https://github.com/kubernetes-sigs/metrics-server/blob/4b20c2d43e338d5df7fb530dc960e5d0753f7ab1/Makefile#L252-L257),
> it is better hosted in a repo where `go install` is fast. It only
> depends on the standard Go runtime and therefore can be moved to
> `k8s.io/klogr/logcheck`. The tool will be updated to check not only klog
Are you planning any new checks? Otherwise, document in the KEP that no changes are planned for the tool.
One change is already mentioned at the end of the paragraph:

> The tool will be updated to check not only klog calls, but also calls through the `logr.Logger` interface.

If there's consensus about not passing ctx and logger as parameters, that would also need to be mentioned here.
FWIW, after integrating logcheck into golangci-lint, I am more comfortable with adding such a check, because in those cases where both need to be passed (for example, because of performance concerns), `nolint:logcheck` can be used to override the check.
A bit of feedback, all of it informal.
> #### Alpha
>
> - Common utility code available
> - At least kube-scheduler framework and some plugins converted
> - Initial e2e tests completed and enabled in Prow
>
> #### Beta
>
> - All of kube-scheduler converted
If we also start to convert at least one thing that typically runs outside the control plane, we'll get better feedback that could help inform the remaining work. This formal choice to pick one extra component outside the scheduler could be for either alpha or beta.
What could that other thing be? There's not much in k/k that isn't part of the control plane.

kubectl might be a candidate, but it typically doesn't log much at all (basically just warnings from client-go), so the effect of converting it wouldn't be very visible.

Perhaps we need to consider components outside of k/k? I'm open to suggestions.
> components outside of k/k
My (uninformed) suggestion is the cluster autoscaler. It's something people run both in-cluster and separately, and it ought to be able to write logs with context. I don't know what in-project, out-of-tree components have already made headway in adopting structured logging.
That's a good suggestion. It is based largely on the scheduler code and thus will automatically inherit all changes that we make there.
What about kube-proxy? It's typically a static pod on each node, or a DaemonSet.
I think kube-proxy falls into the same bucket as the "Kubernetes control plane": users who check the control plane are also likely to check kube-proxy, and vice versa.

My understanding of @sftim's proposal was to branch out and reach people who normally do not pay attention to the control-plane pods. Perhaps the autoscaler doesn't go far enough away from that, but I think it goes further than kube-proxy.

I know from SIG Scaling that users have problems configuring and using the autoscaler. It might be that improved logs could help with that problem.
I guess it's challenging in AWS. On GKE it's managed, so we have the same "control-plane" problem.
Another alternative would be the CSI sidecars. I already did a prototype for external-provisioner without using any of the APIs proposed here. That one could be adapted. The advantage is that there is interest in structured logging, we (= SIG-Storage) just decided to wait until there is also a decision about contextual logging.
Let me go with the external-provisioner as another component for beta.
/cc
Directionally I am still 100% in favor.

WRT perf, we'd probably do better to optimize what we log, how often we log, and the implementation of the logr-wrapper logic than to focus on sometimes not calling `WithValues()`.

There's some uncertainty about the end-of-life plans, but I am pretty confident we can figure that out as it comes into focus.

All my comments are relatively minor.
> ### Goals
>
> - Remove `k8s.io/klog` imports and usage from all packages.
> - Grant the caller of a function control over logging inside that function.
I would mentally de-emphasize the idea of per-caller control and instead focus on more-localized control.

I don't think it is a requirement that every function have the ability to modify every callee's logging. Having a subsystem-by-subsystem choice is the goal. I think it's safe to assume different subsystems have different needs.

E.g. I bet a client-go `SetGlobalLogger(logr.Logger)` would provide a whole lot of utility and could decouple client-go users from klog. But a global logger in the scheduler would be less obviously useful on its own.
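A rough sketch of what such a hook might look like (the SetGlobalLogger name comes from the comment above; the package name, default behavior, and everything else are assumptions):

```go
package clientgolog // placeholder package for the sketch

import (
	"sync/atomic"

	"github.com/go-logr/logr"
)

var global atomic.Value // always stores a logr.Logger

// SetGlobalLogger lets the embedding program route all of this
// subsystem's output wherever it likes, decoupling it from klog.
func SetGlobalLogger(l logr.Logger) { global.Store(l) }

// logger is what the subsystem's internals would call instead of klog.
func logger() logr.Logger {
	if l, ok := global.Load().(logr.Logger); ok {
		return l
	}
	return logr.Discard() // silent until configured
}
```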
> ## Proposal
>
> Log calls that were already converted to structured logging need to be updated
I think you should state right at the beginning that while this proposal is very expansive, it can be implemented incrementally over as many releases as we need, with little risk to users.
Added a paragraph along those lines.
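For instance, an already-structured call site would change roughly like this (a sketch based on the klogr.FromContext example elsewhere in this KEP; helper names may still change):

```go
// Before: structured logging through the klog singleton.
klog.InfoS("Pod scheduled", "pod", klog.KObj(pod))

// After: contextual logging; the logger, plus whatever names and values
// callers attached to it, travels through ctx.
logger := klogr.FromContext(ctx)
logger.Info("Pod scheduled", "pod", klog.KObj(pod))
```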
```go
func foo(ctx context.Context) {
	logger := klogr.FromContext(ctx).WithName("foo")
	ctx = klogr.NewContext(ctx, logger)
	// ...
}
```
links don't help when github hides the file by default :(

If there are a lot of discrete calls to `WithValues` it feels like someone is doing something wrong. E.g. a linter COULD catch:

```go
log = log.WithValues("foo", foo)
// ...
log = log.WithValues("bar", bar)
```

which can flatten to a single call and half as many allocations.
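That is, the linter's suggested replacement would be the single flattened call:

```go
log = log.WithValues("foo", foo, "bar", bar)
```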
```go
func foo(ctx context.Context) {
	logger := klogr.FromContext(ctx).WithName("foo")
	ctx = klogr.NewContext(ctx, logger)
	// ...
}
```
and a linter could catch:

```go
ctx = klogr.NewContext(ctx, logger.WithName("foo"))
```

and suggest breaking it into two statements.
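That is, suggest the two-statement form used elsewhere in this KEP, so the named logger is also available for direct calls:

```go
logger := klogr.FromContext(ctx).WithName("foo")
ctx = klogr.NewContext(ctx, logger)
```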
> A new package `k8s.io/klogr` will replace both `k8s.io/klog` and
> `k8s.io/klog/klogr`. It will be hosted under `staging` and thus get developed
I'm not sure I see the problem.

If k/k depends on foo and foo depends on klogr, and we vendor foo into k/k, klogr will resolve to ./staging/.../klogr, right?
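For reference, staging modules resolve through replace directives along these lines (a hypothetical go.mod fragment, assuming k8s.io/klogr is staged like the other k8s.io components):

```
require k8s.io/klogr v0.0.0

replace k8s.io/klogr => ./staging/src/k8s.io/klogr
```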
> A new package `k8s.io/klogr` will replace both `k8s.io/klog` and
> `k8s.io/klog/klogr`. It will be hosted under `staging` and thus get developed
Still, a distinct repo seems better to me, unless we think this will actually get a lot of churn (and we still have to maintain a sane API).
> The logcheck static code analysis tool will warn about code in Kubernetes which
hopefully when the gate goes away, we can undo this indirection?
Yes. That one would be a simple regex search/replace. Added.
```diff
 // GetPodVolumes returns a pod's PVCs separated into bound, unbound with delayed binding (including provisioning)
 // and unbound with immediate binding (including prebound)
-GetPodVolumes(pod *v1.Pod) (boundClaims, unboundClaimsDelayBinding, unboundClaimsImmediate []*v1.PersistentVolumeClaim, err error)
+GetPodVolumes(logger klogr.Logger, pod *v1.Pod) (boundClaims, unboundClaimsDelayBinding, unboundClaimsImmediate []*v1.PersistentVolumeClaim, err error)
```
I'd argue for putting the logger and such at the end of the parameter list, personally, but we can defer that discussion until the PRs.
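An illustrative call site for the extended signature above (the binder value and the error message are hypothetical):

```go
logger := klogr.FromContext(ctx)
bound, delayed, immediate, err := binder.GetPodVolumes(logger, pod)
if err != nil {
	logger.Error(err, "Listing pod volumes failed", "pod", klog.KObj(pod))
}
```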
> - Users of klog notified through a Kubernetes blog post and an email to
>   dev@kubernetes.io that a logger must be set with k8s.io/klogr.
I feel like there must be some sleight of hand we can pull later, like a new version of klog that initializes klogr and leaves legacy callers of klog intact. I'm OK to let this proceed with some amount of uncertainty here.
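One shape that sleight of hand could take, as a very rough sketch (the forwarding shim and the Background accessor are assumptions, not part of this KEP):

```go
package klog

import "k8s.io/klogr"

// InfoS keeps the legacy signature but forwards to the process-wide
// logger managed by the new package, so existing callers compile and
// log unchanged. klogr.Background() is an assumed accessor for that
// default logger.
func InfoS(msg string, keysAndValues ...interface{}) {
	klogr.Background().Info(msg, keysAndValues...)
}
```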
I'll /lgtm here; minor points could be fixed in a follow-up PR, I think.

/lgtm
Force-pushed from 3fa82eb to 947e98c
I think we have all the final signoffs in order to cautiously proceed with an alpha...

/lgtm
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ehashman, pohly

The full list of commands accepted by this bot can be found here. The pull request process is described here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing `/approve` in a comment.
As discussed in kubernetes/enhancements#3078, enabling code that only imports klog for all logging makes life simpler for developers. This was also the reason why KObj was added to klog instead of a separate helper package.
One-line PR description: KEP-3077: initial draft of contextual logging KEP
Issue link: contextual logging #3077