Ignore stats from terminated container if new was created. #1708
Conversation
/assign @piosz
metrics/sources/summary/summary.go
Outdated
@@ -194,10 +194,22 @@ func (this *summaryMetricsSource) decodePodStats(metrics map[string]*MetricSet,

	for _, container := range pod.Containers {
		key := PodContainerKey(ref.Namespace, ref.Name, container.Name)
		if _, exist := metrics[key]; exist && containerIsTerminated(&container, metrics[key].CreateTime) {
IIUC the logic is as follows:
- if we first encounter the old container, the new one will simply replace it
- if we first encounter the new one, this check ensures that the old one will not replace it
It's not that obvious. Please add a comment explaining this.
The comment was added.
metrics/sources/summary/summary.go
Outdated
		metrics[key] = this.decodeContainerStats(podMetrics.Labels, &container, false)
	}
}

func containerIsTerminated(container *stats.ContainerStats, otherStartTime time.Time) bool {
	if container.StartTime.Time.Before(otherStartTime) && *container.CPU.UsageNanoCores == 0 {
How about avoiding the double check of the same condition container.StartTime.Time.Before(otherStartTime) by implementing it in the following way:
if container.StartTime.Time.Before(otherStartTime) {
	if *container.CPU.UsageNanoCores == 0 {
		return true
	}
	glog.Warningf(...)
}
Thanks, it's much better that way.
Force-pushed 2fc5700 to 5a2462e
All comments have been resolved.
one place where we should have logging, but otherwise LGTM 👍
metrics/sources/summary/summary.go
Outdated
// This check ensures that we are not replacing metrics of running container with metrics of terminated one if
// there are two exactly same containers reported by kubelet.
if _, exist := metrics[key]; exist && containerIsTerminated(&container, metrics[key].CreateTime) {
	continue
please please please always log conditions like this (at a high log level, like 8). Otherwise it's impossible to debug user deployments without sending a patched Heapster container to the user.
Thanks for the comment. I have tried to include in the log all the information that could help debug the issue.
Could you check it once again?
Force-pushed 4f1db8a to 4dd9527
looks good to me, will merge soon.
Thanks a lot!
metrics/sources/summary/summary.go
Outdated
		metrics[key] = this.decodeContainerStats(podMetrics.Labels, &container, false)
	}
}

func containerIsTerminated(container *stats.ContainerStats, otherStartTime time.Time) bool {
	if container.StartTime.Time.Before(otherStartTime) {
		if *container.CPU.UsageNanoCores == 0 {
CPU usage may not be accurate. It's possible that the process didn't run for the past time period. Could you use memory instead?
Thanks for the comment, it really makes sense. I think checking both CPU and memory usage should be a good enough heuristic.
I have updated the code and the test.
It may happen that cAdvisor continues to report metrics for a container that was terminated even after a new one has been created. In such cases Heapster randomly selects the metric from either the terminated or the running container. This commit adds a simple heuristic to ignore a container that is older and doesn't have any CPU usage.
Force-pushed 4dd9527 to b4cb344
@piosz do you have any other comments?
/lgtm
Fix #1697.