Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Keep docker & k8s pod annotations while they are needed #5084

Merged
merged 1 commit into from
Sep 7, 2017

Conversation

exekias
Copy link
Contributor

@exekias exekias commented Sep 1, 2017

In some cases pod annotations are neede after the container/pod is deleted, for
instance when filebeat is reading the log behind the container.

This change makes sure we keep metadata after a pod is gone. By storing
access times we ensure that it's available as long as it's being used

fixes #4986

@exekias exekias added bug in progress Pull request is currently in progress. libbeat review labels Sep 1, 2017
@exekias exekias changed the title Keep pod annotations while they are needed Keep docker & k8s pod annotations while they are needed Sep 3, 2017
@exekias
Copy link
Contributor Author

exekias commented Sep 3, 2017

@dengqingpei1990 this PR should fix #4986, just in case you want to give this a try before we merge it. From my local tests, it seems to work nice

Copy link
Contributor

@vjsamuel vjsamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few corner scenarios are breaking with this change. One, I have left comment on the method itself. Other is as follows:

Take this scenario. User deploys under namespace 1 a stateful set a pod that will come up as zk-1, zk-2, zk-3.

In metricbeat, we use IpPort index and PodName indexes. If a pod in a stateful set get restarted/recreated all together, they are going to come up with the same name and in most scenarios the same ip port combination.

When we look up pods in this change, we look at the deleted cache first and then goto the existing pod workloads. It should ideally be the other way around. If a stateful set pod is updated with new set of labels, the index keys will remain the same but the metadata would have changed.

Both scenarios should ideally be fixable by checking live pod metadata first and returning from that and then check dead pods.

p.annotationCache.RLock()
meta, ok := p.annotationCache.annotations[arg]
var deleted bool
if ok {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For logs, we can update the pod labels using kubectl label po. The container name remains the same and the update will call onPodDelete && onPodAdd. So now the living container's metadata is in the deleted map. This portion of code will end up returning old metadata.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we should first check metadata from live pods and return data for that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added a commit addressing this, basically, if an add method is called it will remove any deleted flag for all its indexes

@exekias
Copy link
Contributor Author

exekias commented Sep 4, 2017

good point, I've addressed the issue 👍

w.containers[event.Actor.ID] = &Container{
ID: event.Actor.ID,
Name: name,
Image: image,
Labels: event.Actor.Attributes,
}
w.Unlock()
}

// Delete
if event.Action == "die" || event.Action == "kill" {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a container dies and gets started again it will still time out eventually and lose the metadata.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that case is treated, as onPodAdd (also called in case of update) cleans any deleted flag: https://github.com/elastic/beats/pull/5084/files#diff-c211c9481fb346b82d8713e779e0be9fR141

Copy link
Contributor Author

@exekias exekias Sep 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry you are right, I was talking about the Kubernetes watcher, not Docker, will apply the same fix, thanks!

I'm also working on tests

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pushed b54187e

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I restart a container, the events are: kill -> die -> stop -> start -> restart. So the container will not be un-deleted in this case.

Looking at it, adding the metadata on "created" and removing on "die" does seem inconsistent to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me do some tests, looks like it would be better to add it on start, isn't it?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding on start is the better approach.

@exekias exekias force-pushed the fix-k8s-log-ending-lines branch from 162f58c to b54187e Compare September 4, 2017 09:06
@exekias exekias added needs_backport PR is waiting to be backported to other branches. v6.0.0-rc1 labels Sep 4, 2017
@exekias exekias force-pushed the fix-k8s-log-ending-lines branch 5 times, most recently from 1ea2445 to cc7260e Compare September 4, 2017 14:14
@exekias exekias removed the in progress Pull request is currently in progress. label Sep 4, 2017
Copy link
Contributor

@vjsamuel vjsamuel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@exekias
Copy link
Contributor Author

exekias commented Sep 6, 2017

jenkins retest this

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code LGTM. I think this needs an addition to the config file reference and potentially also docs?

}, nil
}

// Container returns the running container with the given ID or nil if unknown
func (w *watcher) Container(ID string) *Container {
return w.containers[ID]
w.RLock()
container := w.containers[ID]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm aware the code was here before, but should we check here if ok if the ID actually exists?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We get nil if the container is not there, and we treat that case

var toDelete []string
timeout := time.Now().Add(-w.cleanupTimeout)
w.RLock()
for key, lastSeen := range w.deleted {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we add some debug logging to this? Otherwise very hard to figure out what is still in memory and what was removed.

// Check entries for timeout
var toDelete []string
timeout := time.Now().Add(-p.cleanupTimeout)
p.annotationCache.RLock()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not for this PR, but the code looks so similar to the other cleanupWorker, I'm thinking if parts could be shared.

Copy link
Contributor

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Could you squash the commits?

In some cases pod annotations are neede after the pod is deleted, for
instance when filebeat is reading the log behind the container.

This change makes sure we keep metadata after a pod is gone. By storing
access times we ensure that it's available as long as it's being used
@exekias exekias force-pushed the fix-k8s-log-ending-lines branch from 6357cec to 7d3314e Compare September 7, 2017 09:33
@exekias
Copy link
Contributor Author

exekias commented Sep 7, 2017

done, thanks for the review everyone!

@ruflin ruflin merged commit a49463c into elastic:master Sep 7, 2017
@exekias exekias removed the needs_backport PR is waiting to be backported to other branches. label Sep 8, 2017
exekias added a commit to exekias/beats that referenced this pull request Sep 8, 2017
In some cases pod annotations are neede after the pod is deleted, for
instance when filebeat is reading the log behind the container.

This change makes sure we keep metadata after a pod is gone. By storing
access times we ensure that it's available as long as it's being used
(cherry picked from commit a49463c)
tsg pushed a commit that referenced this pull request Sep 8, 2017
In some cases pod annotations are neede after the pod is deleted, for
instance when filebeat is reading the log behind the container.

This change makes sure we keep metadata after a pod is gone. By storing
access times we ensure that it's available as long as it's being used
(cherry picked from commit a49463c)
leweafan pushed a commit to leweafan/beats that referenced this pull request Apr 28, 2023
elastic#5133)

In some cases pod annotations are neede after the pod is deleted, for
instance when filebeat is reading the log behind the container.

This change makes sure we keep metadata after a pod is gone. By storing
access times we ensure that it's available as long as it's being used
(cherry picked from commit d24a88a)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Filebeat lose metadata of docker container when container stops
4 participants