Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add add_docker_metadata processor #4352

Merged
merged 3 commits into from
May 31, 2017
Merged

Conversation

exekias
Copy link
Contributor

@exekias exekias commented May 18, 2017

This processor will enrich events matching the desired configuration with Docker info:

  • docker.container.id
  • docker.container.image
  • docker.container.name
  • docker.container.labels

This PR adds docker/client dependency and cleans up common vendors with current go-dockerclient from metricbeat/module/docker.

I'm also using a patch for docker/client (exekias/moby@83d94aa) to ensure the client works in netbsd, will check with upstream if I can push that to master.

Important files to check here:
vendor.json:

Everything under add_docker_metadata (78f9943)

@exekias exekias force-pushed the add-docker-metadata branch 3 times, most recently from 6df9110 to c9cdf74 Compare May 18, 2017 16:04
@andrewkroh
Copy link
Member

I think we should consider using the top level vendor dir and removing the metricbeat/module/docker/vendor contents. We would have to check that nothing breaks in the docker module, but I'm guessing this would lead to smaller binaries. I don't know if cgo is used anywhere in these deps, but I have run into problems duplicate symbols when the same lib is included twice due to it being vendored under multiple paths.


watcher, err := watcherConstructor(config.Host, config.TLS)
if err != nil {
panic(err)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not return an error?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wow, I can believe I left that there, sorry

}

func (d *addDockerMetadata) String() string {
return "add_docker_metadata"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is customary to include config info in the string representation (assuming no secrets are included). I don't believe there is a standard format.

@andrewkroh
Copy link
Member

jenkins, test it

1 similar comment
@andrewkroh
Copy link
Member

jenkins, test it

Copy link
Member

@ruflin ruflin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is an addition we will really need in the future and become quite powerful. It especially is becoming interesting with docker logs and having the id in the pat (how are we going to extract it ;-) ) and then attach the info to it.

You used in this PR a new push API which I think we don't use yet. This makes a lot of sense in this context but would also be great to have in the docker module (or filebeat?)

I suggest two split this PR up in 3 parts:

  • Move docker dependencies to the top vendor directory. Be aware that we use our own fork of the docker project (see vendor.json on docker module)
  • Add processor
  • Decide where the add event as metricset or prospector

#processors:
#- add_docker_metadata:
# match_fields: ["system.process.cgroup.id"]
# #host: "unix:///var/run/docker.sock"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like indentation would be off if uncommented, as this is commented twice

meta.Put("container.labels", labels)
}

meta.Put("container.id", container.ID)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I assume this matches the "fields" with have in metricbeat for the docker containers?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, they do

value, err := event.GetValue(field)
if err == nil {
if strValue, ok := value.(string); ok {
cid = strValue
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fields is a list but in the end you only have one cid?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

there is no way to match several fields into several containers, that should not happen. Fields is a list cause you may have different fields you want to match, not all of them present in all events

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of assume, the different events will come from different modules or prospectors. Means in this case each prospector should define it's own processor. So there would not be a need to have an array of fields I think.

var cid string
for _, field := range d.fields {
value, err := event.GetValue(field)
if err == nil {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend using here the default pattern:

if err != nil {
   continue
}
...

Makes it easier to read in my opinion.

for _, c := range containers {
w.containers[c.ID] = &Container{
ID: c.ID,
Name: c.Names[0][1:], // Strip '/' from container names
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we doing that also in the container metricset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an spurious char given by this API call & version, doesn't seem to be the case for our other calls

WATCH:
for {
select {
case event := <-events:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is actually a docker event listener instead of polling the data. Very similar to what we have now in kubernetes? Should we also have this in the docker module as a metricset?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It could make sense yes

#
#processors:
#- add_docker_metadata:
# match_fields: ["system.process.cgroup.id"]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was first confused by this and had to read the code to understand what it does. It takes this field and checks if the value of this field matches to any container id. If yes, it will add the container meta data to the event.

In the code, it can currently only match 1 field I think, but here it is plural and you use an array. How is this going to work exactly?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW I'm not too sure about the example used here. In case someone uses cgroups, doesn't that kind of mean he doesn't want to use docker?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It checks all the fields until one of them matches

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for the example, you may want to have docker info on top of the cgroup id, so you can filter by, for instance, docker image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the config option is kind of field_that_matches_container_id: .... What is the use case of having multiple fields defined?

For the cgroup docker part: I see that this is what it's doing, but why would someone then not just use the docker module instead as he will rely on the docker api.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have several metrics matching cgroup id in system (maybe something we have to look into): system.process.cgroup.cpu.id, system.process.cgroup.memory.id, system.process.cgroup.blkio.id

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I assume per config you only want to match to one?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to match all of them, perhaps in the future it would be better to merge those fields into a global system.cgroup.id, with different meanings depending on the metricset, but as for now I would need to match any possible field there.

what do you think @andrewkroh, does this make any sense?

@exekias
Copy link
Contributor Author

exekias commented May 19, 2017

Ok I'll play with vendors and open a PR, then rebase this once that is merged

@monicasarbu monicasarbu added the in progress Pull request is currently in progress. label May 29, 2017
@exekias exekias force-pushed the add-docker-metadata branch 12 times, most recently from 46aaef3 to bfebcef Compare May 29, 2017 16:58
@exekias exekias removed the in progress Pull request is currently in progress. label May 29, 2017
@exekias
Copy link
Contributor Author

exekias commented May 29, 2017

Updated vendor folders, all details in PR description

@tsg
Copy link
Contributor

tsg commented May 31, 2017

Merging as the tests passed before the rebase, which only fixed the changelog.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants