-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Adding pid matcher for add_kubernetes_metadata processor. #16549
Conversation
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
1 similar comment
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually? |
💚 CLA has been signed |
Tests on master seem to fail in general.... |
Thank you for opening @jtinkus. Process info enrichment is a great feature to have! Just recently we got a different PR that adds the container id to processes metadata: #15947 With that PR in, it should be possible to enrich processes with k8s metadata by just matching that field. How do you feel about that? I'm wondering if we should only introduce that PR to keep a single way of doing this kind of enrichment. |
Hello!
Skimmed over the code. Seems similar thing. Configurable regex leaves more
flexibility IMHO. And in first glance did not see falling back to ppid when
cgroup file for pid not found.
But you are more familiar with whole product, so leaving it up to you :)
We need k8s metadata matched and actually we are in hurry with that...
So please keep me posted!
With best regards,
--
Jako
…On Fri, Feb 28, 2020 at 12:00 PM Carlos Pérez-Aradros Herce < ***@***.***> wrote:
Thank you for opening @jtinkus <https://github.com/jtinkus>. Process info
enrichment is a great feature to have!
Just recently we got a different PR that adds the container id to
processes metadata: #15947 <#15947>
With that PR in, it should be possible to enrich processes with k8s
metadata by just matching that field. How do you feel about that? I'm
wondering if we should only introduce that PR to keep a single way of doing
this kind of enrichment.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#16549?email_source=notifications&email_token=AINMFKWA7T5O7U3OQNOIEZDRFDODLA5CNFSM4K3E6VL2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOENH6WKY#issuecomment-592440107>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/AINMFKUEJUPKBWNOKF2SCHLRFDODLANCNFSM4K3E6VLQ>
.
--
Jako
|
Hello! Seems that there are problems with other PR you mentioned. Also seems that we are trying to solve a bit different problems. Enrichment of process metadata vs simple matching of kubernetes metadata. In our case going with this other PR would mean using add_process_metadata processor first and then add_kubernetes_metadata processor. This would mean overhead of collecting other process metadata where only container id is needed. Also in my testing it seems rather often that process is already ended and container id is obtianed based on ppid. That makes % of getting cid successfully a bit higher too. @exekias, WDYT, maybe it would be OK to have such a small bit of duplication finally, as those PR-s try to solve different problem in my opinion and getting cid from cgroup is not small and isolated functionality in other PR... With best regards, |
|
||
// For easy stubbing file read in unit tests | ||
var readCgroupFile = func(pid int) ([]byte, error) { | ||
return ioutil.ReadFile(fmt.Sprintf("/proc/%d/cgroup", pid)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A feature from other PR to support custom host fs mounts to access /proc
(e.g. /hostfs/proc
) would be nice. It'd drop reliance on running in host's PID namespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, will add that too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// NewPidMatcher initializes and returns a PidMatcher | ||
func NewPidMatcher(cfg common.Config) (Matcher, error) { | ||
config := struct { | ||
RegexField string `config:"matcher_regex"` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since k8s is following a principle of /prefix/class/podID/cID
wouldn't string manipulation be more performant?
e.g.:
func getCID(cgroup string) (string, error) {
if i := strings.LastIndex(cgroup, "/"); i >= 0 {
return cgroup[i+1:], nil
}
return "", fmt.Errorf("Can't extract container ID, cgroup: %s", cgroup)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I do not know about performance, i'm sure regex is more flexible.
For example this another PR used prefixes from conf instead to choose right line from cgroups returned by gosigar...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would you prefer using gosigar and prfixes like other PR does?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I understand gosigar will get you the content of /proc/$pid/cgroup
.
you use prefixes for detecting if given process is using e.g. /docker
or /kubepods
check my PR to add_docker_metadata
: #16926
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, i understand how it works. gosigar gives map of cgroup strings, but i'm not able to assess performance difference over that current regex solution where i search first regex match over whole cgroup file content...
If you know or feel that iterating over that string map searching matching prefix and then splitting string is faster, i'm happy to change.
Just a thought, this kubernetes meta plugin should be used only with kubernetes, so i guess that docker specific setup is not relevant here...
All together I'd like to end up with good solution :) Sadly i'm completely new with go and also beats, just solving a problem for related project my work depends on. So all of your advice is more than appreciated.
Thank you!!!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
only time I'm iterating over whole cgroup map is when there is no container ID otherwise only the first entry is checked. I'll read a little bit more about cgroups and containers because I don't know if there is a possibility that it can run in a container with only subset of cgroups created
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
https://medium.com/@dgryski/speeding-up-regexp-matching-with-ragel-4727f1c16027
Here is something about regex performance....
|
||
// MetadataIndex returns index for matching metadata indexed based on container id | ||
func (p *PidMatcher) MetadataIndex(event common.MapStr) string { | ||
val, err := event.GetValue("process.pid") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Specifying which field contains process ID would give more freedom to the user to specify if you want to look at ppid
or pid
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain more. I'm expecting pid and ppid to be in ratrher concrete place in event. Do you mean that instead of trying with ppid when pid not successful you'd like to see configurable option to try with ppid only?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
both, with hardcoded process.pid
and/or process.ppid
people can't inspect e.g. system.process.pid
.
For example add_docker_metadata
allows you to pick which field to inspect for a process id
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Will make them fields configurable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
// MetadataIndex returns index for matching metadata indexed based on container id | ||
func (p *PidMatcher) MetadataIndex(event common.MapStr) string { | ||
// As we are post processing here, actual process may be already exited. | ||
// So trying to find the container id by the parent process id first, hoping parent process lives a bit longer |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here I changed the order. Ppid is used first. Now i have feeling, that falling back to pid just in case is actually useless, meaning that if cgroup file for ppid is already gone, then child process is closed too, right?
So if i'm not mistaken, then using only ppid for extracting container id should be enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what if ppid
belongs to the container runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right. so in rare cases this pid part is still useful.
Alternative PR went to master. |
What does this PR do?
Adding new matcher for add_kubernetes_metadata processor. Matcher takes pid from event and extracts container id from cgroup file based on pid.
Why is it important?
Enables Kubernetes metadata matching if there is no other suitable data in event for current matchers, for example matching AuditBeat events.
Checklist
Author's Checklist
How to test this PR locally
Run AuditBeat on kubernetes worker node or in container with add_kubernetes_metadata processor enabled, configuring it to use container indexer and pid matcher:
processors:
Related issues
Use cases
Screenshots
Logs