Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enhance add_kubernetes_metadata matcher #28868

101 changes: 64 additions & 37 deletions filebeat/processor/add_kubernetes_metadata/matchers.go
Original file line number Diff line number Diff line change
Expand Up @@ -78,59 +78,79 @@ func newLogsPathMatcher(cfg common.Config) (add_kubernetes_metadata.Matcher, err
// Docker container ID is a 64-character-long hexadecimal string
const containerIdLen = 64

// Pod UID is on the 5th index of the path directories
const podUIDPos = 5

func (f *LogPathMatcher) MetadataIndex(event common.MapStr) string {
value, err := event.GetValue("log.file.path")
if err == nil {
source := value.(string)
f.logger.Debugf("Incoming log.file.path value: %s", source)
if err != nil {
f.logger.Errorf("Error extracting log.file.path from the event")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may still need to log source here so as to have a more clear view of how the event that is related to this error looks like.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you mean event instead of source? this error might only occur in case event map does not contain log.file.path(=source) key

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeap my wrong, event is what I was thinking about. Logging the whole event would be helpful.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also maybe just a debug message would be enough, since ERRORs would flood a basic case when the processor for some reason does not work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed: e8ab1fe

return ""
}

if !strings.Contains(source, f.LogsPath) {
f.logger.Errorf("Error extracting container id - source value does not contain matcher's logs_path '%s'.", f.LogsPath)
return ""
}
source := value.(string)
f.logger.Debugf("Incoming log.file.path value: %s", source)

if !strings.Contains(source, f.LogsPath) {
f.logger.Errorf("Error extracting container id - source value does not contain matcher's logs_path '%s'.", f.LogsPath)
return ""
}

sourceLen := len(source)
logsPathLen := len(f.LogsPath)
sourceLen := len(source)
logsPathLen := len(f.LogsPath)

if f.ResourceType == "pod" {
// Specify a pod resource type when manually mounting log volumes and they end up under "/var/lib/kubelet/pods/"
// This will extract only the pod UID, which offers less granularity of metadata when compared to the container ID
if strings.HasPrefix(f.LogsPath, podLogsPath()) && strings.HasSuffix(source, ".log") {
if f.ResourceType == "pod" {
// Pod resource type will extract only the pod UID, which offers less granularity of metadata when compared to the container ID
if strings.HasSuffix(source, ".log") {
// Specify a pod resource type when writting logs into manually mounted log volume,
// those logs apper under under "/var/lib/kubelet/pods/<pod_id>/volumes/..."
if strings.HasPrefix(f.LogsPath, podKubeletLogsPath()) {
pathDirs := strings.Split(source, pathSeparator)
podUIDPos := 5
if len(pathDirs) > podUIDPos {
podUID := strings.Split(source, pathSeparator)[podUIDPos]

f.logger.Debugf("Using pod uid: %s", podUID)
return podUID
}

f.logger.Error("Error extracting pod uid - source value contains matcher's logs_path, however it is too short to contain a Pod UID.")
}
} else {
// In case of the Kubernetes log path "/var/log/containers/",
// the container ID will be located right before the ".log" extension.
if strings.HasPrefix(f.LogsPath, containerLogsPath()) && strings.HasSuffix(source, ".log") && sourceLen >= containerIdLen+4 {
containerIDEnd := sourceLen - 4
cid := source[containerIDEnd-containerIdLen : containerIDEnd]
f.logger.Debugf("Using container id: %s", cid)
return cid
}

// In any other case, we assume the container ID will follow right after the log path.
// However we need to check the length to prevent "slice bound out of range" runtime errors.
if sourceLen >= logsPathLen+containerIdLen {
cid := source[logsPathLen : logsPathLen+containerIdLen]
f.logger.Debugf("Using container id: %s", cid)
return cid
// In case of the Kubernetes log path "/var/log/pods/",
// the pod ID will be extracted from the directory name,
// file name example: "/var/log/pods/'<namespace>_<pod_name>_<pod_uid>'/container_name/0.log".
if strings.HasPrefix(f.LogsPath, podLogsPath()) {
pathDirs := strings.Split(source, pathSeparator)
podUIDPos := 4
if len(pathDirs) > podUIDPos {
podUID := strings.Split(pathDirs[podUIDPos], "_")
if len(podUID) > 2 {
f.logger.Debugf("Using pod uid: %s", podUID)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe log the podUID[2] here instead of the array

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thank you for pointing it out! fixed: e8ab1fe

return podUID[2]
}
}
}

f.logger.Error("Error extracting container id - source value contains matcher's logs_path, however it is too short to contain a Docker container ID.")
f.logger.Error(`Error extracting pod uid - source value does not contains matcher's logs_path,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering if we should prevent this from happening by validating the configuration. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please elaborate on this? do you mean do smth similar to other processors. like - https://github.com/elastic/beats/blob/master/libbeat/processors/add_kubernetes_metadata/config.go ?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in

func (k *kubeAnnotatorConfig) Validate() error {
we can check if matchers' config is valid too so as to warn the users early on and avoid initialising the processor in case the config is not supported. I see that the matcher's config part is a map so not sure how much mangling this change would require but it should be doable. Let me know what you think.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will give it a try

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ChrsMark please have a look - ab3c821, let me know wdyt, are there any better ways maybe?

supported log_path for 'pod' resource_type: '/var/lib/kubelet/pods/', '/var/log/pods/'.`)
return ""
}
}
// In case of the Kubernetes log path "/var/log/containers/",
// the container ID will be located right before the ".log" extension.
// file name example: /var/log/containers/<pod_name>_<namespace>_<container_name>-<continer_id>.log
if strings.HasPrefix(f.LogsPath, containerLogsPath()) && strings.HasSuffix(source, ".log") && sourceLen >= containerIdLen+4 {
containerIDEnd := sourceLen - 4
cid := source[containerIDEnd-containerIdLen : containerIDEnd]
f.logger.Debugf("Using container id: %s", cid)
return cid
}

// In any other case, we assume the container ID will follow right after the log path.
// However we need to check the length to prevent "slice bound out of range" runtime errors.
// for the default log path /var/lib/docker/containers/ container ID will follow right after the log path.
// file name example: /var/lib/docker/containers/<container_id>/<container_id>-json.log
if sourceLen >= logsPathLen+containerIdLen {
cid := source[logsPathLen : logsPathLen+containerIdLen]
f.logger.Debugf("Using container id: %s", cid)
return cid
}

f.logger.Error("Error extracting container id - source value contains matcher's logs_path, however it is too short to contain a Docker container ID.")
return ""
}

Expand All @@ -141,13 +161,20 @@ func defaultLogPath() string {
return "/var/lib/docker/containers/"
}

func podLogsPath() string {
func podKubeletLogsPath() string {
if runtime.GOOS == "windows" {
return "C:\\var\\lib\\kubelet\\pods\\"
}
return "/var/lib/kubelet/pods/"
}

func podLogsPath() string {
if runtime.GOOS == "windows" {
return "C:\\var\\log\\pods\\"
}
return "/var/log/pods/"
}

func containerLogsPath() string {
if runtime.GOOS == "windows" {
return "C:\\var\\log\\containers\\"
Expand Down
24 changes: 24 additions & 0 deletions filebeat/processor/add_kubernetes_metadata/matchers_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -117,6 +117,30 @@ func TestLogsPathMatcher_InvalidSource4(t *testing.T) {
executeTestWithResourceType(t, cfgLogsPath, cfgResourceType, source, expectedResult)
}

func TestLogsPathMatcher_InvalidVarLogPodSource(t *testing.T) {
cfgLogsPath := "/var/log/pods/"
cfgResourceType := "pod"
source := fmt.Sprintf("/invalid/dir/namespace_pod-name_%s/container/0.log", puid)
expectedResult := ""
executeTestWithResourceType(t, cfgLogsPath, cfgResourceType, source, expectedResult)
}

func TestLogsPathMatcher_InvalidVarLogPodIDFormat(t *testing.T) {
cfgLogsPath := "/var/log/pods/"
cfgResourceType := "pod"
source := fmt.Sprintf("/var/log/pods/%s/container/0.log", puid)
expectedResult := ""
executeTestWithResourceType(t, cfgLogsPath, cfgResourceType, source, expectedResult)
}

func TestLogsPathMatcher_ValidVarLogPod(t *testing.T) {
cfgLogsPath := "/var/log/pods/"
cfgResourceType := "pod"
source := fmt.Sprintf("/var/log/pods/namespace_pod-name_%s/container/0.log", puid)
expectedResult := puid
executeTestWithResourceType(t, cfgLogsPath, cfgResourceType, source, expectedResult)
}

func executeTest(t *testing.T, cfgLogsPath string, source string, expectedResult string) {
executeTestWithResourceType(t, cfgLogsPath, "", source, expectedResult)
}
Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -83,10 +83,29 @@ the `log.file.path` field.
This matcher has the following configuration settings:

`logs_path`:: (Optional) Base path of container logs. If not specified, it uses
the default logs path of the platform where {beatname_uc} is running.
`resource_type`:: (Optional) Type of the resource to obtain the ID of. It can be
`pod`, to make the lookup based on the pod UID, or `container`, to make the
lookup based on the container ID. It defaults to `container`.
the default logs path of the platform where {beatname_uc} is running: for Linux -
`/var/lib/docker/containers/`, Windowd - `C:\\ProgramData\\Docker\\containers`.
tetianakravchenko marked this conversation as resolved.
Show resolved Hide resolved
To change the default value: container ID must follow right after the `logs_path` -
`<log_path>/<container_id>`, where `container_id` is a 64-character-long
hexadecimal string.

`resource_type`:: (Optional) Type of the resource to obtain the ID of.
Valid `resource_type`:
* `pod`: to make the lookup based on the pod UID. When `resource_type` is set to
`pod`, `logs_path` must be set as well, supported path in this case:
** `/var/lib/kubelet/pods/` used to read logs from mounted into the pod volumes,
those logs end up under `/var/lib/kubelet/pods/<pod UID>/volumes/<volume name>/...`
To use `/var/lib/kubelet/pods/` as a `log_path`, `/var/lib/kubelet/pods` must be
mounted into the filebear pods.
tetianakravchenko marked this conversation as resolved.
Show resolved Hide resolved
** `/var/log/pods/`
Note: when using `resource_type: 'pod'` logs will be enriched only with pod
metadata: pod id, pod name, etc., not container metadata.
*`container`: to make the lookup based on the container ID, `logs_path` must
be set to `/var/log/containers/`.
It defaults to `container`.

To be able to use `logs_path` matcher filebeat input path must be a subdirectory
of directory defined in `logs_path` configuration setting.

The default configuration is able to lookup the metadata using the container ID
when the logs are collected from the default docker logs path
Expand Down