Skip to content

Commit

Permalink
[pkg/stanza] Add container operator parser (#32594)
Browse files Browse the repository at this point in the history
**Description:** <Describe what has changed.>
<!--Ex. Fixing a bug - Describe the bug and how this fixes the issue.
Ex. Adding a feature - Explain what this achieves.-->
This PR implements the new container logs parser as it was proposed at
#31959.

**Link to tracking Issue:** <Issue number if applicable>
#31959

**Testing:** <Describe what testing was performed and which tests were
added.>

Added unit tests. Providing manual testing steps as well:

### How to test this manually

1. Using the following config file:
```yaml
receivers:
  filelog:
    start_at: end
    include_file_name: false
    include_file_path: true
    include:
    - /var/log/pods/*/*/*.log
    operators:
      - id: container-parser
        type: container
        output: m1
      - type: move
        id: m1
        from: attributes.k8s.pod.name
        to: attributes.val
      - id: some
        type: add
        field: attributes.key2.key_in
        value: val2

exporters:
  debug:
    verbosity: detailed

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug]
      processors: []
```
2. Start the collector:
`./bin/otelcontribcol_linux_amd64 --config
~/otelcol/container_parser/config.yaml`
3. Use the following bash script to create some logs:
```bash
#! /bin/bash

echo '2024-04-13T07:59:37.505201169-05:00 stdout P This is a very very long crio line th' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler43/1.log
echo '{"log":"INFO: log line here","stream":"stdout","time":"2029-03-30T08:31:20.545192187Z"}' >> /var/log/pods/kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log
echo '2024-04-13T07:59:37.505201169-05:00 stdout F at is awesome! crio is awesome!' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler43/1.log
echo '2021-06-22T10:27:25.813799277Z stdout P some containerd log th' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler44/1.log
echo '{"log":"INFO: another log line here","stream":"stdout","time":"2029-03-30T08:31:20.545192187Z"}' >> /var/log/pods/kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log
echo '2021-06-22T10:27:25.813799277Z stdout F at is super awesome! Containerd is awesome' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler44/1.log



echo '2024-04-13T07:59:37.505201169-05:00 stdout F standalone crio line which is awesome!' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler43/1.log
echo '2021-06-22T10:27:25.813799277Z stdout F standalone containerd line that is super awesome!' >> /var/log/pods/kube-scheduler-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d3/kube-scheduler44/1.log
```
4. Run the above as a bash script to verify any parallel processing.
Verify that the output is correct.


### Test manually on k8s

1. `make docker-otelcontribcol && docker tag otelcontribcol
otelcontribcol-dev:0.0.1 && kind load docker-image
otelcontribcol-dev:0.0.1`
2. Install using the following helm values file:
```yaml
mode: daemonset
presets:
  logsCollection:
    enabled: true

image:
  repository: otelcontribcol-dev
  tag: "0.0.1"
  pullPolicy: IfNotPresent

command:
  name: otelcontribcol

config:
  exporters:
    debug:
      verbosity: detailed
  receivers:
    filelog:
      start_at: end
      include_file_name: false
      include_file_path: true
      exclude:
        - /var/log/pods/default_daemonset-opentelemetry-collector*_*/opentelemetry-collector/*.log
      include:
        - /var/log/pods/*/*/*.log
      operators:
        - id: container-parser
          type: container
          output: some
        - id: some
          type: add
          field: attributes.key2.key_in
          value: val2


  service:
    pipelines:
      logs:
        receivers: [filelog]
        processors: [batch]
        exporters: [debug]
```
3. Check collector's output to verify the logs are parsed properly:
```console
2024-05-10T07:52:02.307Z	info	LogsExporter	{"kind": "exporter", "data_type": "logs", "name": "debug", "resource logs": 1, "log records": 2}
2024-05-10T07:52:02.307Z	info	ResourceLog #0
Resource SchemaURL: 
ScopeLogs #0
ScopeLogs SchemaURL: 
InstrumentationScope  
LogRecord #0
ObservedTimestamp: 2024-05-10 07:52:02.046236071 +0000 UTC
Timestamp: 2024-05-10 07:52:01.92533954 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
Body: Str(otel logs at 07:52:01)
Attributes:
     -> log: Map({"iostream":"stdout"})
     -> time: Str(2024-05-10T07:52:01.92533954Z)
     -> k8s: Map({"container":{"name":"busybox","restart_count":"0"},"namespace":{"name":"default"},"pod":{"name":"daemonset-logs-6f6mn","uid":"1069e46b-03b2-4532-a71f-aaec06c0197b"}})
     -> logtag: Str(F)
     -> key2: Map({"key_in":"val2"})
     -> log.file.path: Str(/var/log/pods/default_daemonset-logs-6f6mn_1069e46b-03b2-4532-a71f-aaec06c0197b/busybox/0.log)
Trace ID: 
Span ID: 
Flags: 0
LogRecord #1
ObservedTimestamp: 2024-05-10 07:52:02.046411602 +0000 UTC
Timestamp: 2024-05-10 07:52:02.027386192 +0000 UTC
SeverityText: 
SeverityNumber: Unspecified(0)
Body: Str(otel logs at 07:52:02)
Attributes:
     -> log.file.path: Str(/var/log/pods/default_daemonset-logs-6f6mn_1069e46b-03b2-4532-a71f-aaec06c0197b/busybox/0.log)
     -> time: Str(2024-05-10T07:52:02.027386192Z)
     -> log: Map({"iostream":"stdout"})
     -> logtag: Str(F)
     -> k8s: Map({"container":{"name":"busybox","restart_count":"0"},"namespace":{"name":"default"},"pod":{"name":"daemonset-logs-6f6mn","uid":"1069e46b-03b2-4532-a71f-aaec06c0197b"}})
     -> key2: Map({"key_in":"val2"})
Trace ID: 
Span ID: 
Flags: 0
...
```


**Documentation:** <Describe the documentation added.>  Added

Signed-off-by: ChrsMark <chrismarkou92@gmail.com>
  • Loading branch information
ChrsMark authored May 14, 2024
1 parent c6a6bd4 commit 90935ce
Show file tree
Hide file tree
Showing 11 changed files with 1,306 additions and 14 deletions.
27 changes: 27 additions & 0 deletions .chloggen/add_container_parser.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: filelogreceiver

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Add container operator parser

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [31959]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
1 change: 1 addition & 0 deletions pkg/stanza/adapter/register.go
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@ package adapter // import "github.com/open-telemetry/opentelemetry-collector-con
import (
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/output/file" // Register parsers and transformers for stanza-based log receivers
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/output/stdout"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/container"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/csv"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/json"
_ "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/parser/jsonarray"
Expand Down
238 changes: 238 additions & 0 deletions pkg/stanza/docs/operators/container.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,238 @@
## `container` operator

The `container` operator parses logs in `docker`, `cri-o` and `containerd` formats.

### Configuration Fields

| Field | Default | Description |
|------------------------------|------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| `id` | `container` | A unique identifier for the operator. |
| `format` | `` | The container log format to use if it is known. Users can choose between `docker`, `crio` and `containerd`. If not set, the format will be automatically detected. |
| `add_metadata_from_filepath` | `true` | Set if k8s metadata should be added from the file path. Requires the `log.file.path` field to be present. |
| `output` | Next in pipeline | The connected operator(s) that will receive all outbound entries. |
| `parse_from` | `body` | The [field](../types/field.md) from which the value will be parsed. |
| `parse_to` | `attributes` | The [field](../types/field.md) to which the value will be parsed. |
| `on_error` | `send` | The behavior of the operator if it encounters an error. See [on_error](../types/on_error.md). |
| `if` | | An [expression](../types/expression.md) that, when set, will be evaluated to determine whether this operator should be used for the given entry. This allows you to do easy conditional parsing without branching logic with routers. |
| `severity` | `nil` | An optional [severity](../types/severity.md) block which will parse a severity field before passing the entry to the output operator. |


### Embedded Operations

The `container` parser can be configured to embed certain operations such as the severity parsing. For more information, see [complex parsers](../types/parsers.md#complex-parsers).

### Add metadata from file path

Requires `include_file_path: true` in order for the `log.file.path` field to be available for the operator.
If that's not possible, users can disable the metadata addition with `add_metadata_from_filepath: false`.
A file path like `"/var/log/pods/some-ns_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"`,
will produce the following k8s metadata:

```json
{
"attributes": {
"k8s": {
"container": {
"name": "kube-controller",
"restart_count": "1"
}, "pod": {
"uid": "49cc7c1fd3702c40b2686ea7486091d6",
"name": "kube-controller-kind-control-plane"
}, "namespace": {
"name": "some-ns"
}
}
}
}
```

### Example Configurations:

#### Parse the body as docker container log

Configuration:
```yaml
- type: container
format: docker
add_metadata_from_filepath: true
```
Note: in this example the `format: docker` is optional since formats can be automatically detected as well.
`add_metadata_from_filepath` is true by default as well.

<table>
<tr><td> Input body </td> <td> Output body</td></tr>
<tr>
<td>

```json
{
"timestamp": "",
"body": "{\"log\":\"INFO: log line here\",\"stream\":\"stdout\",\"time\":\"2029-03-30T08:31:20.545192187Z\"}",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
```

</td>
<td>

```json
{
"timestamp": "2024-03-30 08:31:20.545192187 +0000 UTC",
"body": "log line here",
"attributes": {
"time": "2024-03-30T08:31:20.545192187Z",
"log.iostream": "stdout",
"k8s.pod.name": "kube-controller-kind-control-plane",
"k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
"k8s.container.name": "kube-controller",
"k8s.container.restart_count": "1",
"k8s.namespace.name": "some",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
}
```

</td>
</tr>
</table>

#### Parse the body as cri-o container log

Configuration:
```yaml
- type: container
```

<table>
<tr><td> Input body </td> <td> Output body</td></tr>
<tr>
<td>

```json
{
"timestamp": "",
"body": "2024-04-13T07:59:37.505201169-05:00 stdout F standalone crio line which is awesome",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
```

</td>
<td>

```json
{
"timestamp": "2024-04-13 12:59:37.505201169 +0000 UTC",
"body": "standalone crio line which is awesome",
"attributes": {
"time": "2024-04-13T07:59:37.505201169-05:00",
"logtag": "F",
"log.iostream": "stdout",
"k8s.pod.name": "kube-controller-kind-control-plane",
"k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
"k8s.container.name": "kube-controller",
"k8s.container.restart_count": "1",
"k8s.namespace.name": "some",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
}
```

</td>
</tr>
</table>

#### Parse the body as containerd container log

Configuration:
```yaml
- type: container
```

<table>
<tr><td> Input body </td> <td> Output body</td></tr>
<tr>
<td>

```json
{
"timestamp": "",
"body": "2023-06-22T10:27:25.813799277Z stdout F standalone containerd line that is super awesome",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
```

</td>
<td>

```json
{
"timestamp": "2023-06-22 10:27:25.813799277 +0000 UTC",
"body": "standalone containerd line that is super awesome",
"attributes": {
"time": "2023-06-22T10:27:25.813799277Z",
"logtag": "F",
"log.iostream": "stdout",
"k8s.pod.name": "kube-controller-kind-control-plane",
"k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
"k8s.container.name": "kube-controller",
"k8s.container.restart_count": "1",
"k8s.namespace.name": "some",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
}
```

</td>
</tr>
</table>

#### Parse the multiline as containerd container log and recombine into a single one

Configuration:
```yaml
- type: container
```

<table>
<tr><td> Input body </td> <td> Output body</td></tr>
<tr>
<td>

```json
{
"timestamp": "",
"body": "2023-06-22T10:27:25.813799277Z stdout P multiline containerd line that i",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
},
{
"timestamp": "",
"body": "2023-06-22T10:27:25.813799277Z stdout F s super awesomne",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
```

</td>
<td>

```json
{
"timestamp": "2023-06-22 10:27:25.813799277 +0000 UTC",
"body": "multiline containerd line that is super awesome",
"attributes": {
"time": "2023-06-22T10:27:25.813799277Z",
"logtag": "F",
"log.iostream": "stdout",
"k8s.pod.name": "kube-controller-kind-control-plane",
"k8s.pod.uid": "49cc7c1fd3702c40b2686ea7486091d6",
"k8s.container.name": "kube-controller",
"k8s.container.restart_count": "1",
"k8s.namespace.name": "some",
"log.file.path": "/var/log/pods/some_kube-controller-kind-control-plane_49cc7c1fd3702c40b2686ea7486091d6/kube-controller/1.log"
}
}
```

</td>
</tr>
</table>
28 changes: 28 additions & 0 deletions pkg/stanza/operator/helper/regexp.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,28 @@
// Copyright The OpenTelemetry Authors
// SPDX-License-Identifier: Apache-2.0

package helper // import "github.com/open-telemetry/opentelemetry-collector-contrib/pkg/stanza/operator/helper"

import (
"fmt"
"regexp"
)

func MatchValues(value string, regexp *regexp.Regexp) (map[string]any, error) {
matches := regexp.FindStringSubmatch(value)
if matches == nil {
return nil, fmt.Errorf("regex pattern does not match")
}

parsedValues := map[string]any{}
for i, subexp := range regexp.SubexpNames() {
if i == 0 {
// Skip whole match
continue
}
if subexp != "" {
parsedValues[subexp] = matches[i]
}
}
return parsedValues, nil
}
Loading

0 comments on commit 90935ce

Please sign in to comment.