Auditbeat 8.3.0-SNAPSHOT is failing on GKE: "failed to get audit status" #31616

Closed
barkbay opened this issue May 12, 2022 · 8 comments · Fixed by #31710 or #32141

@barkbay
Contributor

barkbay commented May 12, 2022

Hello 👋 ,

The ECK project deploys Auditbeat as part of its E2E test suite. Today we noticed that the test which validates that snapshot builds work as expected is failing for Auditbeat 8.3.0-SNAPSHOT.

The error is:

{
	"log.level": "error",
	"@timestamp": "2022-05-12T02:26:18.269Z",
	"log.origin": {
		"file.name": "instance/beat.go",
		"file.line": 1039
	},
	"message": "Exiting: 1 error: failed to create audit client: failed to get audit status: failed to unmarshal reply: unexpected EOF",
	"service.name": "auditbeat",
	"ecs.version": "1.6.0"
}

Out of curiosity I built my own version of go-libaudit to dump the content of the reply:

diff --git a/audit.go b/audit.go
index 0c528e1..f077726 100644
--- a/audit.go
+++ b/audit.go
@@ -156,7 +156,7 @@ func (c *AuditClient) GetStatus() (*AuditStatus, error) {

        replyStatus := &AuditStatus{}
        if err := replyStatus.FromWireFormat(reply.Data); err != nil {
-               return nil, fmt.Errorf("failed to unmarshal reply: %w", err)
+               return nil, fmt.Errorf("failed to unmarshal reply: %w, reply is %+v", err, reply)
        }

        return replyStatus, nil

Here is the result:

Exiting: 1 error: failed to create audit client: failed to get audit status: failed to unmarshal reply: unexpected EOF, reply is &{Header:{Len:56 Type:1000 Flags:0 Seq:1 Pid:0} Data:[0 0 0 0 1 0 0 0 1 0 0 0 196 0 0 0 0 0 0 0 128 0 0 0 0 0 0 0 0 0 0 0 127 0 0 0 96 234 0 0]}

I also tested different versions of Auditbeat from specific git commits and I concluded that Auditbeat works as expected until this PR: #31519

For confirmed bugs, please report:

  • Version: 8.3.0-SNAPSHOT
  • Operating System: GKE v1.20.15-gke.6000
  • Steps to Reproduce: I still need to figure out if some security constraints (PSP) may affect the test, but my feeling is that deploying the latest snapshot of Auditbeat with ECK should be enough to trigger the problem.
Please find the configuration here:
auditbeat:
  modules:
  - exclude_files:
    - (?i)\.sw[nop]$
    - ~$
    - /\.git($|/)
    hash_types:
    - sha1
    max_file_size: 100 MiB
    module: file_integrity
    paths:
    - /hostfs/bin
    - /hostfs/usr/bin
    - /hostfs/sbin
    - /hostfs/usr/sbin
    - /hostfs/etc
    recursive: true
    scan_at_start: true
    scan_rate_per_sec: 50 MiB
  - audit_rules: |
      # Executions
      -a always,exit -F arch=b64 -S execve,execveat -k exec

      # Unauthorized access attempts (adjusted to be compatible with amd64 and arm64)
      -a always,exit -F arch=b64 -S truncate,ftruncate,openat,open_by_handle_at -F exit=-EACCES -k access
      -a always,exit -F arch=b64 -S truncate,ftruncate,openat,open_by_handle_at -F exit=-EPERM -k access
    module: auditd
output:
  elasticsearch:
    hosts:
    - https://test-ab-cfg-gq45-es-http.e2e-mercury.svc:9200
    password: REDACTED
    ssl:
      certificate_authorities:
      - /mnt/elastic-internal/elasticsearch-certs/ca.crt
    username: e2e-mercury-test-ab-cfg-btgj-beat-user
processors:
- add_cloud_metadata: null
- add_process_metadata:
    match_pids:
    - process.pid
- add_kubernetes_metadata:
    default_indexers:
      enabled: false
    default_matchers:
      enabled: false
    indexers:
    - container: null
    matchers:
    - fields:
        lookup_fields:
        - container.id
    node: ${NODE_NAME}
setup:
  dashboards:
    enabled: true
  kibana:
    host: https://test-ab-cfg-4cv4-kb-http.e2e-mercury.svc:5601
    password: REDACTED
    ssl:
      certificate_authorities:
      - /mnt/elastic-internal/kibana-certs/ca.crt
    username: e2e-mercury-test-ab-cfg-btgj-beat-kb-user
And here is the DaemonSet (built by the ECK operator) used to deploy Auditbeat:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    deprecated.daemonset.template.generation: "1"
  creationTimestamp: "2022-05-12T15:44:12Z"
  generation: 1
  labels:
    beat.k8s.elastic.co/name: test-ab-cfg-btgj
    common.k8s.elastic.co/template-hash: "2827389728"
    common.k8s.elastic.co/type: beat
  name: test-ab-cfg-btgj-beat-auditbeat
  namespace: e2e-mercury
  ownerReferences:
  - apiVersion: beat.k8s.elastic.co/v1beta1
    blockOwnerDeletion: true
    controller: true
    kind: Beat
    name: test-ab-cfg-btgj
    uid: 2f4701e9-3031-4805-a7ea-8006b2213601
  resourceVersion: "104412"
  uid: 6b6acbf2-a8f2-4f65-8157-dc3660d3d097
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      beat.k8s.elastic.co/name: test-ab-cfg-btgj
      common.k8s.elastic.co/type: beat
  template:
    metadata:
      annotations:
        beat.k8s.elastic.co/config-hash: "2575411752"
      creationTimestamp: null
      labels:
        beat.k8s.elastic.co/name: test-ab-cfg-btgj
        beat.k8s.elastic.co/version: 8.3.0-SNAPSHOT
        common.k8s.elastic.co/type: beat
    spec:
      automountServiceAccountToken: true
      containers:
      - args:
        - -e
        - -c
        - /etc/beat.yml
        env:
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: spec.nodeName
        image: docker.elastic.co/beats/auditbeat:8.3.0-SNAPSHOT
        imagePullPolicy: IfNotPresent
        name: auditbeat
        resources:
          limits:
            cpu: 100m
            memory: 200Mi
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          capabilities:
            add:
            - AUDIT_READ
            - AUDIT_WRITE
            - AUDIT_CONTROL
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /usr/share/auditbeat/data
          name: beat-data
        - mountPath: /hostfs/bin
          name: bin
          readOnly: true
        - mountPath: /etc/beat.yml
          name: config
          readOnly: true
          subPath: beat.yml
        - mountPath: /mnt/elastic-internal/elasticsearch-certs
          name: elasticsearch-certs
          readOnly: true
        - mountPath: /hostfs/etc
          name: etc
          readOnly: true
        - mountPath: /mnt/elastic-internal/kibana-certs
          name: kibana-certs
          readOnly: true
        - mountPath: /run/containerd
          name: run-containerd
          readOnly: true
        - mountPath: /hostfs/sbin
          name: sbin
          readOnly: true
        - mountPath: /hostfs/usr/bin
          name: usrbin
          readOnly: true
        - mountPath: /hostfs/usr/sbin
          name: usrsbin
          readOnly: true
      dnsPolicy: ClusterFirstWithHostNet
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        runAsUser: 0
      serviceAccount: test-ab-cfg-btgj-sa
      serviceAccountName: test-ab-cfg-btgj-sa
      terminationGracePeriodSeconds: 30
      volumes:
      - hostPath:
          path: /var/lib/e2e-mercury/test-ab-cfg-btgj/auditbeat-data
          type: DirectoryOrCreate
        name: beat-data
      - hostPath:
          path: /bin
          type: ""
        name: bin
      - name: config
        secret:
          defaultMode: 292
          optional: false
          secretName: test-ab-cfg-btgj-beat-auditbeat-config
      - name: elasticsearch-certs
        secret:
          defaultMode: 420
          optional: false
          secretName: test-ab-cfg-btgj-beat-es-ca
      - hostPath:
          path: /etc
          type: ""
        name: etc
      - name: kibana-certs
        secret:
          defaultMode: 420
          optional: false
          secretName: test-ab-cfg-btgj-beat-kibana-ca
      - hostPath:
          path: /run/containerd
          type: DirectoryOrCreate
        name: run-containerd
      - hostPath:
          path: /sbin
          type: ""
        name: sbin
      - hostPath:
          path: /usr/bin
          type: ""
        name: usrbin
      - hostPath:
          path: /usr/sbin
          type: ""
        name: usrsbin
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
status:
  currentNumberScheduled: 3
  desiredNumberScheduled: 3
  numberAvailable: 3
  numberMisscheduled: 0
  numberReady: 3
  observedGeneration: 1
  updatedNumberScheduled: 3

Please, let me know if you need additional details.

Thanks

@elasticmachine
Collaborator

Pinging @elastic/security-external-integrations (Team:Security-External Integrations)

@botelastic bot removed the needs_team label May 12, 2022
@efd6 self-assigned this May 16, 2022
@efd6
Contributor

efd6 commented May 16, 2022

@barkbay The issue is that the buffer has data for 10 uint32 fields, but libaudit.AuditStatus now has 11 fields (since elastic/go-libaudit@8189891). Can you let me know what kernel version is being run on the host?
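
To make the size mismatch concrete, here is a minimal sketch that decodes the dumped payload word by word. It assumes the fields are little-endian uint32 values (netlink uses host byte order, and the host is x86_64) laid out in the order of the kernel's struct audit_status. The names decodeWords, replyData and fieldNames are illustrative and not part of go-libaudit.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// replyData is the 40-byte payload copied from the dumped reply above.
var replyData = []byte{
	0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 196, 0, 0, 0, 0, 0, 0, 0,
	128, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 127, 0, 0, 0, 96, 234, 0, 0,
}

// fieldNames follows the field order of struct audit_status in the Linux
// UAPI header; the eleventh name is the field this reply does not carry.
var fieldNames = []string{
	"mask", "enabled", "failure", "pid", "rate_limit", "backlog_limit",
	"lost", "backlog", "feature_bitmap", "backlog_wait_time",
	"backlog_wait_time_actual",
}

// decodeWords splits buf into little-endian uint32 values, ignoring any
// trailing bytes that do not fill a whole word.
func decodeWords(buf []byte) []uint32 {
	words := make([]uint32, 0, len(buf)/4)
	for i := 0; i+4 <= len(buf); i += 4 {
		words = append(words, binary.LittleEndian.Uint32(buf[i:i+4]))
	}
	return words
}

func main() {
	words := decodeWords(replyData)
	for i, name := range fieldNames {
		if i >= len(words) {
			fmt.Printf("%-25s <absent from reply>\n", name)
			continue
		}
		fmt.Printf("%-25s %d\n", name, words[i])
	}
	fmt.Printf("payload: %d bytes, %d fields need %d bytes\n",
		len(replyData), len(fieldNames), 4*len(fieldNames))
}
```

The payload holds ten complete words (40 bytes, matching the header's Len of 56 minus the 16-byte netlink header), while eleven uint32 fields would need 44 bytes.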

@barkbay
Contributor Author

barkbay commented May 16, 2022

Can you let me know what kernel version is being run on the host?

Sure:

michael@gke-michael-dev1-default-pool-1658bc14-lbg9 ~ $ uname -a
Linux gke-michael-dev1-default-pool-1658bc14-lbg9 5.4.170+ #1 SMP Sat Apr 2 10:06:05 PDT 2022 x86_64 Intel(R) Xeon(R) CPU @ 2.30GHz GenuineIntel GNU/Linux

@efd6
Contributor

efd6 commented May 16, 2022

Thanks. It looks like that kernel predates the addition of backlog_wait_time_actual.

If we can't rely on a kernel version that supports audit_status->backlog_wait_time_actual then, since we know that we do work on the previous version, we could have a weaker test that would only error out when anything other than the backlog_wait_time_actual field is missing (back to a 40-byte minimum). We would then just return that field set to some reasonable sentinel value; zero seems fine.

Does that sound reasonable to you?

@efd6
Contributor

efd6 commented May 16, 2022

Actually, based on the docs for the unmarshal function, we can just remove the length check.

@liza-mae

Also seeing this failure on the 8.3.0-SNAPSHOT tar package.

{"log.level":"error","@timestamp":"2022-05-23T19:33:15.214Z","log.origin":{"file.name":"instance/beat.go","file.line":1041},"message":"Exiting: 1 error: failed to create audit client: failed to get audit status: failed to unmarshal reply: unexpected EOF","service.name":"auditbeat","ecs.version":"1.6.0"}

@efd6
Contributor

efd6 commented May 23, 2022

@liza-mae Yes, that's expected. The fix is merged, so the next snapshot and upcoming release should be fixed.

@christophercutajar

We're experiencing the same issue after upgrading from 8.2.3 to 8.3.0 on one of our Linux boxes.

uname -a: Linux <hostname> 3.10.0-1160.66.1.el7.x86_64 #1 SMP Wed May 18 16:02:34 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

{"log.level":"error","@timestamp":"2022-06-28T21:55:44.353Z","log.origin":{"file.name":"instance/beat.go","file.line":1051},"message":"Exiting: 1 error: failed to create audit client: failed to get audit status: failed to unmarshal reply: unexpected EOF","service.name":"auditbeat","ecs.version":"1.6.0"}
