Filter by namespace intermittently includes all namespaces #388

Open
abean-work opened this issue Nov 29, 2024 · 2 comments
Labels
bug (Something isn't working), no-repro

Comments

@abean-work
Description:
When attempting to run Popeye against a namespace, it will intermittently (around half the time) fail due to problems in other namespaces.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a pod that will cause Popeye to fail: kubectl run fail-pod --image=nonexistent/nonexistentimage:latest -n test
  2. Scan a different namespace that is healthy: popeye -n healthy -l error -f ./spinach.yml
  3. Repeat the scan until it fails.
  • Most scans will return healthy with no issues, e.g.:
    PODS (5 SCANNED)                                                             💥 0 😱 0 🔊 0 ✅ 5 100٪
    
    ┅┅┅┅┅┅┅
    · Nothing to report.
    
  • Occasionally, it will fail due to the pod in the other namespace (notice a lot more pods are included in the scan):
    PODS (30 SCANNED)                                                            💥 1 😱 0 🔊 0 ✅ 29 96٪
    ┅┅┅┅┅┅┅
    · test/fail-pod...............................................................................💥
      💥 [POP-207] Pod is in an unhappy phase (Pending).
      🐳 fail-pod
        💥 [POP-203] Pod is waiting [0/1] ImagePullBackOff.
    

Using the following (crude) command, I was able to reproduce the error easily:

> repeat 20 { popeye -n healthy -l error -f ./spinach.yml > /dev/null 2>&1; echo $?}
1
0
0
1
0
0
1
0
0
1
1
0
0
0
0
1
1
1
0
1

The exit codes show that, in this instance, 9 out of 20 scans failed due to including resources from other namespaces. When repeating this command, the number of failures has always been between 8 and 12, so roughly half the time it fails.
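The `repeat` loop above is zsh syntax. For anyone reproducing this from another shell, a minimal POSIX sh sketch of the same idea follows; `count_failures` is a hypothetical helper name, not part of Popeye, and it only counts non-zero exit codes without inspecting why a run failed.

```shell
# count_failures N CMD [ARGS...]
# Runs CMD N times, discards its output, and prints how many runs
# exited non-zero. Hypothetical helper for this report only.
count_failures() {
  runs=$1; shift
  failures=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    if ! "$@" >/dev/null 2>&1; then
      failures=$((failures + 1))
    fi
    i=$((i + 1))
  done
  echo "$failures/$runs runs failed"
}

# Usage with the scan from this report:
# count_failures 20 popeye -n healthy -l error -f ./spinach.yml
```

Printing a single summary line makes it easier to compare failure rates between Popeye versions than reading a column of exit codes.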

Expected behavior

  1. The namespace flag should restrict the popeye scan to that namespace.
  2. Scans should be consistent in the resources they include.
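The first expectation can be checked mechanically rather than by eye. The sketch below assumes the plain-text report layout shown in the sample output earlier (entries of the form `· namespace/name.....`, with no ANSI colour codes in the captured text); `foreign_fqns` is a hypothetical helper name, and the parsing is a guess at the layout rather than a documented format.

```shell
# foreign_fqns NAMESPACE
# Reads a Popeye report on stdin and prints every "namespace/name" entry
# whose namespace differs from NAMESPACE. Returns 1 if any were found.
# Assumes report lines look like: "· test/fail-pod..........<icon>"
foreign_fqns() {
  awk -v ns="$1" '
    $1 == "·" && index($2, "/") > 0 {
      fqn = $2
      sub(/\.\.+.*$/, "", fqn)   # drop the dotted leader and status icon
      split(fqn, parts, "/")
      if (parts[1] != ns) { print fqn; found = 1 }
    }
    END { exit found + 0 }
  '
}

# Usage with the scan from this report:
# popeye -n healthy -l error -f ./spinach.yml | foreign_fqns healthy
```

A non-empty result means the scan reported on resources outside the requested namespace, which distinguishes this bug from a genuine failure inside `healthy`.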

Versions (please complete the following information):

  • OS: OSX 14.7 and Ubuntu 22.04
  • Popeye: 0.21.5
  • K8s: 1.29.8

Additional context

Our team owns/manages a number of namespaces on shared Kubernetes (AKS) clusters, which we are scanning individually using the -n flag and then aggregating the JUnit output.

These namespaces are looped through, so the scans happen immediately after one another. I've tried adding sleeps between scans, but this didn't help.
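For context, the per-namespace loop described above looks roughly like the sketch below. This is an illustration, not our exact pipeline: `scan_namespaces` and the `POPEYE_BIN` override are hypothetical names, the output layout (one `<namespace>.xml` per scan) is an assumption, and the aggregation step is omitted.

```shell
# scan_namespaces OUT_DIR NS [NS...]
# Runs one Popeye scan per namespace, back to back, and writes each
# JUnit report to OUT_DIR/<namespace>.xml. POPEYE_BIN is a hypothetical
# override so the scanner can be swapped out; it defaults to "popeye".
scan_namespaces() {
  out_dir=$1; shift
  mkdir -p "$out_dir"
  failed=0
  for ns in "$@"; do
    if ! "${POPEYE_BIN:-popeye}" -n "$ns" -o junit > "$out_dir/$ns.xml"; then
      echo "scan of $ns exited non-zero" >&2
      failed=1
    fi
  done
  return "$failed"
}

# Usage (namespace names are placeholders):
# scan_namespaces ./reports team-a team-b
```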

This could be related to #314, but I've created a new issue as it does work some of the time.

Spinach config:
---
# Popeye configuration using the AKS sample as a base.
# See: https://github.com/derailed/popeye/blob/master/spinach/spinach_aks.yml
popeye:
  allocations:
    cpu:
      # Checks if cpu is under allocated by more than x% at current load.
      underPercUtilization: 200
      # Checks if cpu is over allocated by more than x% at current load.
      overPercUtilization: 50
    memory:
      # Checks if mem is under allocated by more than x% at current load.
      underPercUtilization: 200
      # Checks if mem is over allocated by more than x% at current load.
      overPercUtilization: 50

  # Excludes define rules to exempt resources from sanitization
  excludes:
    global:
      fqns:
        # Exclude kube-system namespace
        - rx:^kube-system/

    linters:
      # Exclude system CRBs
      clusterrolebindings:
        instances:
          - fqns:
              - rx:^aks
              - rx:^omsagent
              - rx:^system

      # Exclude system CRs
      clusterroles:
        instances:
          - fqns:
              - rx:^system
              - admin
              - cluster-admin
              - edit
              - omsagent-reader
              - view
            codes: [400]

      # Exclude unused windows daemonset
      daemonsets:
        instances:
          - fqns: [calico-system/calico-windows-upgrade]
            codes: [508]

      # Exclude due to intermittent false positives
      serviceaccounts:
        codes: ["305"]

  resources:
    # Nodes specific sanitization
    node:
      limits:
        cpu: 90
        memory: 80

    # Pods specific sanitization
    pod:
      limits:
        # Fail if cpu is over x%
        # Set intentionally high to ignore (if you comment it out, it'll default to 80)
        cpu: 250
        # Fail if pod mem is over x%
        # Set intentionally high to ignore (if you comment it out, it'll default to 90)
        memory: 900
      # Fail if more than x restarts on any pods
      restarts: 3
@derailed
Owner

@abean-work Thank you for this great report! I can't seem to repro this using the latest on any of my clusters ;(
The script above consistently reports 0 exit code as expected. I did tune a few things tho.
Let's see if we are happier on the next push; if not, we will need to investigate further...

@derailed added the bug and no-repro labels on Dec 29, 2024
@abean-work
Author

abean-work commented Jan 3, 2025

@derailed thanks for your response. I've tested the repro steps with the latest version of Popeye, but I'm encountering a different issue now.

When I use the -n flag, the scan output only includes non-namespaced resources, regardless of the namespace specified (even if the namespace doesn’t exist). Additionally, if I specify a section (e.g., pods or configmaps) using the --sections flag, I get the following error:

> popeye -n test --sections po
 ___     ___ _____   _____                       D          .-'-.
| _ \___| _ \ __\ \ / / __|                       O     __| K    `\
|  _/ _ \  _/ _| \ V /| _|                         H   `-,-`--._   `\
|_| \___/_| |___| |_| |___|                       []  .->'  X     `|-'
  Biffs`em and Buffs`em!                            `=/ (__/_       /
                                                      \_,    `    _)
                                                         `----;  |


Boom! 💥 no linters matched query. check section selector (see logs)
Debug logs:
4:30PM ERR 💥 no linters matched query. check section selector
4:30PM ERR goroutine 1 [running]:
runtime/debug.Stack()
	runtime/debug/stack.go:26 +0x64
github.com/derailed/popeye/pkg.BailOut({0x105928a68, 0x14000c2e530})
	github.com/derailed/popeye/pkg/helpers.go:24 +0x10c
github.com/derailed/popeye/cmd.doIt.func1()
	github.com/derailed/popeye/cmd/root.go:59 +0x94
panic({0x105475ce0?, 0x14000c2e530?})
	runtime/panic.go:785 +0x124
github.com/derailed/popeye/cmd.bomb({0x105928a68?, 0x14000c2e480?})
	github.com/derailed/popeye/cmd/root.go:88 +0xd0
github.com/derailed/popeye/cmd.doIt(0x140000f8700?, {0x104c48b56?, 0x4?, 0x104c48b5a?})
	github.com/derailed/popeye/cmd/root.go:74 +0xdc
github.com/spf13/cobra.(*Command).execute(0x1071126c0, {0x140001c6010, 0x8, 0x8})
	github.com/spf13/cobra@v1.8.1/command.go:989 +0x81c
github.com/spf13/cobra.(*Command).ExecuteC(0x1071126c0)
	github.com/spf13/cobra@v1.8.1/command.go:1117 +0x344
github.com/spf13/cobra.(*Command).Execute(...)
	github.com/spf13/cobra@v1.8.1/command.go:1041
github.com/derailed/popeye/cmd.Execute(...)
	github.com/derailed/popeye/cmd/root.go:48
main.main()
	github.com/derailed/popeye/main.go:12 +0x28

However, if I use the -A flag, the scan includes all resources across all namespaces as expected.

This seems to be unrelated to the current issue. Would you prefer I open a new issue for this behaviour?
