Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Charts' device value files override component security contexts #815

Open
2 of 17 tasks
eero-t opened this issue Feb 17, 2025 · 8 comments
Open
2 of 17 tasks

Charts' device value files override component security contexts #815

eero-t opened this issue Feb 17, 2025 · 8 comments
Assignees
Labels
bug Something isn't working

Comments

@eero-t
Copy link
Contributor

eero-t commented Feb 17, 2025

Several application values files change component securityContext, for example setting (container) root FS to be writable:

$ git grep -e '^[^#].*securityContext' -e ^[^#].*readOnlyRoot.*false | grep -v -e common/ -e templates
chatqna/gaudi-tgi-values.yaml:  securityContext:
chatqna/gaudi-tgi-values.yaml:    readOnlyRootFilesystem: false
chatqna/gaudi-vllm-values.yaml:  securityContext:
chatqna/gaudi-vllm-values.yaml:    readOnlyRootFilesystem: false
chatqna/guardrails-gaudi-values.yaml:  securityContext:
chatqna/guardrails-gaudi-values.yaml:    readOnlyRootFilesystem: false
searchqna/gaudi-values.yaml:  securityContext:
searchqna/gaudi-values.yaml:    readOnlyRootFilesystem: false

Those overrides are for tei & teirerank components, which seem to have same override for Gaudi, but I think that's wrong. Either component needs to write root FS or not, it should not depend on which device is used:

$ git grep ^[^#].*readOnlyRoot.*false | grep common/
common/agent/values.yaml:  readOnlyRootFilesystem: false
common/chathistory-usvc/values.yaml:  readOnlyRootFilesystem: false
common/data-prep/values.yaml:  readOnlyRootFilesystem: false
common/llm-uservice/values.yaml:  readOnlyRootFilesystem: false
common/lvm-serve/values.yaml:  readOnlyRootFilesystem: false
common/lvm-uservice/values.yaml:  readOnlyRootFilesystem: false
common/mm-embedding/values.yaml:  readOnlyRootFilesystem: false
common/prompt-usvc/values.yaml:  readOnlyRootFilesystem: false
common/retriever-usvc/values.yaml:  readOnlyRootFilesystem: false
common/speecht5/values.yaml:  readOnlyRootFilesystem: false
common/tei/gaudi-values.yaml:  readOnlyRootFilesystem: false        <====
common/teirerank/gaudi-values.yaml:  readOnlyRootFilesystem: false  <====
common/whisper/values.yaml:  readOnlyRootFilesystem: false

I'm pretty sure those overrides were made redundant with #613 and #642.


latest testing status for true readOnlyRootFileSystem settings of common

@poussa
Copy link
Collaborator

poussa commented Feb 18, 2025

Also, the vllm dropped the entire security context via #564.

Now the init container carefully manages the security context but the main container runs as root and with all capabilities.

@poussa poussa added the bug Something isn't working label Feb 18, 2025
@eero-t
Copy link
Contributor Author

eero-t commented Feb 18, 2025

UI and NGINX services are also missing user setting, i.e. running as root:

$ for i in $(ls common/*/values.yaml); do \
    if ! grep -q ^[^#]*sUser $i; then \
      echo "User missing: $i"; \
    fi;
  done
User missing: common/commonlib/values.yaml
User missing: common/nginx/values.yaml
User missing: common/ui/values.yaml
User missing: common/vllm/values.yaml

And this shows those same deployments also running with all the (default) capabilities):

$ for i in $(ls common/*/values.yaml); do \
    if ! grep -q ^[^#]*drop: $i; then \
      echo "Running with all capabilities: $i"; \
    fi; \
  done`
<same results>

Larger number of containers is running with (default) root group:

$ for i in $(ls common/*/*values*.yaml); do \
    if grep -q ^[^#]*sGroup $i; then \
      grep -H ^[^#]*sGroup $i; \
    elif echo $i | grep -qF /values.yaml; then \
      echo "User group missing: $i"; \
    fi; \
  done | grep missing
....

EDIT: groups are not in k8s security, so last one can be ignored.

@eero-t
Copy link
Contributor Author

eero-t commented Feb 18, 2025

In addition to disabling read-only FS setting, there are also some Helm charts that do not set it at all:

$ for i in $(ls common/*/*values*.yaml); do \
    if grep -q ^[^#]*readOnlyRootFilesystem $i; then \
      grep -H ^[^#]*readOnlyRootFilesystem $i; \
    elif echo $i | grep -qF /values.yaml; then \
      echo "No read-only FS setting: $i"; \
    fi; \
  done | grep -e ^No -e false | sort
common/agent/values.yaml:  readOnlyRootFilesystem: false
common/chathistory-usvc/values.yaml:  readOnlyRootFilesystem: false
common/data-prep/values.yaml:  readOnlyRootFilesystem: false
common/llm-uservice/values.yaml:  readOnlyRootFilesystem: false
common/lvm-serve/values.yaml:  readOnlyRootFilesystem: false
common/lvm-uservice/values.yaml:  readOnlyRootFilesystem: false
common/mm-embedding/values.yaml:  readOnlyRootFilesystem: false
common/prompt-usvc/values.yaml:  readOnlyRootFilesystem: false
common/retriever-usvc/values.yaml:  readOnlyRootFilesystem: false
common/speecht5/values.yaml:  readOnlyRootFilesystem: false
common/tei/gaudi-values.yaml:  readOnlyRootFilesystem: false
common/teirerank/gaudi-values.yaml:  readOnlyRootFilesystem: false
common/whisper/values.yaml:  readOnlyRootFilesystem: false
No read-only FS setting: common/commonlib/values.yaml
No read-only FS setting: common/gpt-sovits/values.yaml
No read-only FS setting: common/nginx/values.yaml
No read-only FS setting: common/text2image/values.yaml
No read-only FS setting: common/ui/values.yaml
No read-only FS setting: common/vllm/values.yaml

@lianhao
Copy link
Collaborator

lianhao commented Feb 18, 2025

Larger number of containers is running with (default) root group:

I don't think the k8s pod security standard ever mention any requirements for group settings, so we just run the container with the default group settings defined by the container image itself.

@lianhao
Copy link
Collaborator

lianhao commented Feb 18, 2025

In addition to disabling read-only FS setting, there are also some Helm charts that do not set it at all:

We need to double check if the new image can run with read-only FS now. It's set as false because when the helm chart was created, the container doesn't support running under read-only root FS by that time.

@eero-t
Copy link
Contributor Author

eero-t commented Feb 18, 2025

I was not suggesting changing it blindly. This is more of a reminder to go through these items.

And yes, groups are not mentioned k8s security standards, so that part can be ignored: https://kubernetes.io/docs/concepts/security/pod-security-standards/

@lianhao
Copy link
Collaborator

lianhao commented Feb 18, 2025

Also, the vllm dropped the entire security context via #564.

Now the init container carefully manages the security context but the main container runs as root and with all capabilities.

The vllm security settings are disabled by this commit, the reason behind this was at that time, opea/vllm-hpu image doesn't support run as non-root, while opea/vllm supported. Also, since there is no way for the helm to unset/delete values for the child chart, so we're stuck with this no security settings situation.

@eero-t
Copy link
Contributor Author

eero-t commented Feb 18, 2025

The vllm security settings are disabled by this commit, the reason behind this was at that time, opea/vllm-hpu image doesn't support run as non-root, while opea/vllm supported. Also, since there is no way for the helm to unset/delete values for the child chart, so we're stuck with this no security settings situation.

Then it should be changed to run as root (with comment indicating why), instead of dropping the whole security context. User 0 with the extra constraints is still fairly OK, without constraints, much less so.

@lianhao lianhao self-assigned this Feb 24, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants