PNS Executor Doesn't Work Occasionally #2671
Comments
I realized my assumption was wrong. The PNS executor waits for init containers to initialize. However, init containers don't exist in certain situations.
I got it. Init containers have nothing to do with the PNS executor.
Is this expected behavior?
Looks like: // WaitInit is called before Wait() to signal the executor about an impending Wait call.
I think the go function doesn't wait for the main container.
I've now encountered this problem even when storing outputs in emptyDirs. The problem for me isn't actually saving the outputs; the executor says it saved them successfully. Here are the wait container logs.
The reason on the workflow does state it couldn't save outputs, but this doesn't seem right given the logs above.
It looks like the issue is at https://github.com/argoproj/argo/blob/master/workflow/executor/pns/pns.go#L333 . For some reason, the Stat intermittently fails with "permission denied". On the next lookup in the map, the entry still isn't found, so the same error that is raised when outputs can't be saved gets thrown. In my case, though, the real issue is the intermittent "permission denied". If it helps, I'm running on microk8s locally on Ubuntu. All permissions are set up as in a normal microk8s install, my user is in the microk8s group, and the pod I'm launching does not have a securityContext set. I'm running version 2.7.1. I can upgrade and see if that helps.
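To make the failure mode described above concrete, here is a minimal Python sketch (the real executor is written in Go). It assumes a hypothetical `ContainerRootIndex` class that stats `/proc/<pid>/root` and records any stat error, so that a later failed lookup can report "permission denied" instead of a generic "not found". The class, method names, and error messages are illustrative assumptions, not Argo's actual code.

```python
import os
from typing import Callable, Dict


class ContainerRootIndex:
    """Hypothetical index of container ID -> PID, filled by stat-ing /proc/<pid>/root."""

    def __init__(self, stat: Callable[[str], object] = os.stat) -> None:
        self._stat = stat  # injectable so the behavior can be exercised without /proc
        self.roots: Dict[str, int] = {}
        self.stat_errors: Dict[str, OSError] = {}

    def update(self, pid: int, container_id: str) -> bool:
        """Try to register a container; remember the error if the stat fails."""
        try:
            self._stat(f"/proc/{pid}/root")
        except OSError as err:
            self.stat_errors[container_id] = err
            return False
        self.roots[container_id] = pid
        self.stat_errors.pop(container_id, None)
        return True

    def lookup(self, container_id: str) -> int:
        """Return the PID, or raise an error that names the underlying cause."""
        if container_id in self.roots:
            return self.roots[container_id]
        err = self.stat_errors.get(container_id)
        if isinstance(err, PermissionError):
            # Surface the real cause rather than a misleading "not found".
            raise PermissionError(f"stat failed for container {container_id}: {err}")
        raise KeyError(f"container {container_id} not found")
```

With this shape, a workflow that fails because of an intermittent permission error would say so, rather than reporting that outputs could not be saved.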
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm still hitting this issue.
I'm getting this with
I'm also having a similar issue
The PNS executor needs to run as root. Can you check that?
It does not if you are on containerd or cri-o and modify the executor image a bit. Docker has some bugs regarding capabilities without root, as far as I know. I am using it rootless with pod security policies. Do you want me to make a pull request with the necessary changes? It depends a bit on
@juliusvonkohout can I get you to add your security policy to #4186 as the first step so we can understand the solution?
@wreed @juliusvonkohout I've noticed that when one runs PNS with
Checklist:
What happened:
I set up v2.7.2 in minikube with the PNS executor, but found that output-parameter.yaml doesn't work because the wait container cannot get the PID of the main container. I'm not sure whether this only occurs in minikube.
I checked the Kubernetes events of the pod and found that pulling the image for the main container took longer than for the wait container. I guess this is why the executor cannot get the PID of the main container.
I think the deadline in the following code should be configurable.
https://github.com/argoproj/argo/blob/79217bc89e892ee82bdd5018b2bba65425924d36/workflow/executor/pns/pns.go#L149
I will make a fix soon.
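As an illustration of what a configurable deadline could look like, here is a minimal Python sketch (the real executor is written in Go). It assumes a hypothetical environment variable `PNS_PID_WAIT_TIMEOUT_SECONDS`, an assumed default of 5 seconds, and a caller-supplied `find_pid` function; none of these names or values come from the Argo codebase.

```python
import os
import time
from typing import Callable, Mapping, Optional

# Assumed default for illustration only; not taken from Argo.
DEFAULT_TIMEOUT_SECONDS = 5.0


def timeout_from_env(
    env: Mapping[str, str] = os.environ,
    name: str = "PNS_PID_WAIT_TIMEOUT_SECONDS",  # hypothetical variable name
) -> float:
    """Read the deadline from the environment, falling back to the default."""
    raw = env.get(name)
    if raw is None:
        return DEFAULT_TIMEOUT_SECONDS
    try:
        value = float(raw)
    except ValueError:
        return DEFAULT_TIMEOUT_SECONDS
    return value if value > 0 else DEFAULT_TIMEOUT_SECONDS


def wait_for_pid(
    find_pid: Callable[[], Optional[int]],
    timeout: float,
    poll_interval: float = 0.1,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[int]:
    """Poll find_pid until it returns a PID or the deadline passes."""
    deadline = clock() + timeout
    while True:
        pid = find_pid()
        if pid is not None:
            return pid
        if clock() >= deadline:
            return None  # main container never showed up within the deadline
        sleep(poll_interval)
```

A slow image pull for the main container would then be handled by raising the timeout, for example `wait_for_pid(find_pid, timeout_from_env())`, instead of by changing code.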
What you expected to happen:
output-parameter.yaml can be executed successfully.
How to reproduce it (as minimally and precisely as possible):
Run output-parameter.yaml with v2.7.2 in minikube.
Anything else we need to know?:
N/A
Environment:
Other debugging information (if applicable):
logs
logs
Kubernetes pod events from kubectl describe.
Message from the maintainers:
If you are impacted by this bug please add a 👍 reaction to this issue! We often sort issues this way to know what to prioritize.