argo workflow not detecting seldon deployment is available #668
Comments
Hi, thanks for getting back so quickly! The issue is flaky, but I managed to capture the output. Below is the log output from when the update ends up in the `"kubectl.kubernetes.io/last-applied-configuration"` field (snipped):
But are you sure the
Yeah. How does the cluster-manager update the state, i.e., how are these steps executed:
I restarted the cluster-manager and now the status field is added at the outer level of the deployment definition again, but strangely not in the
Actually, sometimes we redeploy the cluster manager as well as part of the tests (i.e., helm delete then helm install), but some of the prediction services deployed by a previous instance of the cluster manager might still be running. Could the new cluster manager be caching the older deployment definitions somehow? (Just thinking out loud here.)
That is an interesting case. It should be viable to restart the cluster-manager with no ill effects.
You could try doing a `helm upgrade` instead of deleting and installing again. Also, you could add a `kubectl rollout status -n seldon-system statefulset/seldon-operator-controller-manager` to make sure the cluster manager is up.
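A minimal sketch of how those two suggestions could look as a CI step; the release and chart names are placeholders, not taken from this issue:

```bash
# Upgrade the existing Seldon release in place instead of delete + install
# ("<release>" and "<chart>" are placeholders for your own release/chart names)
helm upgrade <release> <chart> --namespace seldon-system

# Block until the cluster-manager statefulset has finished rolling out,
# so later workflow steps don't race against a not-yet-ready operator
kubectl rollout status -n seldon-system statefulset/seldon-operator-controller-manager
```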
Oh, I see your point. I think if the CRD is removed then the deployments of that CRD will be too. You could add some `kubectl get sdep` statements to check whether the CRD is still registered or removed and whether there are seldondeployments hanging around.
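For instance, a couple of quick checks along those lines (a sketch only; the exact CRD name can vary by Seldon version):

```bash
# Is the SeldonDeployment CRD still registered with the API server?
kubectl get crd | grep -i seldondeployment

# Are there SeldonDeployments left over from a previous cluster-manager install?
kubectl get sdep --all-namespaces
```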
Ok, I couldn't recreate this on my local cluster. You're right that all deployments get torn down when removing the CRD. We must be hitting some edge case on our CI/CD cluster. We'll start using helm upgrade and see if it persists. We'll also start the process of moving to Seldon3. I'll report back if it shows up again.
Please reopen if still an issue.
Hi, we're having an issue with our argo workflow sometimes not detecting the `status.state == Available` condition in the seldon deployment resource definition. We're using version v0.2.7.
Below is an abbreviated argo workflow definition (a rough sketch of such a step follows); note the `successCondition: status.state == Available` condition.
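Since the actual workflow YAML got snipped here, this is only a hypothetical sketch of an Argo `resource` step of that shape (the names and the embedded manifest are illustrative, not our real definition):

```yaml
# Hypothetical Argo Workflows resource template waiting on a SeldonDeployment
- name: deploy-model
  resource:
    action: apply
    # Argo polls the created resource until this condition evaluates to true
    successCondition: status.state == Available
    manifest: |
      apiVersion: machinelearning.seldon.io/v1alpha2
      kind: SeldonDeployment
      metadata:
        name: my-model          # illustrative name
      spec: {}                  # actual predictor spec snipped
```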
So when the workflow successfully detects that the deployment is available, the workflow logs contain the following json (snipped):
Note the last line: at the outer level of the resource description json there is a `"status"` key which contains the `state` with value `available`.

Now let's look at the argo log output when the workflow does not detect that the deployment is available (snipped):
Note that the `{"status": {"state": "Available"}}` condition is actually in the description, however it is no longer at the outer level but part of the value of the `"kubectl.kubernetes.io/last-applied-configuration"` key, which is actually a string (not json). So the question is why the condition appears in different places some of the time.
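To make the difference concrete, here is an illustrative (not verbatim) contrast of the two shapes, written as YAML for readability:

```yaml
# Case 1: detected as available - "status" is a top-level field that
# Argo's successCondition can evaluate
kind: SeldonDeployment
metadata:
  name: my-model               # illustrative name
status:
  state: Available
---
# Case 2: not detected - the only "status" sits inside an annotation,
# whose value is one opaque JSON string, so successCondition never matches
kind: SeldonDeployment
metadata:
  name: my-model
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: '{"status": {"state": "Available"}, ...}'
```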