What happened:
After a pod is scheduled to a node, if the node's allocatable resources decrease for some reason (for example, an exception reported by the GPU device), the node is set to the outOfSync state and is not added to the session. No new pod can be scheduled to that node until the allocatable resources it reports return to normal, even if the node still has other idle resources. If the pod that cannot be scheduled is the one responsible for reporting GPU resources, there is a deadlock: scheduling that pod requires the node to leave the outOfSync state, but leaving outOfSync requires that pod to be scheduled so it can report the GPU resources correctly.
What you expected to happen:
The pod used to report GPU resources should be scheduled even though the node is in the outOfSync state.
How to reproduce it (as minimally and precisely as possible):
Run a device-plugin daemonset to report GPU resources
Run a pod using GPU resources
Uninstall the device-plugin daemonset and wait until the node's allocatable GPU resources become zero
Re-deploy the device-plugin daemonset; its pod on the affected node cannot be scheduled
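The steps above can be sketched as the following kubectl session. This is a minimal sketch, not a verified script: the manifest file names (`nvidia-device-plugin.yml`, `gpu-pod.yml`), the node name placeholder, and the use of the NVIDIA device plugin as the GPU device plugin are all assumptions for illustration.

```shell
# Assumed manifests; any GPU device plugin that reports an extended
# resource (e.g. nvidia.com/gpu) should reproduce the same behavior.

# 1. Deploy the device-plugin daemonset so the node reports GPU resources.
kubectl apply -f nvidia-device-plugin.yml

# 2. Run a pod that requests the GPU resource.
kubectl apply -f gpu-pod.yml

# 3. Uninstall the device plugin and wait until the node's allocatable
#    GPU count drops to zero (<node-name> is a placeholder).
kubectl delete -f nvidia-device-plugin.yml
kubectl get node <node-name> \
  -o jsonpath='{.status.allocatable.nvidia\.com/gpu}'

# 4. Re-deploy the device plugin. The daemonset pod on the affected node
#    stays Pending, because the node is outOfSync and excluded from the
#    session, while ending outOfSync requires this pod to run.
kubectl apply -f nvidia-device-plugin.yml
kubectl get pods -n kube-system -o wide | grep device-plugin
```

Since the device-plugin pod is the only thing that can restore the node's allocatable GPU count, the node never leaves outOfSync on its own.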
Anything else we need to know?:
Environment:
Volcano Version: latest
Kubernetes version (use kubectl version):
Cloud provider or hardware configuration:
OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Install tools:
Others: