
Co-scheduled pods never get scheduled #416

Closed
asm582 opened this issue Aug 22, 2022 · 16 comments

asm582 commented Aug 22, 2022

Hello,

I have a cluster with 10 nodes and I am co-scheduling 8 pods, but they always stay Pending. Below are the events seen on one of the pods:

Events:
  Type     Reason            Age   From                         Message
  ----     ------            ----  ----                         -------
  Warning  FailedScheduling  21m   scheduler-plugins-scheduler  0/10 nodes are available: 10 pre-filter pod ray-head-urm-exp--1-klbn8 cannot find enough sibling pods, current pods number: 1, minMember of group: 8.
  Warning  FailedScheduling  21m   scheduler-plugins-scheduler  0/10 nodes are available: 10 pod with pgName: default/ray-pg last failed in 3s, deny.
  Warning  FailedScheduling  20m   scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  19m   scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  15m   scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  13m   scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  19s   scheduler-plugins-scheduler  optimistic rejection in PostFilter

Co-scheduler version: k8s.gcr.io/scheduler-plugins/kube-scheduler:v0.22.6
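
For context, the setup described above corresponds roughly to the manifests below. This is a minimal sketch only: the reporter did not post manifests, and the PodGroup API group/version and the pod label key vary between scheduler-plugins releases, so treat those identifiers as assumptions and check the coscheduling docs for the installed version.

# Sketch of a PodGroup requiring 8 co-scheduled pods, matching "minMember of group: 8"
# and "pgName: default/ray-pg" in the events above.
apiVersion: scheduling.sigs.k8s.io/v1alpha1    # assumed; the API group differs across releases
kind: PodGroup
metadata:
  name: ray-pg
  namespace: default
spec:
  minMember: 8                 # no pod in the group is bound until 8 siblings can be scheduled
  scheduleTimeoutSeconds: 60
---
# Sketch of one of the 8 member pods; all of them carry the same pod-group label.
apiVersion: v1
kind: Pod
metadata:
  name: ray-head-urm-exp       # hypothetical name based on the pod in the events
  namespace: default
  labels:
    pod-group.scheduling.sigs.k8s.io: ray-pg   # assumed label key; verify against the plugin version in use
spec:
  schedulerName: scheduler-plugins-scheduler   # matches the "From" column in the events
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9           # placeholder image

If fewer than minMember pods carrying the group label exist, or some of them are unschedulable, every pod in the group stays Pending and the "cannot find enough sibling pods" event above is the expected symptom.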

asm582 commented Aug 22, 2022

The message below seems to be incorrect:

Warning FailedScheduling 3m6s scheduler-plugins-scheduler 0/10 nodes are available: 10 pod with pgName: default/ray-pg last failed in 3s, deny.

I only have 8 pods, but it reports "10 pod with pgName" in the above message?

@Huang-Wei (Contributor)

@asm582 can you try v0.23.10?

@Huang-Wei (Contributor)

> I only have 8 pods, but it reports "10 pod with pgName" in the above message?

How many pods specify pgName=ray-pg?

asm582 commented Aug 23, 2022

> How many pods specify pgName=ray-pg?

8 pods in the pod group.

asm582 commented Aug 23, 2022

@Huang-Wei We see the same issue with version 0.23.10:

scheduler-plugins       default         1               2022-08-22 20:23:24.815778 -0400 EDT    deployed        scheduler-plugins-0.23.10   0.23.10  
Events:
  Type     Reason            Age    From                         Message
  ----     ------            ----   ----                         -------
  Warning  FailedScheduling  2m35s  scheduler-plugins-scheduler  0/10 nodes are available: 10 pre-filter pod ray-head-urm-exp--1-wp8fv cannot find enough sibling pods, current pods number: 1, minMember of group: 8.
  Warning  FailedScheduling  2m34s  scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  2m33s  scheduler-plugins-scheduler  optimistic rejection in PostFilter

@Huang-Wei (Contributor)

@asm582 could you describe the detailed reproduction steps from scratch?

@Huang-Wei (Contributor)

> I only have 8 pods, but it reports "10 pod with pgName" in the above message?

That message means 10 nodes failed with the reason "pod with pgName …".

asm582 commented Aug 23, 2022

> That message means 10 nodes failed with the reason "pod with pgName …".

OK, thanks for the confirmation. So the message has a bug: it says "pod" where it is actually counting nodes.

@Huang-Wei (Contributor)

Well, it's not a bug. If you look at the regular messages, they all follow the pattern:

0/100 nodes are available: 30 reason X, 20 reason Y, 50 reason Z. 

asm582 commented Aug 23, 2022

> Well, it's not a bug. If you look at the regular messages, they all follow the pattern:
>
> 0/100 nodes are available: 30 reason X, 20 reason Y, 50 reason Z.

Got it, thanks! So the format is [node count] [reason message]. I am a new user, and maybe it's just not obvious to me; adding something more descriptive to the message could help, I think.

asm582 commented Aug 26, 2022

@Huang-Wei I confirm that the performance issues are gone, but there is an issue with the events the co-scheduler fires. If all nodes in the cluster are tainted and the pods have no tolerations, the co-scheduler generates the message "optimistic rejection in PostFilter", whereas the default scheduler prints the correct message: "0/1 nodes are available: 1 node(s) had taint {key1: value1}, that the pod didn't tolerate." Would you agree this is an issue?

@Huang-Wei (Contributor)

> Would you agree this is an issue?

It's more of a UX improvement. The reason you saw it is that some of the pods in the PodGroup are "half-completed", i.e., their scheduling constraints have been satisfied, so they are internally waiting for their sibling pods to finish scheduling. However, the other portion is not schedulable, so in PostFilter the optimization is to cancel the waiting for those "half-completed" pods: since the group cannot be successfully co-scheduled anyway, releasing the resources they are holding is not a bad idea.

That's why you saw different messages for different pods: the pods that can be scheduled (i.e., that are only waiting for the PodGroup constraint) and the pods that cannot be scheduled (e.g., due to lack of resources).

asm582 commented Aug 30, 2022

> It's more of a UX improvement. The reason you saw it is that some of the pods in the PodGroup are "half-completed", i.e., their scheduling constraints have been satisfied, so they are internally waiting for their sibling pods to finish scheduling. However, the other portion is not schedulable, so in PostFilter the optimization is to cancel the waiting for those "half-completed" pods: since the group cannot be successfully co-scheduled anyway, releasing the resources they are holding is not a bad idea.
>
> That's why you saw different messages for different pods: the pods that can be scheduled (i.e., that are only waiting for the PodGroup constraint) and the pods that cannot be scheduled (e.g., due to lack of resources).

Thanks for coming back :)! No pods were actually scheduled. The test minikube environment has only one node, which is tainted, and the pods scheduled have no tolerations. In such a scenario the co-scheduler does not create taint events. Do you think the default scheduler and the co-scheduler should fire the same event?
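
For reference, the minikube scenario described here can be sketched as follows; the node name and the taint key/value mirror the "{key1: value1}" taint in the earlier message and are otherwise assumptions, as are the pod name and label key.

# Taint the single minikube node (shell):
#   kubectl taint nodes minikube key1=value1:NoSchedule
#
# A member pod with no tolerations; it cannot land on the tainted node, and in this
# setup only "optimistic rejection in PostFilter" was reportedly surfaced on it.
apiVersion: v1
kind: Pod
metadata:
  name: ray-worker-0                           # hypothetical name
  labels:
    pod-group.scheduling.sigs.k8s.io: ray-pg   # assumed label key, as in the earlier sketch
spec:
  schedulerName: scheduler-plugins-scheduler
  # no "tolerations:" section, so the NoSchedule taint excludes the only node
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9           # placeholder image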

Huang-Wei (Contributor) commented Aug 30, 2022

> and the pods scheduled have no tolerations

How can pods without a toleration be scheduled onto a tainted node?

> In such a scenario the co-scheduler does not create taint events

I'm a bit confused. If the pods mapped to a PodGroup don't meet the quorum, it should report an error like:

Events:
  Type     Reason            Age   From                         Message
  ----     ------            ----  ----                         -------
  Warning  FailedScheduling  10s   scheduler-plugins-scheduler  0/1 nodes are available: 1 pre-filter pod pause-8b78cf79b-lrdwn cannot find enough sibling pods, current pods number: 1, minMember of group: 3.

Whereas if the quorum is met but some other scheduling constraint (like resources) is not, it reports the specific scheduling error, like:

Events:
  Type     Reason            Age   From                         Message
  ----     ------            ----  ----                         -------
  Warning  FailedScheduling  23s   scheduler-plugins-scheduler  0/1 nodes are available: 1 node(s) had taint {k1: v1}, that the pod didn't tolerate.
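
For completeness, a pod that tolerates the taint in that last event would look roughly like the sketch below; the key, value, and effect must match the actual taint on the node, and the pod name and label key are the same assumptions as in the earlier sketches.

apiVersion: v1
kind: Pod
metadata:
  name: pause-tolerating                       # hypothetical name
  labels:
    pod-group.scheduling.sigs.k8s.io: ray-pg   # assumed label key
spec:
  schedulerName: scheduler-plugins-scheduler
  tolerations:
  - key: "k1"                # matches the taint {k1: v1} in the event above
    operator: "Equal"
    value: "v1"
    effect: "NoSchedule"     # assumed effect; use the effect the node's taint actually has
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9           # placeholder image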

asm582 commented Aug 30, 2022

OK, got it, thanks. I don't have the setup ready to reproduce the issue yet, but I did learn new things from the comments in this thread. Please feel free to close this issue; I can always re-open it later.

@Huang-Wei (Contributor)

Sure, feel free to reopen if you can reproduce it, or to create new issues for other topics.
