
Co-scheduled pods never get scheduled #416

Closed
asm582 opened this issue Aug 22, 2022 · 16 comments

asm582 commented Aug 22, 2022

Hello,

I have a cluster with 10 nodes and I am co-scheduling 8 pods, but they always stay Pending. Below are the events seen on one of the pods:

Events:
  Type     Reason            Age   From                         Message
  ----     ------            ----  ----                         -------
  Warning  FailedScheduling  21m   scheduler-plugins-scheduler  0/10 nodes are available: 10 pre-filter pod ray-head-urm-exp--1-klbn8 cannot find enough sibling pods, current pods number: 1, minMember of group: 8.
  Warning  FailedScheduling  21m   scheduler-plugins-scheduler  0/10 nodes are available: 10 pod with pgName: default/ray-pg last failed in 3s, deny.
  Warning  FailedScheduling  20m   scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  19m   scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  15m   scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  13m   scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  19s   scheduler-plugins-scheduler  optimistic rejection in PostFilter

Co-scheduler version: k8s.gcr.io/scheduler-plugins/kube-scheduler:v0.22.6
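
For context, the setup described above corresponds roughly to the manifests below. This is a minimal sketch only: the reporter did not post manifests, and the PodGroup API group/version and the pod label key vary between scheduler-plugins releases, so treat those identifiers as assumptions and check the coscheduling docs for the installed version.

# Sketch of a PodGroup requiring 8 co-scheduled pods, matching "minMember of group: 8"
# and "pgName: default/ray-pg" in the events above.
apiVersion: scheduling.sigs.k8s.io/v1alpha1    # assumed; the API group differs across releases
kind: PodGroup
metadata:
  name: ray-pg
  namespace: default
spec:
  minMember: 8                 # no pod in the group is bound until 8 siblings can be scheduled
  scheduleTimeoutSeconds: 60
---
# Sketch of one of the 8 member pods; all of them carry the same pod-group label.
apiVersion: v1
kind: Pod
metadata:
  name: ray-head-urm-exp       # hypothetical name based on the pod in the events
  namespace: default
  labels:
    pod-group.scheduling.sigs.k8s.io: ray-pg   # assumed label key; verify against the plugin version in use
spec:
  schedulerName: scheduler-plugins-scheduler   # matches the "From" column in the events
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9           # placeholder image

If fewer than minMember pods carrying the group label exist, or some of them are unschedulable, every pod in the group stays Pending and the "cannot find enough sibling pods" event above is the expected symptom.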

asm582 commented Aug 22, 2022

The message below seems to be incorrect:

Warning FailedScheduling 3m6s scheduler-plugins-scheduler 0/10 nodes are available: 10 pod with pgName: default/ray-pg last failed in 3s, deny.

I only have 8 pods, but it reports "10 pod with pgName" in the above message?

@Huang-Wei (Contributor)

@asm582 can you try v0.23.10?

@Huang-Wei (Contributor)

> I only have 8 pods, but it reports "10 pod with pgName" in the above message?

How many pods specify pgName=ray-pg?

asm582 commented Aug 23, 2022

> How many pods specify pgName=ray-pg?

8 pods in the pod group.

asm582 commented Aug 23, 2022

@Huang-Wei We see the same issue with version 0.23.10:

scheduler-plugins       default         1               2022-08-22 20:23:24.815778 -0400 EDT    deployed        scheduler-plugins-0.23.10   0.23.10  
Events:
  Type     Reason            Age    From                         Message
  ----     ------            ----   ----                         -------
  Warning  FailedScheduling  2m35s  scheduler-plugins-scheduler  0/10 nodes are available: 10 pre-filter pod ray-head-urm-exp--1-wp8fv cannot find enough sibling pods, current pods number: 1, minMember of group: 8.
  Warning  FailedScheduling  2m34s  scheduler-plugins-scheduler  optimistic rejection in PostFilter
  Warning  FailedScheduling  2m33s  scheduler-plugins-scheduler  optimistic rejection in PostFilter

@Huang-Wei (Contributor)

@asm582 could you describe the detailed reproduction steps from scratch?

@Huang-Wei (Contributor)

> I only have 8 pods, but it reports "10 pod with pgName" in the above message?

That message means 10 nodes failed with the reason "pod with pgName …".

asm582 commented Aug 23, 2022

> That message means 10 nodes failed with the reason "pod with pgName …".

OK, thanks for the confirmation. So the message has a bug: it says "pod" where it is actually counting nodes.

@Huang-Wei (Contributor)

Well, it's not a bug. If you look at the regular messages, they all follow the pattern:

0/100 nodes are available: 30 reason X, 20 reason Y, 50 reason Z. 

asm582 commented Aug 23, 2022

> Well, it's not a bug. If you look at the regular messages, they all follow the pattern:
>
> 0/100 nodes are available: 30 reason X, 20 reason Y, 50 reason Z.

Got it, thanks! So the format is [node count] [reason message]. I am a new user, and maybe it's just not obvious to me; adding something more descriptive to the message could help, I think.

asm582 commented Aug 26, 2022

@Huang-Wei I confirm that the performance issues are gone, but there is an issue with the events the co-scheduler fires. If all nodes in the cluster are tainted and the pods have no tolerations, the co-scheduler generates the message "optimistic rejection in PostFilter", whereas the default scheduler prints the correct message: "0/1 nodes are available: 1 node(s) had taint {key1: value1}, that the pod didn't tolerate." Would you agree this is an issue?

@Huang-Wei (Contributor)

> Would you agree this is an issue?

It's more of a UX improvement. The reason you saw it is that some of the pods in the PodGroup are "half-completed", i.e., their scheduling constraints have been satisfied, so they are internally waiting for their sibling pods to finish scheduling. However, the other portion is not schedulable, so in PostFilter the optimization is to cancel the waiting for those "half-completed" pods: since the group cannot be successfully co-scheduled anyway, releasing the resources they are holding is not a bad idea.

That's why you saw different messages for different pods: the pods that can be scheduled (i.e., that are only waiting for the PodGroup constraint) and the pods that cannot be scheduled (e.g., due to lack of resources).

asm582 commented Aug 30, 2022

> It's more of a UX improvement. The reason you saw it is that some of the pods in the PodGroup are "half-completed", i.e., their scheduling constraints have been satisfied, so they are internally waiting for their sibling pods to finish scheduling. However, the other portion is not schedulable, so in PostFilter the optimization is to cancel the waiting for those "half-completed" pods: since the group cannot be successfully co-scheduled anyway, releasing the resources they are holding is not a bad idea.
>
> That's why you saw different messages for different pods: the pods that can be scheduled (i.e., that are only waiting for the PodGroup constraint) and the pods that cannot be scheduled (e.g., due to lack of resources).

Thanks for coming back :)! No pods were actually scheduled. The test minikube environment has only one node, which is tainted, and the pods scheduled have no tolerations. In such a scenario the co-scheduler does not create taint events. Do you think the default scheduler and the co-scheduler should fire the same event?
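
For reference, the minikube scenario described here can be sketched as follows; the node name and the taint key/value mirror the "{key1: value1}" taint in the earlier message and are otherwise assumptions, as are the pod name and label key.

# Taint the single minikube node (shell):
#   kubectl taint nodes minikube key1=value1:NoSchedule
#
# A member pod with no tolerations; it cannot land on the tainted node, and in this
# setup only "optimistic rejection in PostFilter" was reportedly surfaced on it.
apiVersion: v1
kind: Pod
metadata:
  name: ray-worker-0                           # hypothetical name
  labels:
    pod-group.scheduling.sigs.k8s.io: ray-pg   # assumed label key, as in the earlier sketch
spec:
  schedulerName: scheduler-plugins-scheduler
  # no "tolerations:" section, so the NoSchedule taint excludes the only node
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9           # placeholder image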

Huang-Wei (Contributor) commented Aug 30, 2022

> and the pods scheduled have no tolerations

How can pods without a toleration be scheduled onto a tainted node?

> In such a scenario the co-scheduler does not create taint events

I'm a bit confused. If the pods mapped to a PodGroup don't meet the quorum, it should report an error like:

Events:
  Type     Reason            Age   From                         Message
  ----     ------            ----  ----                         -------
  Warning  FailedScheduling  10s   scheduler-plugins-scheduler  0/1 nodes are available: 1 pre-filter pod pause-8b78cf79b-lrdwn cannot find enough sibling pods, current pods number: 1, minMember of group: 3.

Whereas if the quorum is met but some other scheduling constraint (like resources) is not, it reports the specific scheduling error, like:

Events:
  Type     Reason            Age   From                         Message
  ----     ------            ----  ----                         -------
  Warning  FailedScheduling  23s   scheduler-plugins-scheduler  0/1 nodes are available: 1 node(s) had taint {k1: v1}, that the pod didn't tolerate.
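
For completeness, a pod that tolerates the taint in that last event would look roughly like the sketch below; the key, value, and effect must match the actual taint on the node, and the pod name and label key are the same assumptions as in the earlier sketches.

apiVersion: v1
kind: Pod
metadata:
  name: pause-tolerating                       # hypothetical name
  labels:
    pod-group.scheduling.sigs.k8s.io: ray-pg   # assumed label key
spec:
  schedulerName: scheduler-plugins-scheduler
  tolerations:
  - key: "k1"                # matches the taint {k1: v1} in the event above
    operator: "Equal"
    value: "v1"
    effect: "NoSchedule"     # assumed effect; use the effect the node's taint actually has
  containers:
  - name: main
    image: registry.k8s.io/pause:3.9           # placeholder image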

asm582 commented Aug 30, 2022

OK, got it, thanks. I don't have the setup ready to reproduce the issue yet, but I did learn new things from the comments in this thread. Please feel free to close this issue; I can always re-open it later.

@Huang-Wei (Contributor)

Sure, feel free to reopen if you can reproduce it, or to create new issues for other topics.
