No meaningful status if gang-scheduled job has insufficient resources #447

jeefy · 2018-10-17T14:20:06Z

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

What happened:
When submitting a gang-scheduled job, if there are no resources available to schedule it there is no meaningful feedback to a user.

The podGroup has no events indicating that the pods can't start.
The job says that it's running, with no errors or messages.
The pods themselves are created but not started.

The only place to find out that there's a scheduling error is by looking at the kube-batchd logs.

What you expected to happen:
A message/event either within the job or the podGroup stating that there are insufficient resources to schedule this job.

How to reproduce it (as minimally and precisely as possible):
Run the example job with a standard minikube start

Anything else we need to know?:
I'm probably going to be diving into this a lot more so Hi! :)

Environment:
master branch

minikube v0.30.0 on OSX

Jeffreys-MacBook-Pro:jeefy jeefy$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:36:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}


jeefy · 2018-10-17T15:22:59Z

I hit the wrong button derp
/reopen

k8s-ci-robot · 2018-10-17T15:23:06Z

@jeefy: Reopening this issue.

In response to this:

I hit the wrong button derp
/reopen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

k82cn · 2018-10-18T11:10:04Z

xref #401

When submitting a gang-scheduled job, if there are no resources available to schedule it there is no meaningful feedback to a user.

Which version are you testing? In master, we record an "Unschedulable" event in each scheduling cycle if there are not enough resources for the job.

jeefy · 2018-10-18T18:41:11Z

This is off of master. (Sorry this is long)

I'm following the directions in the Tutorial (more or less)

Dump of commands/output below

Set up environment

minikube start --kubernetes-version v1.12.1 --cpus 4 --memory 8192 && \
kubectl create serviceaccount tiller --namespace kube-system && \
kubectl create -f tiller-crb.yaml && \
helm init --service-account tiller --upgrade && \
sleep 30 && \
helm install $GOPATH/src/github.com/kubernetes-sigs/kube-batch/deployment/kube-batch --namespace kube-system

(Helm) output

Jeffreys-MacBook-Pro:jeefy jeefy$ helm status filled-umbrellabird
LAST DEPLOYED: Thu Oct 18 14:29:07 2018
NAMESPACE: kube-system
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/CustomResourceDefinition
NAME                                   AGE
podgroups.scheduling.incubator.k8s.io  39s
queues.scheduling.incubator.k8s.io     35s

==> v1/Deployment
NAME        DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
kube-batch  1        1        1           1          35s

==> v1/Pod(related)
NAME                         READY  STATUS   RESTARTS  AGE
kube-batch-659d977bb7-zxm55  1/1    Running  0         35s


NOTES:
The batch scheduler of Kubernetes.

Creating a job from the example job-01.yaml, but with each pod requesting "2000m" CPU

Jeffreys-MacBook-Pro:jeefy jeefy$ kubectl create -f job-01.yaml
job.batch/qj-1 created
podgroup.scheduling.incubator.k8s.io/qj-1 created

Describing the job and the podGroup

Jeffreys-MacBook-Pro:jeefy jeefy$ kubectl describe job qj-1
Name:           qj-1
Namespace:      default
Selector:       controller-uid=e507213b-d303-11e8-bc12-0800277a2afc
Labels:         controller-uid=e507213b-d303-11e8-bc12-0800277a2afc
                job-name=qj-1
Annotations:    <none>
Parallelism:    6
Completions:    6
Start Time:     Thu, 18 Oct 2018 14:30:31 -0400
Pods Statuses:  6 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       controller-uid=e507213b-d303-11e8-bc12-0800277a2afc
                job-name=qj-1
  Annotations:  scheduling.k8s.io/group-name=qj-1
  Containers:
   busybox:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Requests:
      cpu:        2
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-sxnf4
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-p6qrv
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-gwzvz
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-slwbq
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-jqxln
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-vzltr
Jeffreys-MacBook-Pro:jeefy jeefy$ kubectl describe podgroups.scheduling.incubator.k8s.io qj-1
Name:         qj-1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.incubator.k8s.io/v1alpha1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2018-10-18T18:30:31Z
  Generation:          1
  Resource Version:    613
  Self Link:           /apis/scheduling.incubator.k8s.io/v1alpha1/namespaces/default/podgroups/qj-1
  UID:                 e508c7b2-d303-11e8-bc12-0800277a2afc
Spec:
  Min Member:  6
Events:        <none>
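
As a rough sanity check on why this job cannot start, here is a back-of-the-envelope sketch using the numbers from the output above (Min Member 6, a CPU request of 2 per pod, minikube started with --cpus 4). The variable names are illustrative only, and the check ignores CPU already reserved by system pods, so the real headroom is smaller still. With gang scheduling, the job is only expected to start once all Min Member pods fit at the same time.

```shell
#!/bin/sh
# Hypothetical capacity check; names are illustrative, not part of kube-batch.
min_member=6          # Spec.Min Member of the PodGroup
cpu_request_m=2000    # per-pod CPU request in millicores (cpu: 2)
node_cpu_m=4000       # minikube --cpus 4, ignoring system pods

# Total CPU the whole gang needs at once.
required_m=$((min_member * cpu_request_m))
# How many pods would fit by CPU alone.
fits=$((node_cpu_m / cpu_request_m))

if [ "$required_m" -gt "$node_cpu_m" ]; then
  verdict="unschedulable"
else
  verdict="schedulable"
fi

echo "need ${required_m}m, have ${node_cpu_m}m, ${fits} of ${min_member} pods fit: ${verdict}"
```

With these numbers the gang needs 12000m against 4000m available, so at most 2 of the 6 pods could fit and the group as a whole stays pending.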

If this is actually a bug let me know and I can dive in and try to figure it out. :) Also if I'm missing something, let me know!

Thanks!

k82cn · 2018-10-19T02:08:48Z

The tutorial uses the release-0.2 branch :) I handled this in #443; please feel free to check whether there are any bugs :)

jeefy · 2018-10-19T17:38:55Z

My bad! You're right.

2379591517:kube-batch jeefy$ kubectl describe podgroups.scheduling.incubator.k8s.io qj-1
Name:         qj-1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.incubator.k8s.io/v1alpha1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2018-10-19T17:24:44Z
  Generation:          1
  Resource Version:    817
  Self Link:           /apis/scheduling.incubator.k8s.io/v1alpha1/namespaces/default/podgroups/qj-1
  UID:                 dec90dd3-d3c3-11e8-912c-080027dc9f66
Spec:
  Min Member:  6
Events:
  Type     Reason         Age   From        Message
  ----     ------         ----  ----        -------
  Warning  Unschedulable  6m    kube-batch  not enough resource for job

I was running against v0.2, not master.
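
For anyone landing here later, a minimal sketch of the two places to look when a gang-scheduled job appears stuck. It assumes the names used in this thread (PodGroup qj-1 in the default namespace, and a Deployment called kube-batch in kube-system, as in the Helm output above). The function names are hypothetical helpers, not part of kube-batch.

```shell
#!/bin/sh
# Hypothetical helpers; adjust names and namespaces to your setup.

# List events attached to a PodGroup, e.g. the "Unschedulable" warning.
podgroup_events() {
  name="$1"
  ns="${2:-default}"
  kubectl get events --namespace "$ns" \
    --field-selector "involvedObject.kind=PodGroup,involvedObject.name=${name}"
}

# Fall back to the scheduler logs if no event was recorded.
# Assumes the Deployment is named kube-batch in kube-system.
scheduler_logs() {
  kubectl logs --namespace kube-system deployment/kube-batch
}
```

Usage would be along the lines of `podgroup_events qj-1` first, then `scheduler_logs` if the event list comes back empty, as it did on v0.2.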

k8s-ci-robot added the kind/feature label Oct 17, 2018

jeefy closed this as completed Oct 17, 2018

k8s-ci-robot reopened this Oct 17, 2018

jeefy closed this as completed Oct 19, 2018
