This repository has been archived by the owner on May 25, 2023. It is now read-only.

No meaningful status if gang-scheduled job has insufficient resources #447

Closed
jeefy opened this issue Oct 17, 2018 · 6 comments
Labels
kind/feature Categorizes issue or PR as related to a new feature.

Comments

@jeefy
Contributor

jeefy commented Oct 17, 2018

Is this a BUG REPORT or FEATURE REQUEST?:
/kind feature

What happened:
When submitting a gang-scheduled job, if there are no resources available to schedule it there is no meaningful feedback to a user.

The podGroup has no events saying that its pods can't start.
The job reports that it is running, with no errors or messages.
The pods themselves are created but not started.

The only place to find out about the scheduling error is the kube-batchd logs.

What you expected to happen:
A message/event either within the job or the podGroup stating that there are insufficient resources to schedule this job.

How to reproduce it (as minimally and precisely as possible):
Run the example job on a cluster created with a standard minikube start.

Anything else we need to know?:
I'm probably going to be diving into this a lot more, so hi! :)

Environment:
master branch

minikube v0.30.0 on OSX

Jeffreys-MacBook-Pro:jeefy jeefy$ kubectl version
Client Version: version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.2", GitCommit:"bb9ffb1654d4a729bb4cec18ff088eacc153c239", GitTreeState:"clean", BuildDate:"2018-08-07T23:17:28Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:36:14Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
@k8s-ci-robot k8s-ci-robot added the kind/feature Categorizes issue or PR as related to a new feature. label Oct 17, 2018
@jeefy jeefy closed this as completed Oct 17, 2018
@jeefy
Contributor Author

jeefy commented Oct 17, 2018

I hit the wrong button derp
/reopen

@k8s-ci-robot
Contributor

@jeefy: Reopening this issue.


@k8s-ci-robot k8s-ci-robot reopened this Oct 17, 2018
@k82cn
Contributor

k82cn commented Oct 18, 2018

xref #401

> When submitting a gang-scheduled job, if there are no resources available to schedule it there is no meaningful feedback to a user.

Which version are you testing? In master, we record an "Unschedulable" event in each scheduling cycle if there are not enough resources for the job.
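To make the behaviour described above concrete, here is a minimal Go sketch of a per-cycle gang check that produces an event like the one kube-batch records. It is not kube-batch's actual code: the podGroup and event types, countPlaceable, scheduleCycle, and the CPU-only first-fit placement are hypothetical simplifications. Only the "Unschedulable" reason and the message text come from this thread.

```go
package main

import "fmt"

// podGroup is a hypothetical, simplified stand-in for a PodGroup:
// only CPU requests (in millicores) are modelled.
type podGroup struct {
	Name      string
	MinMember int
	TaskCPU   []int64 // CPU request of each pending task, in millicores
}

// event mirrors the columns shown by `kubectl describe` (Type, Reason, Message).
type event struct {
	Type, Reason, Message string
}

// countPlaceable greedily assigns tasks to nodes (first fit) on a copy of the
// free-CPU list and returns how many tasks could be placed.
func countPlaceable(pg podGroup, nodeFree []int64) int {
	free := append([]int64(nil), nodeFree...) // copy: a failed gang must not consume capacity
	placed := 0
	for _, req := range pg.TaskCPU {
		for i := range free {
			if free[i] >= req {
				free[i] -= req
				placed++
				break
			}
		}
	}
	return placed
}

// scheduleCycle runs one cycle for pg. The gang is all-or-nothing: if fewer than
// MinMember tasks fit, nothing is bound and an Unschedulable warning is returned.
func scheduleCycle(pg podGroup, nodeFree []int64) *event {
	if countPlaceable(pg, nodeFree) < pg.MinMember {
		return &event{Type: "Warning", Reason: "Unschedulable", Message: "not enough resource for job"}
	}
	return nil // enough room: a real scheduler would bind the tasks here
}

func main() {
	// 3 tasks of 1000m over two nodes with 1500m free each: 3000m in total,
	// but only one task fits per node, so the gang of 3 cannot start.
	pg := podGroup{Name: "example", MinMember: 3, TaskCPU: []int64{1000, 1000, 1000}}
	if ev := scheduleCycle(pg, []int64{1500, 1500}); ev != nil {
		fmt.Printf("%s  %s  %s\n", ev.Type, ev.Reason, ev.Message)
	}
}
```

The example in main shows why placement is checked per node rather than against the cluster total: 3000m is free in aggregate, yet the gang still cannot start, so a warning is emitted on every cycle until capacity changes.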

@jeefy
Contributor Author

jeefy commented Oct 18, 2018

This is off of master. (Sorry, this is long.)

I'm following the directions in the tutorial (more or less).

Dump of commands and output below:

Set up environment

minikube start --kubernetes-version v1.12.1 --cpus 4 --memory 8192  && \
kubectl create serviceaccount tiller --namespace kube-system && \
kubectl create -f tiller-crb.yaml && \
helm init --service-account tiller --upgrade && \
sleep 30 && \
helm install $GOPATH/src/github.com/kubernetes-sigs/kube-batch/deployment/kube-batch --namespace kube-system

(Helm) output

Jeffreys-MacBook-Pro:jeefy jeefy$ helm status filled-umbrellabird
LAST DEPLOYED: Thu Oct 18 14:29:07 2018
NAMESPACE: kube-system
STATUS: DEPLOYED

RESOURCES:
==> v1beta1/CustomResourceDefinition
NAME                                   AGE
podgroups.scheduling.incubator.k8s.io  39s
queues.scheduling.incubator.k8s.io     35s

==> v1/Deployment
NAME        DESIRED  CURRENT  UP-TO-DATE  AVAILABLE  AGE
kube-batch  1        1        1           1          35s

==> v1/Pod(related)
NAME                         READY  STATUS   RESTARTS  AGE
kube-batch-659d977bb7-zxm55  1/1    Running  0         35s


NOTES:
The batch scheduler of Kubernetes.

Creating a job from the example job-01.yaml, but with each pod requesting "2000m" CPU:

Jeffreys-MacBook-Pro:jeefy jeefy$ kubectl create -f job-01.yaml
job.batch/qj-1 created
podgroup.scheduling.incubator.k8s.io/qj-1 created
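Why this job can never start on this cluster is simple arithmetic: 6 pods × 2000m = 12000m of CPU, while the single minikube node started with --cpus 4 has at most 4000m (less once system pods such as kube-batch itself have their requests), so at most two of the six pods could ever run and the gang of 6 never starts. A small Go check of that arithmetic; the helper name gangFitsOnSingleNode is hypothetical and only CPU is considered:

```go
package main

import "fmt"

// gangFitsOnSingleNode reports whether all replicas of a gang can fit on one
// node, using CPU only. nodeCPUMilli is an upper bound: system pods also
// request CPU, so the real allocatable amount is lower.
func gangFitsOnSingleNode(replicas, cpuPerPodMilli, nodeCPUMilli int) bool {
	return replicas*cpuPerPodMilli <= nodeCPUMilli
}

func main() {
	const (
		replicas       = 6    // Parallelism / Min Member of qj-1
		cpuPerPodMilli = 2000 // "2000m" CPU request per pod
		nodeCPUMilli   = 4000 // minikube started with --cpus 4
	)
	// Prints: requested 12000m of at most 4000m, fits: false
	fmt.Printf("requested %dm of at most %dm, fits: %v\n",
		replicas*cpuPerPodMilli, nodeCPUMilli,
		gangFitsOnSingleNode(replicas, cpuPerPodMilli, nodeCPUMilli))
}
```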

Describing the job and the podGroup

Jeffreys-MacBook-Pro:jeefy jeefy$ kubectl describe job qj-1
Name:           qj-1
Namespace:      default
Selector:       controller-uid=e507213b-d303-11e8-bc12-0800277a2afc
Labels:         controller-uid=e507213b-d303-11e8-bc12-0800277a2afc
                job-name=qj-1
Annotations:    <none>
Parallelism:    6
Completions:    6
Start Time:     Thu, 18 Oct 2018 14:30:31 -0400
Pods Statuses:  6 Running / 0 Succeeded / 0 Failed
Pod Template:
  Labels:       controller-uid=e507213b-d303-11e8-bc12-0800277a2afc
                job-name=qj-1
  Annotations:  scheduling.k8s.io/group-name=qj-1
  Containers:
   busybox:
    Image:      busybox
    Port:       <none>
    Host Port:  <none>
    Requests:
      cpu:        2
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type    Reason            Age   From            Message
  ----    ------            ----  ----            -------
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-sxnf4
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-p6qrv
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-gwzvz
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-slwbq
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-jqxln
  Normal  SuccessfulCreate  15s   job-controller  Created pod: qj-1-vzltr
Jeffreys-MacBook-Pro:jeefy jeefy$ kubectl describe podgroups.scheduling.incubator.k8s.io qj-1
Name:         qj-1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.incubator.k8s.io/v1alpha1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2018-10-18T18:30:31Z
  Generation:          1
  Resource Version:    613
  Self Link:           /apis/scheduling.incubator.k8s.io/v1alpha1/namespaces/default/podgroups/qj-1
  UID:                 e508c7b2-d303-11e8-bc12-0800277a2afc
Spec:
  Min Member:  6
Events:        <none>

If this is actually a bug let me know and I can dive in and try to figure it out. :) Also if I'm missing something, let me know!

Thanks!

@k82cn
Contributor

k82cn commented Oct 19, 2018

The tutorial uses the release-0.2 branch :) I handled this in #443; please feel free to check whether there are any bugs :)

@jeefy
Contributor Author

jeefy commented Oct 19, 2018

My bad! You're right.

2379591517:kube-batch jeefy$ kubectl describe podgroups.scheduling.incubator.k8s.io qj-1
Name:         qj-1
Namespace:    default
Labels:       <none>
Annotations:  <none>
API Version:  scheduling.incubator.k8s.io/v1alpha1
Kind:         PodGroup
Metadata:
  Creation Timestamp:  2018-10-19T17:24:44Z
  Generation:          1
  Resource Version:    817
  Self Link:           /apis/scheduling.incubator.k8s.io/v1alpha1/namespaces/default/podgroups/qj-1
  UID:                 dec90dd3-d3c3-11e8-912c-080027dc9f66
Spec:
  Min Member:  6
Events:
  Type     Reason         Age   From        Message
  ----     ------         ----  ----        -------
  Warning  Unschedulable  6m    kube-batch  not enough resource for job

I wasn't running against master, but against v0.2.

@jeefy jeefy closed this as completed Oct 19, 2018