
[Failing Test] gce-device-plugin-gpu-master (ci-kubernetes-e2e-gce-device-plugin-gpu) #91990

Closed
hasheddan opened this issue Jun 10, 2020 · 6 comments · Fixed by #92031

Comments

@hasheddan
Contributor

Which jobs are failing:

gce-device-plugin-gpu-master (ci-kubernetes-e2e-gce-device-plugin-gpu)

Which test(s) are failing:

Up

Since when has it been failing:

06-09-20 16:31 PDT

Testgrid link:

https://testgrid.k8s.io/sig-release-master-blocking#gce-device-plugin-gpu-master

Reason for failure:

W0610 14:13:28.065] 2020/06/10 14:13:28 process.go:155: Step './hack/e2e-internal/e2e-down.sh' finished in 6m37.239818042s
W0610 14:13:28.065] 2020/06/10 14:13:28 process.go:96: Saved XML output to /workspace/_artifacts/junit_runner.xml.
W0610 14:13:28.065] 2020/06/10 14:13:28 main.go:312: Something went wrong: starting e2e cluster: error during ./hack/e2e-internal/e2e-up.sh: exit status 1
W0610 14:13:28.066] Traceback (most recent call last):
W0610 14:13:28.067]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 720, in <module>
W0610 14:13:28.067]     main(parse_args())
W0610 14:13:28.067]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 570, in main
W0610 14:13:28.068]     mode.start(runner_args)
W0610 14:13:28.068]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 228, in start
W0610 14:13:28.069]     check_env(env, self.command, *args)
W0610 14:13:28.069]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 111, in check_env
W0610 14:13:28.069]     subprocess.check_call(cmd, env=env)
W0610 14:13:28.069]   File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
W0610 14:13:28.070]     raise CalledProcessError(retcode, cmd)
W0610 14:13:28.070] subprocess.CalledProcessError: Command '('kubetest', '--dump=/workspace/_artifacts', '--gcp-service-account=/etc/service-account/service-account.json', '--up', '--down', '--test', '--provider=gce', '--cluster=bootstrap-e2e', '--gcp-network=bootstrap-e2e', '--check-leaked-resources', '--extract=ci/latest', '--gcp-node-image=gci', '--gcp-project-type=gpu-project', '--gcp-zone=us-west1-b', '--test_args=--ginkgo.focus=\\[Feature:GPUDevicePlugin\\] --minStartupPods=8', '--timeout=180m')' returned non-zero exit status 1
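
For reference, the failing step is the cluster bring-up (./hack/e2e-internal/e2e-up.sh) driven by kubetest. Spelled out as a shell command, the invocation from the CalledProcessError above looks roughly like the sketch below (flag values are copied verbatim from the log; it assumes kubetest is on PATH and that credentials equivalent to the CI service account are available, which will not be the case outside the CI environment):

# Sketch only: the kubetest invocation reconstructed from the traceback above.
# Paths such as /workspace/_artifacts and /etc/service-account/service-account.json
# are CI-specific and would need to be adjusted to run this anywhere else.
kubetest \
  --dump=/workspace/_artifacts \
  --gcp-service-account=/etc/service-account/service-account.json \
  --up --down --test \
  --provider=gce \
  --cluster=bootstrap-e2e \
  --gcp-network=bootstrap-e2e \
  --check-leaked-resources \
  --extract=ci/latest \
  --gcp-node-image=gci \
  --gcp-project-type=gpu-project \
  --gcp-zone=us-west1-b \
  --test_args='--ginkgo.focus=\[Feature:GPUDevicePlugin\] --minStartupPods=8' \
  --timeout=180m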

Anything else we need to know:

/sig testing
/cc @kubernetes/ci-signal
/priority critical-urgent
/milestone v1.19

hasheddan added the kind/failing-test label Jun 10, 2020
k8s-ci-robot added the sig/testing and priority/critical-urgent labels Jun 10, 2020
k8s-ci-robot added this to the v1.19 milestone Jun 10, 2020
@hasheddan
Contributor Author

/assign @hkamel

@hkamel

hkamel commented Jun 11, 2020

@dims I see the PR merged; however, the job is still failing in the most recent run at 22:31 PDT:

Failures for bootstrap-e2e-minion-group (if any):
W0611 06:08:56.190] 2020/06/11 06:08:56 process.go:155: Step './cluster/log-dump/log-dump.sh /workspace/_artifacts' finished in 2m14.442417098s

W0611 06:08:44.597] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0611 06:08:46.070] scp: /var/log/fluentd.log*: No such file or directory
W0611 06:08:46.070] scp: /var/log/node-problem-detector.log*: No such file or directory
W0611 06:08:46.072] scp: /var/log/kubelet.cov*: No such file or directory
W0611 06:08:46.072] scp: /var/log/startupscript.log*: No such file or directory
W0611 06:08:46.073] scp: /var/log/fluentd.log*: No such file or directory
W0611 06:08:46.074] scp: /var/log/node-problem-detector.log*: No such file or directory
W0611 06:08:46.075] scp: /var/log/kubelet.cov*: No such file or directory
W0611 06:08:46.075] scp: /var/log/startupscript.log*: No such file or directory
W0611 06:08:46.079] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0611 06:08:46.084] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0611 06:08:51.917] INSTANCE_GROUPS=bootstrap-e2e-minion-group
W0611 06:08:51.918] NODE_NAMES=bootstrap-e2e-minion-group-dq3m bootstrap-e2e-minion-group-gjk0 bootstrap-e2e-minion-group-qpxw
I0611 06:08:54.017] Failures for bootstrap-e2e-minion-group (if any):
W0611 06:08:56.190] 2020/06/11 06:08:56 process.go:155: Step './cluster/log-dump/log-dump.sh /workspace/_artifacts' finished in 2m14.442417098s
W0611 06:08:56.191] 2020/06/11 06:08:56 process.go:153: Running: ./hack/e2e-internal/e2e-down.sh
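
For context, those scp errors come from the log-dump step, which copies a fixed list of per-node log paths; the "No such file or directory" messages presumably just mean those particular files (fluentd.log, node-problem-detector.log, kubelet.cov, startupscript.log) were never produced on the nodes. As a rough sketch, the kubelet log on one of the nodes named above could also be inspected directly along these lines (assuming gcloud access to the same CI project; the us-west1-b zone is taken from the kubetest flags earlier in this issue):

# Rough sketch: pull recent kubelet logs straight from one of the minion nodes.
# Assumes gcloud is authenticated against the CI project and the node still exists.
gcloud compute ssh bootstrap-e2e-minion-group-dq3m \
  --zone=us-west1-b \
  --command='sudo journalctl -u kubelet --no-pager | tail -n 200'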

@hkamel

hkamel commented Jun 12, 2020

@dims I see the issue was closed by the PR; however, the job has still been failing since 06-10 16:31 PDT.

@hasheddan
Contributor Author

@hkamel looks like we just turned green :)

@dims
Member

dims commented Jun 14, 2020

cool! :)

@hkamel

hkamel commented Jun 14, 2020

@hasheddan Yes, it is indeed ... @dims thanks for the support!
