
[Failing Test] gce-device-plugin-gpu-master (ci-kubernetes-e2e-gce-device-plugin-gpu) #91990

Closed
hasheddan opened this issue Jun 10, 2020 · 6 comments · Fixed by #92031

Comments

@hasheddan
Contributor

Which jobs are failing:

gce-device-plugin-gpu-master (ci-kubernetes-e2e-gce-device-plugin-gpu)

Which test(s) are failing:

Up

Since when has it been failing:

06-09-20 16:31 PDT

Testgrid link:

https://testgrid.k8s.io/sig-release-master-blocking#gce-device-plugin-gpu-master

Reason for failure:

W0610 14:13:28.065] 2020/06/10 14:13:28 process.go:155: Step './hack/e2e-internal/e2e-down.sh' finished in 6m37.239818042s
W0610 14:13:28.065] 2020/06/10 14:13:28 process.go:96: Saved XML output to /workspace/_artifacts/junit_runner.xml.
W0610 14:13:28.065] 2020/06/10 14:13:28 main.go:312: Something went wrong: starting e2e cluster: error during ./hack/e2e-internal/e2e-up.sh: exit status 1
W0610 14:13:28.066] Traceback (most recent call last):
W0610 14:13:28.067]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 720, in <module>
W0610 14:13:28.067]     main(parse_args())
W0610 14:13:28.067]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 570, in main
W0610 14:13:28.068]     mode.start(runner_args)
W0610 14:13:28.068]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 228, in start
W0610 14:13:28.069]     check_env(env, self.command, *args)
W0610 14:13:28.069]   File "/workspace/./test-infra/jenkins/../scenarios/kubernetes_e2e.py", line 111, in check_env
W0610 14:13:28.069]     subprocess.check_call(cmd, env=env)
W0610 14:13:28.069]   File "/usr/lib/python2.7/subprocess.py", line 190, in check_call
W0610 14:13:28.070]     raise CalledProcessError(retcode, cmd)
W0610 14:13:28.070] subprocess.CalledProcessError: Command '('kubetest', '--dump=/workspace/_artifacts', '--gcp-service-account=/etc/service-account/service-account.json', '--up', '--down', '--test', '--provider=gce', '--cluster=bootstrap-e2e', '--gcp-network=bootstrap-e2e', '--check-leaked-resources', '--extract=ci/latest', '--gcp-node-image=gci', '--gcp-project-type=gpu-project', '--gcp-zone=us-west1-b', '--test_args=--ginkgo.focus=\\[Feature:GPUDevicePlugin\\] --minStartupPods=8', '--timeout=180m')' returned non-zero exit status 1
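
For reference, the failing step is the cluster bring-up (./hack/e2e-internal/e2e-up.sh) driven by kubetest. Spelled out as a shell command, the invocation from the CalledProcessError above looks roughly like the sketch below (flag values are copied verbatim from the log; it assumes kubetest is on PATH and that credentials equivalent to the CI service account are available, which will not be the case outside the CI environment):

# Sketch only: the kubetest invocation reconstructed from the traceback above.
# Paths such as /workspace/_artifacts and /etc/service-account/service-account.json
# are CI-specific and would need to be adjusted to run this anywhere else.
kubetest \
  --dump=/workspace/_artifacts \
  --gcp-service-account=/etc/service-account/service-account.json \
  --up --down --test \
  --provider=gce \
  --cluster=bootstrap-e2e \
  --gcp-network=bootstrap-e2e \
  --check-leaked-resources \
  --extract=ci/latest \
  --gcp-node-image=gci \
  --gcp-project-type=gpu-project \
  --gcp-zone=us-west1-b \
  --test_args='--ginkgo.focus=\[Feature:GPUDevicePlugin\] --minStartupPods=8' \
  --timeout=180m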

Anything else we need to know:

/sig testing
/cc @kubernetes/ci-signal
/priority critical-urgent
/milestone v1.19

hasheddan added the kind/failing-test label Jun 10, 2020
k8s-ci-robot added the sig/testing and priority/critical-urgent labels Jun 10, 2020
k8s-ci-robot added this to the v1.19 milestone Jun 10, 2020
@hasheddan
Contributor Author

/assign @hkamel

@hkamel

hkamel commented Jun 11, 2020

@dims I see the PR merged; however, the job is still failing in the most recent run at 22:31 PDT:

Failures for bootstrap-e2e-minion-group (if any):
W0611 06:08:56.190] 2020/06/11 06:08:56 process.go:155: Step './cluster/log-dump/log-dump.sh /workspace/_artifacts' finished in 2m14.442417098s

W0611 06:08:44.597] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0611 06:08:46.070] scp: /var/log/fluentd.log*: No such file or directory
W0611 06:08:46.070] scp: /var/log/node-problem-detector.log*: No such file or directory
W0611 06:08:46.072] scp: /var/log/kubelet.cov*: No such file or directory
W0611 06:08:46.072] scp: /var/log/startupscript.log*: No such file or directory
W0611 06:08:46.073] scp: /var/log/fluentd.log*: No such file or directory
W0611 06:08:46.074] scp: /var/log/node-problem-detector.log*: No such file or directory
W0611 06:08:46.075] scp: /var/log/kubelet.cov*: No such file or directory
W0611 06:08:46.075] scp: /var/log/startupscript.log*: No such file or directory
W0611 06:08:46.079] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0611 06:08:46.084] ERROR: (gcloud.compute.scp) [/usr/bin/scp] exited with return code [1].
W0611 06:08:51.917] INSTANCE_GROUPS=bootstrap-e2e-minion-group
W0611 06:08:51.918] NODE_NAMES=bootstrap-e2e-minion-group-dq3m bootstrap-e2e-minion-group-gjk0 bootstrap-e2e-minion-group-qpxw
I0611 06:08:54.017] Failures for bootstrap-e2e-minion-group (if any):
W0611 06:08:56.190] 2020/06/11 06:08:56 process.go:155: Step './cluster/log-dump/log-dump.sh /workspace/_artifacts' finished in 2m14.442417098s
W0611 06:08:56.191] 2020/06/11 06:08:56 process.go:153: Running: ./hack/e2e-internal/e2e-down.sh
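
For context, those scp errors come from the log-dump step, which copies a fixed list of per-node log paths; the "No such file or directory" messages presumably just mean those particular files (fluentd.log, node-problem-detector.log, kubelet.cov, startupscript.log) were never produced on the nodes. As a rough sketch, the kubelet log on one of the nodes named above could also be inspected directly along these lines (assuming gcloud access to the same CI project; the us-west1-b zone is taken from the kubetest flags earlier in this issue):

# Rough sketch: pull recent kubelet logs straight from one of the minion nodes.
# Assumes gcloud is authenticated against the CI project and the node still exists.
gcloud compute ssh bootstrap-e2e-minion-group-dq3m \
  --zone=us-west1-b \
  --command='sudo journalctl -u kubelet --no-pager | tail -n 200'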

@hkamel

hkamel commented Jun 12, 2020

@dims I see the issue was closed by the PR; however, the job has still been failing since 06-10 16:31 PDT.

@hasheddan
Contributor Author

@hkamel looks like we just turned green :)

@dims
Member

dims commented Jun 14, 2020

cool! :)

@hkamel

hkamel commented Jun 14, 2020

@hasheddan Yes, it is indeed ... @dims thanks for the support!
