"Google Compute Engine does not have enough resources available to fulfill request: us-central1-a" #673

Closed
Ark-kun opened this issue Jan 12, 2019 · 12 comments

Comments

@Ark-kun
Contributor

Ark-kun commented Jan 12, 2019

Creating cluster buildimage-ae7cefede4-9170 in us-central1-a...
....................done.
ERROR: (gcloud.container.clusters.create) Operation [<Operation
 clusterConditions: [<StatusCondition
 code: CodeValueValuesEnum(GCE_STOCKOUT, 1)
 message: u'Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-a.'>]
 detail: u'Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-a.'
 endTime: u'2019-01-12T05:18:14.71073947Z'
 name: u'operation-1547270279054-a9b609ab'
 nodepoolConditions: []
 operationType: OperationTypeValueValuesEnum(CREATE_CLUSTER, 1)
 selfLink: u'https://container.googleapis.com/v1/projects/363997316495/zones/us-central1-a/operations/operation-1547270279054-a9b609ab'
 startTime: u'2019-01-12T05:17:59.054245385Z'
 status: StatusValueValuesEnum(DONE, 3)
 statusMessage: u'Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-a.'
 targetLink: u'https://container.googleapis.com/v1/projects/363997316495/zones/us-central1-a/clusters/buildimage-ae7cefede4-9170'
 zone: u'us-central1-a'>] finished with error: Try a different location, or try again later: Google Compute Engine does not have enough resources available to fulfill request: us-central1-a.
@Ark-kun
Contributor Author

Ark-kun commented Jan 12, 2019

I see many xgb-training GCE instances. GKE has no clusters.

| Name | Zone | Recommendation | Internal IP | External IP |
| --- | --- | --- | --- | --- |
| xgb-xgboost-trainer-62j22-m | us-central1-c | Save $52 / mo | 10.128.0.4 (nic0) | 35.239.48.44 |
| xgb-xgboost-trainer-62j22-w-0 | us-central1-c | Save $71 / mo | 10.128.0.2 (nic0) | 35.202.242.38 |
| xgb-xgboost-trainer-62j22-w-1 | us-central1-c | Save $73 / mo | 10.128.0.3 (nic0) | 35.194.14.241 |
| xgb-xgboost-trainer-6hc9f-m | us-central1-a | Save $57 / mo | 10.128.0.35 (nic0) | 35.193.152.235 |
| xgb-xgboost-trainer-6hc9f-w-0 | us-central1-a | Save $74 / mo | 10.128.0.34 (nic0) | 35.224.81.143 |
| xgb-xgboost-trainer-6hc9f-w-1 | us-central1-a | Save $73 / mo | 10.128.0.55 (nic0) | 35.224.165.11 |
| xgb-xgboost-trainer-6pkzr-m | us-central1-b | Save $52 / mo | 10.128.0.15 (nic0) | 35.202.225.31 |
| xgb-xgboost-trainer-6pkzr-w-0 | us-central1-b | Save $73 / mo | 10.128.0.14 (nic0) | 35.239.133.179 |
| xgb-xgboost-trainer-6pkzr-w-1 | us-central1-b | Save $71 / mo | 10.128.0.16 (nic0) | 104.154.77.109 |
| xgb-xgboost-trainer-74gc2-m | us-central1-a | Save $55 / mo | 10.128.0.17 (nic0) | 35.224.57.178 |
| xgb-xgboost-trainer-74gc2-w-0 | us-central1-a | Save $74 / mo | 10.128.0.18 (nic0) | 35.224.116.148 |
| xgb-xgboost-trainer-74gc2-w-1 | us-central1-a | Save $73 / mo | 10.128.0.19 (nic0) | 35.224.23.49 |
| xgb-xgboost-trainer-88cn6-m | us-central1-a | Save $55 / mo | 10.128.0.31 (nic0) | 146.148.33.41 |
| xgb-xgboost-trainer-88cn6-w-0 | us-central1-a | Save $73 / mo | 10.128.0.29 (nic0) | 35.232.90.180 |
| xgb-xgboost-trainer-88cn6-w-1 | us-central1-a | Save $74 / mo | 10.128.0.30 (nic0) | 35.192.38.109 |
| xgb-xgboost-trainer-8bbcn-m | us-central1-b | Save $52 / mo | 10.128.0.26 (nic0) | 104.197.87.29 |
| xgb-xgboost-trainer-8bbcn-w-0 | us-central1-b | Save $73 / mo | 10.128.0.28 (nic0) | 35.239.193.6 |
| xgb-xgboost-trainer-8bbcn-w-1 | us-central1-b | Save $73 / mo | 10.128.0.27 (nic0) | 35.193.214.166 |
| xgb-xgboost-trainer-cx597-m | us-central1-f | Save $55 / mo | 10.128.0.23 (nic0) | 104.154.120.107 |
| xgb-xgboost-trainer-cx597-w-0 | us-central1-f | Save $72 / mo | 10.128.0.25 (nic0) | 35.193.200.207 |
| xgb-xgboost-trainer-cx597-w-1 | us-central1-f | Save $71 / mo | 10.128.0.24 (nic0) | 23.236.56.125 |
| xgb-xgboost-trainer-h9hqd-m | us-central1-c | Save $52 / mo | 10.128.0.41 (nic0) | 35.188.34.216 |
| xgb-xgboost-trainer-h9hqd-w-0 | us-central1-c | Save $73 / mo | 10.128.0.39 (nic0) | 35.194.46.29 |
| xgb-xgboost-trainer-h9hqd-w-1 | us-central1-c | Save $72 / mo | 10.128.0.40 (nic0) | 35.232.118.0 |
| xgb-xgboost-trainer-qq5cw-m | us-central1-b | Save $53 / mo | 10.128.0.13 (nic0) | 35.239.163.53 |
| xgb-xgboost-trainer-qq5cw-w-0 | us-central1-b | Save $72 / mo | 10.128.0.11 (nic0) | 104.198.202.148 |
| xgb-xgboost-trainer-qq5cw-w-1 | us-central1-b | Save $72 / mo | 10.128.0.12 (nic0) | 35.226.139.14 |
| xgb-xgboost-trainer-snr9d-m | us-central1-b | Save $54 / mo | 10.128.0.32 (nic0) | 104.154.137.29 |
| xgb-xgboost-trainer-snr9d-w-0 | us-central1-b | Save $74 / mo | 10.128.0.36 (nic0) | 35.184.190.160 |
| xgb-xgboost-trainer-snr9d-w-1 | us-central1-b | Save $73 / mo | 10.128.0.33 (nic0) | 35.239.225.249 |
| xgb-xgboost-trainer-sp7vt-m | us-central1-c | Save $57 / mo | 10.128.0.62 (nic0) | 35.232.172.141 |
| xgb-xgboost-trainer-sp7vt-w-0 | us-central1-c | Save $73 / mo | 10.128.0.59 (nic0) | 35.188.46.86 |
| xgb-xgboost-trainer-sp7vt-w-1 | us-central1-c | Save $74 / mo | 10.128.0.63 (nic0) | 35.238.191.254 |
| xgb-xgboost-trainer-tghz9-m | us-central1-f | Save $64 / mo | 10.128.0.22 (nic0) | 35.192.228.1 |
| xgb-xgboost-trainer-tghz9-w-0 | us-central1-f | Save $74 / mo | 10.128.0.20 (nic0) | 35.202.50.177 |
| xgb-xgboost-trainer-tghz9-w-1 | us-central1-f | Save $74 / mo | 10.128.0.21 (nic0) | 35.238.26.251 |
| xgb-xgboost-trainer-vjskh-m | us-central1-c | Save $59 / mo | 10.128.0.6 (nic0) | 104.154.242.225 |
| xgb-xgboost-trainer-vjskh-w-0 | us-central1-c | Save $73 / mo | 10.128.0.5 (nic0) | 35.239.144.253 |
| xgb-xgboost-trainer-vjskh-w-1 | us-central1-c | Save $73 / mo | 10.128.0.7 (nic0) | 104.198.23.74 |
| xgb-xgboost-trainer-wmds7-m | us-central1-a | Save $58 / mo | 10.128.0.52 (nic0) | 35.226.180.190 |
| xgb-xgboost-trainer-wmds7-w-0 | us-central1-a | Save $73 / mo | 10.128.0.43 (nic0) | 35.224.61.49 |
| xgb-xgboost-trainer-wmds7-w-1 | us-central1-a | Save $73 / mo | 10.128.0.53 (nic0) | 35.232.123.179 |
| xgb-xgboost-trainer-wtdvq-m | us-central1-f | Save $60 / mo | 10.128.0.10 (nic0) | 35.224.127.189 |
| xgb-xgboost-trainer-wtdvq-w-0 | us-central1-f | Save $72 / mo | 10.128.0.8 (nic0) | 35.238.233.113 |
| xgb-xgboost-trainer-wtdvq-w-1 | us-central1-f | Save $71 / mo | 10.128.0.9 (nic0) | 35.225.60.253 |

@Ark-kun
Contributor Author

Ark-kun commented Jan 12, 2019

It looks like some of them were started a very long time ago. For example, https://pantheon.corp.google.com/compute/instancesDetail/zones/us-central1-b/instances/xgb-xgboost-trainer-6pkzr-m?project=ml-pipeline-test is testing the following commit from November 6: 6a4cd86

Custom metadata
dataproc-bucket: dataproc-788e0848-9dd5-4ea6-9b78-eadcb7f5a23d-us-central1
dataproc-cloud-logging-enabled: false
dataproc-cluster-configuration-directory: gs://dataproc-788e0848-9dd5-4ea6-9b78-eadcb7f5a23d-us-central1/google-cloud-dataproc-metainfo/3e274ad6-975e-4e13-b839-9ff9324ab43f/
dataproc-cluster-name: xgb-xgboost-trainer-6pkzr
dataproc-cluster-uuid: 3e274ad6-975e-4e13-b839-9ff9324ab43f
dataproc-initialization-script-0: gs://ml-pipeline-test/6a4cd866387a90fb7528bd7f6e52974d0d4b186c/sample_test/xgb_181107_053604/initialization_actions.sh
dataproc-initialization-script-count: 1
dataproc-initialization-script-timeout-sec-0: 600
dataproc-master: xgb-xgboost-trainer-6pkzr-m
dataproc-master-additional:
dataproc-region: us-central1
dataproc-worker-count: 2
dataproc-agent-output-directory: gs://dataproc-788e0848-9dd5-4ea6-9b78-eadcb7f5a23d-us-central1/google-cloud-dataproc-metainfo/3e274ad6-975e-4e13-b839-9ff9324ab43f/xgb-xgboost-trainer-6pkzr-m
dataproc-datanode-enabled: true
dataproc-option-run-init-actions-early: false
dataproc-protocol-spec: ElgKKmRhdGFwcm9jY29udHJvbC11cy1jZW50cmFsMS5nb29nbGVhcGlzLmNvbRIqZGF0YXByb2Njb250cm9sLXVzLWNlbnRyYWwxLmdvb2dsZWFwaXMuY29t
dataproc-role: Master

@Ark-kun
Contributor Author

Ark-kun commented Jan 12, 2019

I've deleted all VMs except the xgb-xgboost-trainer-6pkzr-m and its siblings.
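
For anyone doing the same cleanup, here is a minimal sketch of how stale trainer VMs could be picked out automatically rather than by eye. It assumes the input is the parsed output of `gcloud compute instances list --format=json`, where each entry carries `name`, `zone` (a URL ending in the zone name), and `creationTimestamp` (an RFC 3339 string with a numeric UTC offset). The function name `find_stale_instances`, the `xgb-` prefix default, and the age threshold are illustrative assumptions, not part of the test scripts.

```python
from datetime import datetime, timedelta, timezone


def find_stale_instances(instances, max_age, now=None, name_prefix="xgb-"):
    """Return sorted (name, zone) pairs for instances older than max_age.

    `instances` is assumed to be a list of dicts shaped like the JSON from
    `gcloud compute instances list --format=json`.
    """
    now = now or datetime.now(timezone.utc)
    stale = []
    for inst in instances:
        if not inst["name"].startswith(name_prefix):
            continue  # only consider VMs created by the sample tests
        # Assumes a timestamp such as 2018-11-06T21:36:12.345-08:00.
        created = datetime.fromisoformat(inst["creationTimestamp"])
        if now - created > max_age:
            # The zone field is a URL; keep only the trailing zone name.
            zone = inst["zone"].rsplit("/", 1)[-1]
            stale.append((inst["name"], zone))
    return sorted(stale)
```

Since these VMs belong to Dataproc clusters, deleting the owning cluster (for example with `gcloud dataproc clusters delete`) is probably cleaner than deleting the individual VMs.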

@IronPan
Member

IronPan commented Jan 12, 2019

This appears to be a resource leak from the sample tests. @gaoning777 could you take a look?

@IronPan
Member

IronPan commented Jan 12, 2019

The tests seem to be unblocked now that the VMs have been cleaned up. Thanks, @Ark-kun.

@gaoning777
Contributor

The Prow system restarts the tests whenever there are new commits. The presubmit test script does not reclaim the cluster in that case, because the process that runs the script is shut down before it reaches its cleanup step.

@gaoning777
Contributor

One possible approach to avoid leaking the cluster is to catch the termination signal (SIGINT/SIGTERM) and reclaim the cluster before exiting.
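
A minimal sketch of that idea, written in Python for illustration (the actual presubmit script may be in another language, and this is not its implementation). The names `run_with_cleanup`, `Terminated`, and `delete_gke_cluster` are hypothetical. The handlers turn SIGINT/SIGTERM into an exception so that the `finally` block runs the cleanup. SIGKILL cannot be caught, so this only helps if the runner sends a catchable signal first and allows enough time for the cleanup to finish.

```python
import signal
import subprocess


class Terminated(Exception):
    """Raised in the main thread when SIGINT or SIGTERM is received."""

    def __init__(self, signum):
        super().__init__(f"received signal {signum}")
        self.signum = signum


def _raise_terminated(signum, frame):
    raise Terminated(signum)


def run_with_cleanup(run_tests, cleanup):
    """Run run_tests(); always call cleanup(), even on SIGINT/SIGTERM.

    Returns 0 on normal completion, or 128 + signal number if interrupted.
    Must be called from the main thread, where signal handlers can be set.
    """
    handled = (signal.SIGINT, signal.SIGTERM)
    previous = {sig: signal.signal(sig, _raise_terminated) for sig in handled}
    try:
        run_tests()
        return 0
    except Terminated as exc:
        return 128 + exc.signum
    finally:
        # Ignore further signals so the cleanup itself is not interrupted.
        for sig in handled:
            signal.signal(sig, signal.SIG_IGN)
        try:
            cleanup()
        finally:
            # Restore whatever handlers were installed before.
            for sig, handler in previous.items():
                signal.signal(sig, handler if handler is not None else signal.SIG_DFL)


def delete_gke_cluster(name, zone):
    """Hypothetical cleanup step: delete a GKE cluster with the gcloud CLI."""
    subprocess.run(
        ["gcloud", "container", "clusters", "delete", name,
         "--zone", zone, "--quiet"],
        check=False,  # a failed delete should not mask the test result
    )
```

A test driver could then call something like `run_with_cleanup(run_sample_tests, lambda: delete_gke_cluster(cluster_name, zone))`, where `run_sample_tests`, `cluster_name`, and `zone` stand in for whatever the script already uses.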

@vicaire
Contributor

vicaire commented Mar 26, 2019

Is this resolved? Can we close?

@IronPan
Member

IronPan commented Mar 27, 2019

/close

@k8s-ci-robot
Contributor

@IronPan: Closing this issue.

In response to this:

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.


@limbuu

limbuu commented Feb 20, 2020

We are having the same issue while adding a new node pool to an existing cluster.
