======== Autoscaler status: 2023-07-04 22:43:56.087529 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
Pending:
 (no pending nodes)
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 0.0/1.0 accelerator_type:A100
 0.00/16.000 GiB memory
 0.00/4.601 GiB object_store_memory
Demands:
 (no resource demands)
2023-07-04 22:43:56,088 INFO autoscaler.py:462 -- The autoscaler took 0.055 seconds to complete the update iteration.
2023-07-04 22:44:01,160 INFO node_provider.py:257 -- Fetched pod data at resource version 210039856.
2023-07-04 22:44:01,161 INFO autoscaler.py:143 -- The autoscaler took 0.043 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:01,161 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:01.161546 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
Pending:
 (no pending nodes)
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 0.0/1.0 accelerator_type:A100
 0.00/16.000 GiB memory
 0.00/4.601 GiB object_store_memory
Demands:
 {'CPU': 32.0, 'GPU': 1.0}: 24+ pending tasks/actors
2023-07-04 22:44:01,163 INFO autoscaler.py:1366 -- StandardAutoscaler: Queue 24 new nodes for launch
2023-07-04 22:44:01,164 INFO node_launcher.py:166 -- BaseNodeLauncher: Got 24 nodes to launch.
2023-07-04 22:44:01,164 INFO node_launcher.py:166 -- BaseNodeLauncher: Launching 24 nodes, type workergroup.
2023-07-04 22:44:01,164 INFO node_provider.py:286 -- Autoscaler is submitting the following patch to RayCluster coder-senorchang-ray-1688535753 in namespace anon-coder-prod.
2023-07-04 22:44:01,164 INFO node_provider.py:290 -- [{'op': 'replace', 'path': '/spec/workerGroupSpecs/0/replicas', 'value': 24}]
2023-07-04 22:44:01,193 INFO autoscaler.py:462 -- The autoscaler took 0.075 seconds to complete the update iteration.
2023-07-04 22:44:01,193 INFO monitor.py:428 -- :event_summary:Adding 24 node(s) of type workergroup.
2023-07-04 22:44:06,283 INFO node_provider.py:257 -- Fetched pod data at resource version 210040046.
2023-07-04 22:44:06,283 INFO autoscaler.py:143 -- The autoscaler took 0.056 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:06,284 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:06.283993 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 0.0/1.0 accelerator_type:A100
 0.00/16.000 GiB memory
 0.00/4.601 GiB object_store_memory
Demands:
 {'CPU': 32.0, 'GPU': 1.0}: 24+ pending tasks/actors
2023-07-04 22:44:06,285 INFO autoscaler.py:462 -- The autoscaler took 0.058 seconds to complete the update iteration.
2023-07-04 22:44:11,477 INFO node_provider.py:257 -- Fetched pod data at resource version 210040157.
2023-07-04 22:44:11,478 INFO autoscaler.py:143 -- The autoscaler took 0.161 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:11,478 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:11.478423 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 172.16.94.147: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 172.16.65.114: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 172.16.63.142: workergroup, waiting
 172.16.63.189: workergroup, waiting
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 0.0/1.0 accelerator_type:A100
 0.00/16.000 GiB memory
 0.00/4.601 GiB object_store_memory
Demands:
 {'CPU': 32.0, 'GPU': 1.0}: 24+ pending tasks/actors
2023-07-04 22:44:11,479 INFO autoscaler.py:462 -- The autoscaler took 0.163 seconds to complete the update iteration.
2023-07-04 22:44:16,574 INFO node_provider.py:257 -- Fetched pod data at resource version 210040283.
2023-07-04 22:44:16,575 INFO autoscaler.py:143 -- The autoscaler took 0.063 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:16,575 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:16.575558 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 2 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 64.0/64.0 CPU
 2.0/2.0 GPU
 0.0/3.0 accelerator_type:A100
 0.00/516.000 GiB memory
 0.00/154.551 GiB object_store_memory
Demands:
 {'CPU': 32.0, 'GPU': 1.0}: 24+ pending tasks/actors
2023-07-04 22:44:16,577 INFO autoscaler.py:1366 -- StandardAutoscaler: Queue 2 new nodes for launch
2023-07-04 22:44:16,577 INFO node_launcher.py:166 -- BaseNodeLauncher: Got 2 nodes to launch.
2023-07-04 22:44:16,577 INFO node_launcher.py:166 -- BaseNodeLauncher: Launching 2 nodes, type workergroup.
2023-07-04 22:44:16,577 INFO node_provider.py:286 -- Autoscaler is submitting the following patch to RayCluster coder-senorchang-ray-1688535753 in namespace anon-coder-prod.
2023-07-04 22:44:16,577 INFO node_provider.py:290 -- [{'op': 'replace', 'path': '/spec/workerGroupSpecs/0/replicas', 'value': 26}]
2023-07-04 22:44:16,608 INFO autoscaler.py:462 -- The autoscaler took 0.097 seconds to complete the update iteration.
2023-07-04 22:44:16,609 INFO monitor.py:428 -- :event_summary:Resized to 64 CPUs, 2 GPUs.
2023-07-04 22:44:16,609 INFO monitor.py:428 -- :event_summary:Adding 2 node(s) of type workergroup.
2023-07-04 22:44:21,724 INFO node_provider.py:257 -- Fetched pod data at resource version 210040402.
2023-07-04 22:44:21,724 INFO autoscaler.py:143 -- The autoscaler took 0.065 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:21,725 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:21.725432 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 5 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 172.16.65.116: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 172.16.63.155: workergroup, waiting
 IP not yet assigned: workergroup, waiting
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 128.0/160.0 CPU
 4.0/5.0 GPU
 0.0/6.0 accelerator_type:A100
 0.00/1266.000 GiB memory
 0.00/379.476 GiB object_store_memory
Demands:
 {'CPU': 32.0, 'GPU': 1.0}: 20+ pending tasks/actors
2023-07-04 22:44:21,727 INFO autoscaler.py:462 -- The autoscaler took 0.067 seconds to complete the update iteration.
2023-07-04 22:44:21,727 INFO monitor.py:428 -- :event_summary:Resized to 160 CPUs, 5 GPUs.
2023-07-04 22:44:26,985 INFO node_provider.py:257 -- Fetched pod data at resource version 210040515.
2023-07-04 22:44:26,986 INFO autoscaler.py:143 -- The autoscaler took 0.227 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:26,986 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:26.986679 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 8 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
 IP not yet assigned: workergroup, waiting
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 256.0/256.0 CPU
 8.0/8.0 GPU
 0.0/9.0 accelerator_type:A100
 0.00/2016.000 GiB memory
 0.00/604.401 GiB object_store_memory
Demands:
 {'CPU': 32.0, 'GPU': 1.0}: 2+ pending tasks/actors
2023-07-04 22:44:26,988 INFO autoscaler.py:462 -- The autoscaler took 0.23 seconds to complete the update iteration.
2023-07-04 22:44:26,989 INFO monitor.py:428 -- :event_summary:Resized to 256 CPUs, 8 GPUs.
2023-07-04 22:44:32,164 INFO node_provider.py:257 -- Fetched pod data at resource version 210040697.
2023-07-04 22:44:32,164 INFO autoscaler.py:143 -- The autoscaler took 0.137 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:32,165 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:32.165249 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 16 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 172.16.65.125: workergroup, waiting
 172.16.65.123: workergroup, waiting
 172.16.65.120: workergroup, waiting
 172.16.94.158: workergroup, waiting
 IP not yet assigned: workergroup, waiting
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 512.0/512.0 CPU
 16.0/16.0 GPU
 0.0/17.0 accelerator_type:A100
 0.00/4016.000 GiB memory
 0.00/1204.201 GiB object_store_memory
Demands:
 {'CPU': 32.0, 'GPU': 1.0}: 16+ pending tasks/actors
2023-07-04 22:44:32,167 INFO autoscaler.py:1366 -- StandardAutoscaler: Queue 6 new nodes for launch
2023-07-04 22:44:32,167 INFO node_launcher.py:166 -- BaseNodeLauncher: Got 6 nodes to launch.
2023-07-04 22:44:32,167 INFO node_launcher.py:166 -- BaseNodeLauncher: Launching 6 nodes, type workergroup.
2023-07-04 22:44:32,168 INFO node_provider.py:286 -- Autoscaler is submitting the following patch to RayCluster coder-senorchang-ray-1688535753 in namespace anon-coder-prod.
2023-07-04 22:44:32,168 INFO node_provider.py:290 -- [{'op': 'replace', 'path': '/spec/workerGroupSpecs/0/replicas', 'value': 32}]
2023-07-04 22:44:32,203 INFO autoscaler.py:462 -- The autoscaler took 0.176 seconds to complete the update iteration.
2023-07-04 22:44:32,204 INFO monitor.py:428 -- :event_summary:Resized to 512 CPUs, 16 GPUs.
2023-07-04 22:44:32,204 INFO monitor.py:428 -- :event_summary:Adding 6 node(s) of type workergroup.
2023-07-04 22:44:37,303 INFO node_provider.py:257 -- Fetched pod data at resource version 210040797.
2023-07-04 22:44:37,304 INFO autoscaler.py:143 -- The autoscaler took 0.067 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:37,305 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:37.305219 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 24 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 768.0/768.0 CPU
 24.0/24.0 GPU
 0.0/25.0 accelerator_type:A100
 0.00/6016.000 GiB memory
 0.00/1804.001 GiB object_store_memory
Demands:
 {'CPU': 32.0, 'GPU': 1.0}: 1+ pending tasks/actors
2023-07-04 22:44:37,308 INFO autoscaler.py:462 -- The autoscaler took 0.071 seconds to complete the update iteration.
2023-07-04 22:44:37,308 INFO monitor.py:428 -- :event_summary:Resized to 768 CPUs, 24 GPUs.
2023-07-04 22:44:42,460 INFO node_provider.py:257 -- Fetched pod data at resource version 210040858.
2023-07-04 22:44:42,460 INFO autoscaler.py:143 -- The autoscaler took 0.069 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:42,461 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:42.461431 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 24 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 768.0/768.0 CPU
 24.0/24.0 GPU
 0.0/25.0 accelerator_type:A100
 0.00/6016.000 GiB memory
 0.00/1804.001 GiB object_store_memory
Demands:
 (no resource demands)
2023-07-04 22:44:42,463 INFO autoscaler.py:462 -- The autoscaler took 0.072 seconds to complete the update iteration.
2023-07-04 22:44:47,560 INFO node_provider.py:257 -- Fetched pod data at resource version 210040914.
2023-07-04 22:44:47,561 INFO autoscaler.py:143 -- The autoscaler took 0.064 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:47,562 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:47.562071 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 24 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 768.0/768.0 CPU
 24.0/24.0 GPU
 0.0/25.0 accelerator_type:A100
 0.00/6016.000 GiB memory
 0.00/1804.001 GiB object_store_memory
Demands:
 (no resource demands)
2023-07-04 22:44:47,564 INFO autoscaler.py:462 -- The autoscaler took 0.067 seconds to complete the update iteration.
2023-07-04 22:44:52,665 INFO node_provider.py:257 -- Fetched pod data at resource version 210040975.
2023-07-04 22:44:52,666 INFO autoscaler.py:143 -- The autoscaler took 0.066 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:52,666 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:52.666506 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 24 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 768.0/768.0 CPU
 24.0/24.0 GPU
 0.0/25.0 accelerator_type:A100
 0.00/6016.000 GiB memory
 0.33/1804.001 GiB object_store_memory
Demands:
 (no resource demands)
2023-07-04 22:44:52,668 INFO autoscaler.py:462 -- The autoscaler took 0.069 seconds to complete the update iteration.
2023-07-04 22:44:57,763 INFO node_provider.py:257 -- Fetched pod data at resource version 210041031.
2023-07-04 22:44:57,763 INFO autoscaler.py:143 -- The autoscaler took 0.058 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:44:57,764 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:44:57.764457 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 24 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 768.0/768.0 CPU
 24.0/24.0 GPU
 0.0/25.0 accelerator_type:A100
 0.00/6016.000 GiB memory
 0.33/1804.001 GiB object_store_memory
Demands:
 (no resource demands)
2023-07-04 22:44:57,766 INFO autoscaler.py:462 -- The autoscaler took 0.061 seconds to complete the update iteration.
2023-07-04 22:45:02,858 INFO node_provider.py:257 -- Fetched pod data at resource version 210041093.
2023-07-04 22:45:02,859 INFO autoscaler.py:143 -- The autoscaler took 0.059 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:45:02,860 INFO autoscaler.py:419 --
======== Autoscaler status: 2023-07-04 22:45:02.860432 ========
Node status
---------------------------------------------------------------
Healthy:
 1 head-group
 24 workergroup
Pending:
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
 IP not yet assigned: workergroup, pending
Recent failures:
 (no failures)
Resources
---------------------------------------------------------------
Usage:
 768.0/768.0 CPU
 24.0/24.0 GPU
 0.0/25.0 accelerator_type:A100
 0.00/6016.000 GiB memory
 1.38/1804.001 GiB object_store_memory
Demands:
 (no resource demands)
2023-07-04 22:45:02,862 INFO autoscaler.py:462 -- The autoscaler took 0.063 seconds to complete the update iteration.
2023-07-04 22:45:07,960 INFO node_provider.py:257 -- Fetched pod data at resource version 210041147.
2023-07-04 22:45:07,961 INFO autoscaler.py:143 -- The autoscaler took 0.063 seconds to fetch the list of non-terminated nodes.
2023-07-04 22:45:07,962 INFO autoscaler.py:419 --