
[ci] AzureDevops jobs failing: NoSpaceLeftError: No space left on devices. #6635

Closed
jameslamb opened this issue Sep 2, 2024 · 1 comment · Fixed by #6636

Comments

@jameslamb
Collaborator

Description

See, for example, https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=16953&view=logs&j=c90eab90-013e-596e-d874-a7254853d76e

[Errno 28] No space left on device: '/home/AzDevOps_azpcontainer/miniforge/envs/test-env/.condatmp/a3c4c55f-6f03-4ae9-8a71-6027b1762a56'

NoSpaceLeftError: No space left on devices.

I also see many warnings about disk space on the jobs that run on our custom hosted Linux runner:

[screenshot of the disk-space warnings]

Reproducible example

This is happening on all CI jobs.

Environment info

N/A

Additional Comments

Related discussion about files being left behind: #6416

@jameslamb
Collaborator Author

I put up #6636 to push some commits and get debugging information... that PR now contains a proposed fix.

I think the primary issue was that container images on the self-hosted runners (the pool introduced in #6407) were taking up too much space.

Via commits pushed to #6636, I ran the following:

# check disk usage
df

# check docker's disk usage
docker system df

# list docker images
docker images

Saw that the main disk was 85% full (roughly 26 / 31 GB used).

---- df ----
Filesystem     1K-blocks     Used Available Use% Mounted on
overlay         32370556 26073772   4917876  85% /
tmpfs              65536        0     65536   0% /dev
tmpfs            8030004        0   8030004   0% /sys/fs/cgroup
shm                65536        0     65536   0% /dev/shm
/dev/sda3       32370556 26073772   4917876  85% /__t
tmpfs            3212004     8924   3203080   1% /run/docker.sock
tmpfs            8030004        0   8030004   0% /proc/acpi
tmpfs            8030004        0   8030004   0% /proc/scsi
tmpfs            8030004        0   8030004   0% /sys/firmware
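For reference, the 85% in the Use% column is not simply Used divided by 1K-blocks (26073772 / 32370556 is about 80.5%). GNU coreutils `df` divides Used by Used + Available and rounds up, so space that unprivileged users can't write to is left out of the denominator. Below is a minimal sketch of that arithmetic, assuming GNU `df` semantics. `df_use_pct` is a hypothetical helper written for illustration.

```shell
#!/usr/bin/env bash
# Recompute df's Use% column from the "Used" and "Available" 1K-block counts.
# GNU df divides by used + available (not by the filesystem size) and rounds up.
df_use_pct() {
    local used_kb="$1" avail_kb="$2"
    local total=$(( used_kb + avail_kb ))
    # integer ceiling of used * 100 / total
    echo $(( (used_kb * 100 + total - 1) / total ))
}

df_use_pct 26073772 4917876   # before cleanup; prints 85
df_use_pct 9061792 21929856   # after cleanup; prints 30
```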

It's very possible that a CI run here could require another 5 GB of data written to disk, summed across the following:

  • new container images pulled (sometimes necessary)
  • installing dependencies (conda install, apt-get install, etc.)

Saw that 20.3 GB of that 26.0 GB was devoted to docker images... many of which were old and unused.

---- docker system df ----
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          12        1         20.38GB   20.3GB (99%)
Containers      1         1         334.6kB   0B (0%)
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B
---- docker images ----
REPOSITORY            TAG                          IMAGE ID       CREATED         SIZE
ubuntu                22.04                        53a843653cbc   2 weeks ago     77.9MB
lightgbm/vsts-agent   manylinux_2_28_x86_64        0b9ea2c8701b   3 weeks ago     3.78GB
lightgbm/vsts-agent   <none>                       a4be478aed94   4 weeks ago     3.81GB
lightgbm/vsts-agent   <none>                       f2e0fcd21471   7 weeks ago     3.91GB
lightgbm/vsts-agent   manylinux_2_28_x86_64-swig   79f675a79726   7 weeks ago     3.91GB
lightgbm/vsts-agent   <none>                       7b2df634255f   7 weeks ago     3.91GB
ubuntu                <none>                       8a3cdc4d1ad3   2 months ago    77.9MB
ubuntu                <none>                       67c845845b7d   3 months ago    77.9MB
ubuntu                <none>                       52882761a72a   4 months ago    77.9MB
ubuntu                <none>                       437ec753bef3   4 months ago    77.9MB
ubuntu                <none>                       7af9ba4f0a47   4 months ago    77.9MB
lightgbm/vsts-agent   <none>                       c03a85cc829d   10 months ago   3.83GB
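The `<none>` rows are typically images whose tag moved to a newer pull of the same repository, so nothing refers to them by name anymore. Docker calls these dangling images, and `docker images --filter dangling=true` lists them. The sketch below summarizes them per repository. It reads `docker images` output on stdin, so it can be tried without a Docker daemon. `count_untagged` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Summarize untagged ("<none>") images per repository from `docker images`
# table output on stdin (the first line is the column header).
# Live usage: docker images | count_untagged
count_untagged() {
    awk 'NR > 1 && $2 == "<none>" { n[$1]++ }
         END { for (repo in n) print repo, n[repo] }' | sort
}
```

Against the full table above, this would report 4 untagged `lightgbm/vsts-agent` images and 5 untagged `ubuntu` images.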

Ran the following:

# remove every image not used by a container and created more than 720h (30 days) ago
docker image prune \
    --all \
    --force \
    --filter until=720h

And saw 16.5 GB of disk space cleared 😁

Total reclaimed space: 16.52GB
Full logs:
Deleted Images:
untagged: ubuntu@sha256:a6d2b38300ce017add71440577d5b0a90460d0e57fd7aec21dd0d1b0761bbfb2
deleted: sha256:52882761a72a60649edff9a2478835325d084fb640ea32a975e29e12a012025f
deleted: sha256:629ca62fb7c791374ce57626d6b8b62c76378be091a0daf1a60d32700b49add7
untagged: lightgbm/vsts-agent@sha256:d0cff3d693d79e5e8e4e8727e68f0e52f55ab59956e9aa87456db3a65fae2fdd
deleted: sha256:f2e0fcd21471fd04c243da076ee2d40894a61fccda8b11f375b6db1452a9c3df
deleted: sha256:193b1879745870f45b2ab5046d7b891e0f91f13b8da3690f8fca1f3a0260b8b0
deleted: sha256:1456972cc380cbbc812bbc76aff89dbc4ad1023d53465ad2976496008af14bd8
deleted: sha256:9aa72e80c34a929e35bf66ed166db7ab3542a61cc63f96b905dfdfb96de904b1
deleted: sha256:7c3882ad87cdee93715bb14f0e0e11b2fe9909c2adf55221cc8e9f81806711c1
deleted: sha256:0b8bb332d23afbc27cc9a948f2eb103143eac3929a9fa804633c33e34c027ba3
deleted: sha256:b36497155708e4dbcfd3a528febb86a2c987a810818f777caccd664e44e0675c
deleted: sha256:b190e811edf8aa4bc775911c819011fbcdf8f9a64a503040d64b456e8171a67d
deleted: sha256:0daf8ade905166c67f94546e16d232761d1c40b4edc97eb1f74ee37d163c1efc
deleted: sha256:acd1574a7aeff8c1912287d977402bbfd49d0ec76fcfbdc14201c8aa844aa1bc
deleted: sha256:db00de713736ec2bb81540d2a8699f3af6e8654ce42d9586d054faa3f17bd79e
untagged: ubuntu@sha256:6d7b5d3317a71adb5e175640150e44b8b9a9401a7dd394f44840626aff9fa94d
deleted: sha256:437ec753bef37f94b7426428201feac1ed99190eacafdab558c4f626808cde04
deleted: sha256:b706c187b212a5c2242e664f21d3eb12fee4c1e150b300d12035284d53c56b7a
untagged: ubuntu@sha256:340d9b015b194dc6e2a13938944e0d016e57b9679963fdeb9ce021daac430221
deleted: sha256:8a3cdc4d1ad3e314a91f76b7b99eed443f2152e3a9bf33e46669b31d094be443
deleted: sha256:931b7ff0cb6f494b27d31a4cbec3efe62ac54676add9c7469560302f1541ecaf
untagged: ubuntu@sha256:19478ce7fc2ffbce89df29fea5725a8d12e57de52eb9ea570890dc5852aac1ac
deleted: sha256:67c845845b7de8024a1ad9f6e7fd08964502a0b423aa8de631ef521863873884
deleted: sha256:0b9c994b0484c0bc61f9de7c28a58745a504704254c5e8ed12349ebee3393a66
untagged: lightgbm/vsts-agent@sha256:9c486a2f02e6b3951d9eef16036495dcea11f3812ccc258cd5f78cfeef101077
deleted: sha256:a4be478aed946c5207d6c228b20f8e6631a3198d3e977d6d6aaeb6dae25795ae
deleted: sha256:67cf2658a2ed1ddfbf0b8a619b3f47883ee9b4f8450d2ca43b43b42f12d3a2f9
deleted: sha256:a830a15fc3400ccc285cef3b061cf36c6ae115ce4df63418574ceb76492bdda4
deleted: sha256:328f1ea7d8fdc38f5564caa2312e6305a05cf1ae89ad203fe13303f49c862368
deleted: sha256:3aeb62a3e3c6a84cb263de65c8cd83d288d629124be0d1a3f1c0dd4c604003d3
deleted: sha256:65c8d2fe0d8f3ddabc86bb23e3cfb7487128463fbdb6bd5b305069328da22991
deleted: sha256:f54736991dbb5e8827d93a7c6fa4224b79610f1957bf1a2e9318c7a7e190db51
deleted: sha256:dfa8bc08a5a51c236472e43a005aa27fddd1223ca44e121ff4e8a615af63926b
deleted: sha256:4f6faa26ed7a82737793191b77aefe839a4209839407eb227f69b2ffdd279d28
deleted: sha256:ec335644fcbb289b2bcb0fa0bc2c3ddbd4e9c2b1232338349f08705f96f759b6
deleted: sha256:345589a62fed2cb9b564272ed65ae2047055acef4e0d2aa4752fdcc4c53a7812
deleted: sha256:f60f4718a1bb08f4f8c09b9612bcd16b59fc513e3cfdfd2ec5994565ff116ea4
deleted: sha256:1756debe875d9ce1c0e12f66f89a564978752865fe23c0c28b93fbb54e59994c
deleted: sha256:bb52480c37e802fdbf4d9d00d9184526ec9a896aaf4c5a90d9159526ca146290
deleted: sha256:27cc7bea86c77db2d415a0e0677c53bde3485a7bd0d10f4dcb877e9e289663eb
deleted: sha256:da03f1269837c04676b071e659da82e1eee57a4fce0d305b5c02e070b78f52ca
deleted: sha256:511f136efadc1870bc5394b6ada5edbe81cf6b1502c2b3ce3d923f3eb3cc44e5
deleted: sha256:a97c563229cff687f8f9f5f316d2dc37b5e043a95b851bb1ec21f63e4ef7d072
deleted: sha256:4d4bd7b8ea0efdeab24e39222f7ef89fd751690682a1633f82afcb6dac729d68
deleted: sha256:656c98f1b55821abecb15eebbb760f036f90f25a245f40a5d75f8e0d3ec5ec61
deleted: sha256:b4111b370b637c45d87b84c897686684213c5e536d3fba4bfee1400a62b2e8e3
deleted: sha256:61a2317e2ad63b01d4bc1a9312e01574b62fb1d819335aaec20c50eab00cf893
deleted: sha256:ee908442e34aeb06be257623b87ee5581da7e985c212a598a0c02a7490012149
deleted: sha256:dd616532e404284e53f1f76d942f76a43f43576625d14ba8d294b0d3938f6773
deleted: sha256:3e1cdee0166fbc30e29a958207ef705e8a7382df2a42ed05d7079fdbee7eaec4
deleted: sha256:ba8fc67f764232685e521cbbf1353de3e55de61c468adb20c936542033e6f337
deleted: sha256:9606810e7bc08cba591ae6f6914b0fd4bcf6193761626791244022cd55422f10
deleted: sha256:f00df1f32bd447d547e1fea046eece0499ca1050cb7ce80fceefe3b37f2eacf6
deleted: sha256:905a2d394a515fd5a0f948fd809743927c595e23ec92f14bd889f5a29fed5930
deleted: sha256:bf4a0a88ac2ef295e34c2a9d07f00f23db8c8c464934808aa0a3e286ec20831e
untagged: lightgbm/vsts-agent@sha256:f2bdc8b2a0076313460e969a675f25c16838aee45be69c4b91eefe1eef866e40
deleted: sha256:7b2df634255f5b27999487ee169f63bc505eeed2d9ecee3d6c0fd5db43e9076a
deleted: sha256:866a421971579bcbddf8cb479fab16d49153857bb01590f61502a91409b73cb2
deleted: sha256:7daf0e545001e51ce0c52dfe21dea0ff653eedd4c6001240676b3f1c1c40733a
deleted: sha256:ab985e38f779ec9045e1e1a1d1efc5e1e90f96e21ca139dcdfb63a9d9e7c9285
deleted: sha256:1c83947f8f3cd37c7678cff201acd66b951c6a81a29e22d6f8d1595138f758a4
deleted: sha256:e7a6b6bf6e5b699c6ba8aa396f59ec9ac28adf921bb5ba8ada5dc744d104d173
deleted: sha256:e2175a25c80655a69d7f1b391cd997398bfacfbe2bdf2a3f43755f7b7e31153b
deleted: sha256:1c98db0cbd6d508551a0204bcca32cd46b43966e8549eaed8254f2b96bcb63fa
deleted: sha256:1c68a89a34e52ce7c1a401baf5977dc7ed27e12f4f62a734696a379a45cfcbbc
deleted: sha256:e587d963e75fe3d4ab96d97e285ac751d9170852aa9c67f2730fdfee71da6f5d
deleted: sha256:a8e75cd43dbd7c4ad634c8129cd655747ab575f804563e2fcf2b5f8951a30302
untagged: ubuntu@sha256:1b8d8ff4777f36f19bfe73ee4df61e3a0b789caeff29caa019539ec7c9a57f95
deleted: sha256:7af9ba4f0a47d9bc8b1ffa492c6b0276476f1889cf4e699fba2236924e5932ed
deleted: sha256:e0a9f5911802534ba097660206feabeb0247a81e409029167b30e2e1f2803b57
untagged: lightgbm/vsts-agent:manylinux_2_28_x86_64-swig
untagged: lightgbm/vsts-agent@sha256:f912f7e7f49c51e5cf33e6068fecdeb034b805490ca6b44347c6908ed9a72417
deleted: sha256:79f675a79726a6d7ca4edbf54981d75324b18a1c1096a3bfeff3002f446a1ae2
deleted: sha256:834134b6d9e35ea328e87218193109c9ec96692d7e032581a6956d16b5b4a7cf
deleted: sha256:4543e3413447601be8fcc7680bb8662504416e84eee9f1fbbd7ec8abf99ae426
deleted: sha256:f153dec7aea1dcfe5e931444e1137a4bc429f9c36f6dcd6ef1deb68a108727f6
deleted: sha256:0930553ecd601db500fed78279acd922768f4dcc14ebc5d90499c6d13f2a0796
deleted: sha256:e1a92591ed2378f6120404181d010e5d9a6a101a88404a5a8ad08c7c4b7ca6e8
deleted: sha256:a7d9e8f30538a2317b1ca650721788f023a1def5a987c9fcd3d4f0b4f24e8069
deleted: sha256:7a6b2eb34a516922f352d1993b07d3b2c547698bbb90859da4d1c403bc6ce0ce
deleted: sha256:c42388c2a2153a76a6ba5c0e22916fa6c4bcca5bdeb74012e64f3f255de897ce
deleted: sha256:217bab594071ad5cd5c71e3e2833b92cb595a1a95c2b1feebb5aa2e21dce1da8
deleted: sha256:bc3bf3c11f4cc4aa84f0dbfa56548ea92fe40a373be875d35789dc2a762fd2c1
deleted: sha256:199deebea30b7c4f72acb3d07bbfce97c15dd47f739e01745481a8e630d346fd
deleted: sha256:4035ff6b9e721f0511cb98c541540e662bbdf8585bf45fd3ecd58bd14ec1dce8
deleted: sha256:d8dd92a37ffa1db67a6884345da852ee292a3b15e3fe8cc1bc6298f0b0b861c3
deleted: sha256:6faf3476e8de52b27b315414fa45a472fbc758a3aaf615b6cfcaba2aa8c69951
deleted: sha256:383708b35e8247806894363fd92e4d517e5a0acc51ba0afbbeb2fb6b9954ff6d
deleted: sha256:cc324f89ce21ead59865aceb7a163970f7654b303ec37d7fdc17b0f9bcaf6626
deleted: sha256:ec6b4fab933ce7046e26a78f87c85a03beb696b4b04e0b2e4a3c9769ab62d8ca
deleted: sha256:2c4b59499d567f62d616a0f311ed458957eaa5492edf7793d78d8af05e019c4b
deleted: sha256:c374b96c9154229fa09d0830dd8e24b223e46a84548a0be92f53520b31900687
deleted: sha256:5376313cb5b5068bd0299fbd902c248408befd091c8831986ce576e7c3da0e99
deleted: sha256:e1f97ab81b65cd4acddfa865b56865b642e33a270178d92f1ad670c880efd591
deleted: sha256:c32fe4156974180899bc5676bd3ad72b3ee2c0ada7dbed62a6de98e78f48b8ba
deleted: sha256:7d00b8d9aeea24a3c23bebda4ef1fc686fef4bcf4d02330f378c1c0a73de4fd5
deleted: sha256:db39409d1fb512093a360066294521408943ec32bf0869077b04cc0a547202d6
deleted: sha256:ce75168f5963d38c0f20ddd70df7cf8a077082da9e2ab09295074e8f62a60642
deleted: sha256:253511276d67b84d38d8e932aec1fb78b5aa6e496508733a713c6003167db59f
deleted: sha256:f32ced69142f8c05839159f5907bf07b82ea47b9a9a0219f1d59ddfcb8d67b39
deleted: sha256:46083a618a79c29676048c7d1a344767633a01d0ecfc1c6020963b12c4ee7dc0
deleted: sha256:a8e8f0c6528e43289f34e69b4922387b30810483e6a09f3ae735a8c6d27ff1c5
deleted: sha256:829f352483fb83972f195db26644eba70b8aab88866dabef0e79bc707e1ec69f
untagged: lightgbm/vsts-agent@sha256:f28d91684c34258fcd9b93bfe5710b974f80b6ada2872e06ce02cf1bb1b59bee
deleted: sha256:c03a85cc829d852377ceb77daa1267170de05c2e0ca7eefde37d71327e8817fe
deleted: sha256:5347161b9bbc6aac632096a4b375c4e030012139d7064749ff5673c36ae16937
deleted: sha256:cf5ed11b8e6f3ccf9682031a25884d46c85947ba5a9e73586a6936d1680e08e6
deleted: sha256:3c596e00c63e10f8029ed80856ebf8beae3e53b6db9c81cdb4d1c1fef2b94a13
deleted: sha256:bb13d4512ccd5d5d09c40687a7b8eb8685a24436b94edf2bfeb8c4c411e6b7c0
deleted: sha256:9910beb258ed2b9b2b90a9aa27a975727b34c8040816087c4cebff8a81b111a8
deleted: sha256:111ade623a343ed838c271852826dbd966e850f73c460a0c9ec45ed8f5432065
deleted: sha256:a7c8a6e22c92950347794d92d832b66033cab6d8fe15fbe9a5bd87853393477f
deleted: sha256:bbc5402bc03fdcd81921f92ea32ba23545390415e5a4fb682057f77856d028f4
deleted: sha256:5d9f27413c504981a9017f35e41273757e5a6385ef115781e92a6886a0916131
deleted: sha256:cb770b6119c70e2b40901d49774c8103c73c4adda7e01c06d8c48a2611bfe381
deleted: sha256:53aa84b6b5f684ad206e3a3603518e1b9034a7943dee1bab341951b219e86863
deleted: sha256:79b879aeaa18c44087b71e572fd0d1262745dbe782c4f4ef5aa931fe35e19897
deleted: sha256:98b9c7ae476b173f8ba52760d87b303ae92094dd2653262f16717bc7506834da
deleted: sha256:966c5703edd29354c5d85fd5b3e8153250c335ea62cdae9bfe77f16574ce46d0
deleted: sha256:7903536ac2d29b6f65eda87ecbf8d1e67023699a50d825efb6ca137949ec593b
deleted: sha256:fdf3f185b0cfe490cad23838bdbfb39a967bac1a74d133f3a5c7069915ff2722
deleted: sha256:2ed8a01a98f293a04d00ba3ed61e7773a8c54a4df3125ade159a9c24831e6a9d
deleted: sha256:cd810287e774904a670b2cb15442d39478251c2de9aec0aad00aac0b34d1b0c8
deleted: sha256:36d1ab1fee044c560d4b7b5aeb41871f85a4fa2c92157a8d234e90fd1583be9e
deleted: sha256:3f5317e23b08e4d8229de7a1f88a248b7c06da16098ad074192aad3de2d0f39a
deleted: sha256:7d2ab59c9152127a2080cf489a0f7bec177864c25739ad6f05c13ce62f9a3dd5
deleted: sha256:a9347cb9244bacbe97023e88e1e2a476228b630607712d0bf12bf7fcd4f968eb
deleted: sha256:33cef77a86171a9a126bab02b1cf5314100aefdd27add76b15f6b4a3be1716ed
deleted: sha256:d8bb92039c7e54ecf231eeb6fe5fcb4f273b3b76ce5a44ff222f1e228051dadf
deleted: sha256:2c49a82ebbc20f04aa583cd92a36094313c6fb1fa53043b0928bd4e6dfda5dca
deleted: sha256:da80b2078328e44baea2fcf5aae3170726bc214813a78eacdcdbe04a80351a5c
deleted: sha256:556c66af0ffa2809940986b2c2755150e40f3cc1e8f3d1af9b6b628dd51b10c5
deleted: sha256:b787a749e844c44f15ec732a325c601e314ffc5a5326e5b2919d815e3ec9e717
deleted: sha256:ac3d171fd1daf11a00cda006c7355b13bf821b542822f91f8e96ccaef74af5ca
deleted: sha256:51af9e44b4eaeac885d65d3f63d6b1a81bacdc0846f917ab8ad50bfe0848fe55

Total reclaimed space: 16.52GB

Ran those commands again and saw disk usage had fallen to just 30% used.

---- df ----
Filesystem     1K-blocks    Used Available Use% Mounted on
overlay         32370556 9061792  21929856  30% /
tmpfs              65536       0     65536   0% /dev
tmpfs            8030004       0   8030004   0% /sys/fs/cgroup
shm                65536       0     65536   0% /dev/shm
/dev/sda3       32370556 9061792  21929856  30% /__t
tmpfs            3212004    8924   3203080   1% /run/docker.sock
tmpfs            8030004       0   8030004   0% /proc/acpi
tmpfs            8030004       0   8030004   0% /proc/scsi
tmpfs            8030004       0   8030004   0% /sys/firmware
---- docker system df ----
TYPE            TOTAL     ACTIVE    SIZE      RECLAIMABLE
Images          2         1         3.862GB   3.784GB (97%)
Containers      1         1         334.6kB   0B (0%)
Local Volumes   0         0         0B        0B
Build Cache     0         0         0B        0B
---- docker images ----
REPOSITORY            TAG                     IMAGE ID       CREATED       SIZE
ubuntu                22.04                   53a843653cbc   2 weeks ago   77.9MB
lightgbm/vsts-agent   manylinux_2_28_x86_64   0b9ea2c8701b   3 weeks ago   3.78GB

build link: https://dev.azure.com/lightgbm-ci/lightgbm-ci/_build/results?buildId=16976&view=logs&j=109b04a8-e507-5fc5-3d1a-042c6dd15986&t=6221f19d-281f-5be7-2f83-a42abd4fa0df

I think it'd help to routinely clean out old Docker images, and I think that should be done via CI configs instead of something like a manually-configured cron job on the runners. That way, all maintainers (most of whom don't have direct access to the runners) can modify the details.
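Below is a minimal sketch of what such a CI-managed cleanup could look like: shell functions that a pipeline step on the self-hosted runners could call. The function names, the `MAX_IMAGE_AGE_DAYS` knob, and the `DOCKER` override are assumptions for illustration. Only the `docker image prune` flags come from the command used earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical cleanup step for the self-hosted runners, meant to be called
# from a script step in the repo's CI config so any maintainer can tune it.
# Both knobs below are assumptions, overridable from the CI environment.
MAX_IMAGE_AGE_DAYS="${MAX_IMAGE_AGE_DAYS:-30}"
DOCKER="${DOCKER:-docker}"   # e.g. DOCKER=echo prints the command instead

# Remove every image that no container uses and that was created more than
# MAX_IMAGE_AGE_DAYS ago. --all covers unused tagged images as well as
# dangling "<none>" ones; "until" accepts a duration such as 720h.
prune_old_images() {
    local hours=$(( MAX_IMAGE_AGE_DAYS * 24 ))
    "${DOCKER}" image prune --all --force --filter "until=${hours}h"
}

# Log disk usage around the prune so CI logs show how much was reclaimed.
cleanup_step() {
    echo "---- df (before) ----"
    df -h /
    prune_old_images
    echo "---- df (after) ----"
    df -h /
}
```

Keeping the age threshold in an environment variable means it can be changed in the pipeline YAML alone. Running `DOCKER=echo prune_old_images` prints the command instead of executing it, which is a cheap way to check the filter before enabling the step.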
