Research spike: investigate solutions around pulling images that exceed VCH tmpfs size #3624
Comments
Bumping to high: the fact that we can shut down the endpointVM simply by pulling a large image from a fast repo will be compounded by Harbor deployments, and large images are more likely to occur in enterprise environments than elsewhere.
Doing the priority dance: by definition, a research spike is not high.
This is blocking the 10th most popular image on Docker Hub: And we have a customer that is actively trying to use VIC for this image as well.
Hello, this is a huge blocker for me at this very moment. I can't launch anything because somehow pulling a 17MB image filled up the /tmp directory, even though all 14 images on this VCH total only 486MB. I tried to reboot the VCH to see if that would clear /tmp, but there is no way to do it: vCenter won't let me do a guest OS reboot, and the documentation says nothing about rebooting a VCH. So I'm willing to wait 2 more years for this to be fixed, but can somebody tell me how to reboot a VCH without killing off all running containers?
@fRzzy You can just power-cycle the endpointVM ( The containers will continue running while the endpoint is down and can continue talking to one another and via container-networks if you're using them. If you're using container-networks for the container data paths, there should be zero impact. If you're using NAT port forwarding, that will be disrupted until the endpointVM has rebooted, as will container name resolution. When the endpointVM has restarted, the port forwarding will be re-established; however, if you used randomly selected ports for forwarding, those may change. If you were explicit in the port forwarding, you'll get the same mapping. For completeness, the Docker API will also be unavailable until the endpoint restarts. If you're using DHCP, the endpoint will attempt to reacquire the same lease it had previously, but that's not guaranteed. Once Reboot time is variable, depending on the number of images that need to be re-indexed and the number of containers running, but with only 14 images I'd guess at under a minute (although datastore speed causes significant variance).
Hello, I'm still experiencing this issue in VIC 1.5 when trying to pull the When observing the disk usage on the VCH host in question, I see the root partition (rootfs /) filling until the "no space left on device" error is thrown. One of the layers is 633.6MB, which exceeds the available disk space (538M) on the root partition of the VCH host.
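The comparison described in the comment above (layer size against free space on the partition) can be written as a small pre-flight check. This is only an illustrative sketch, not something VIC does: `layer_fits` and `reserve_bytes` are hypothetical names, and the check assumes the whole layer must be staged on the given path before it is written elsewhere.

```python
import shutil


def layer_fits(layer_bytes: int, path: str = "/tmp", reserve_bytes: int = 0) -> bool:
    """Return True if a layer of layer_bytes fits in the free space at path.

    reserve_bytes is a hypothetical safety margin to leave unused.
    """
    # shutil.disk_usage reports total, used and free bytes for the
    # filesystem that contains `path`.
    free = shutil.disk_usage(path).free
    return layer_bytes + reserve_bytes <= free


if __name__ == "__main__":
    # 633.6MB layer from the report above, checked against the current directory.
    layer = int(633.6 * 1024 * 1024)
    print("fits" if layer_fits(layer, path=".") else "does not fit")
```

With 538M free and a 633.6MB layer, a check like this would fail before the download starts rather than partway through it.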
@consummo You can shut down the VCH guest OS and use "Edit Settings..." in the vSphere Web Client to set the VM's memory to a larger value. After powering the VCH back on, you will get more space on rootfs (half the size of memory).
The best practice is to follow the steps above to resize the VCH VM's memory.
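If rootfs really is sized at half of the VM's memory, as stated above, the memory needed for a given layer can be estimated with simple arithmetic. The sketch below is a rough estimate under that assumption only: `min_vch_memory_mb`, `headroom_mb` and `step_mb` are hypothetical names, and the calculation ignores space already in use on rootfs.

```python
import math


def min_vch_memory_mb(largest_layer_mb: float,
                      headroom_mb: float = 0.0,
                      step_mb: int = 1024) -> int:
    """Smallest memory size, rounded up to step_mb, whose half holds the layer.

    Assumes rootfs is half of VM memory (taken from the comment above).
    headroom_mb is extra space to leave free; step_mb is the rounding unit.
    """
    needed_rootfs_mb = largest_layer_mb + headroom_mb
    raw_memory_mb = 2 * needed_rootfs_mb
    return math.ceil(raw_memory_mb / step_mb) * step_mb


if __name__ == "__main__":
    # The 633.6MB layer reported earlier in the thread.
    print(min_vch_memory_mb(633.6))  # 2048
```

Under these assumptions, the 633.6MB layer needs at least 2048MB of memory, and more once headroom for concurrent pulls or existing files is added.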
When pulling an image that exceeds the available tmpfs space on the VCH, a "no space left on device" error occurs as the /tmp partition becomes full. Currently, we have two solutions in mind:
Implement a shared buffer on the portlayer side that will read up to a certain watermark, block downloads, drain to disk, repeat.
Implement a large ephemeral disk to use as temporary space instead of using tmpfs on the VCH. This disk should be created at VCH provision time, must be large enough (10s of GB) to support multiple concurrent large image pulls, and must be able to resize (shrink) itself on VCH restart.
We need to research the complexity and correctness of both solutions, with a preference for correctness over simplicity.
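To make the first option concrete, here is a minimal single-threaded sketch of the "read up to a watermark, stop reading, drain to disk, repeat" loop. It is an illustration only, not the portlayer's design: `spool_with_watermark`, `watermark` and `chunk_size` are hypothetical names, and a real implementation would also need to handle concurrent pulls and error recovery.

```python
import io
from typing import BinaryIO


def spool_with_watermark(src: BinaryIO, dst: BinaryIO,
                         watermark: int, chunk_size: int = 64 * 1024) -> int:
    """Copy src to dst while holding at most `watermark` bytes in memory.

    Returns the total number of bytes written to dst.
    """
    if watermark <= 0 or chunk_size <= 0:
        raise ValueError("watermark and chunk_size must be positive")

    buf = bytearray()
    total = 0
    while True:
        # Never request more than the room left under the watermark. While we
        # are draining we are not reading, which is what pauses the download.
        chunk = src.read(min(chunk_size, watermark - len(buf)))
        if chunk:
            buf += chunk
        # Drain when the watermark is reached or the source is exhausted.
        if len(buf) >= watermark or not chunk:
            dst.write(buf)
            total += len(buf)
            buf.clear()
        if not chunk:
            break
    return total


if __name__ == "__main__":
    source = io.BytesIO(b"x" * 1000)  # stands in for a layer download
    sink = io.BytesIO()               # stands in for the file on disk
    print(spool_with_watermark(source, sink, watermark=256, chunk_size=100))
```

The memory held by the loop is bounded by the watermark regardless of layer size, which is the property the first option relies on. The second option trades that bound for a simpler write path onto a dedicated disk.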
bug2157676