This repository has been archived by the owner on Jun 6, 2024. It is now read-only.

docker's data-root will be lost on Azure Node restart #2286

Closed
ydye opened this issue Mar 8, 2019 · 9 comments

@ydye
Contributor

ydye commented Mar 8, 2019

Organization Name: OpenPAI

Impact
Currently, PAI uses a temp path, but when an Azure node restarts, the data on the temp path is lost. PAI could use another path instead of the temp path, but the OS disk is too small to be used for this.

Short summary about the issue/question:

If you configure docker's `data-root` to a tmp path and the data on the tmp path is lost, OpenPAI will crash.
What's more, kubelet won't restart because docker's `data-root` has been lost.

https://forums.docker.com/t/unable-to-restart-the-container-when-the-data-of-data-root-is-lost/70581
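
One way to check whether a node is exposed to this problem is to read the configured `data-root` and compare it against the temp-disk mount points. The sketch below assumes the config lives at /etc/docker/daemon.json and that the caller supplies the temp prefixes (for example /mnt, where Azure Linux images commonly mount the temporary disk). Both function names are hypothetical, and the sed-based extraction is a simplification rather than a real JSON parser.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OpenPAI.

# Print the "data-root" value from a Docker daemon.json, or Docker's
# default /var/lib/docker when the file or the key is missing.
# Assumption: the key and its value sit on one line.
data_root_from_config() {
    config="${1:-/etc/docker/daemon.json}"
    value=""
    if [ -r "$config" ]; then
        value=$(sed -n 's/.*"data-root"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config" | head -n 1)
    fi
    printf '%s\n' "${value:-/var/lib/docker}"
}

# Return 0 when the first argument equals, or lies under, one of the
# prefixes given as the remaining arguments.
is_under_temp_path() {
    target="$1"
    shift
    for prefix in "$@"; do
        case "$target" in
            "$prefix"|"$prefix"/*) return 0 ;;
        esac
    done
    return 1
}
```

For example, `is_under_temp_path "$(data_root_from_config)" /mnt /tmp` succeeds when the configured `data-root` sits on one of those paths, which is the situation this issue describes.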

Brief description of the process you are following:

How to reproduce it:

sudo systemctl stop docker
sudo rm -rf /path/to/docker/data/root
sudo systemctl start docker
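
Because these steps delete data, a guarded wrapper can help when trying them on a test node. The following is only a sketch: the function name `reproduce_data_root_loss` and the `DRY_RUN` variable are hypothetical, and the path check is a minimal safety net, not a complete one.

```shell
#!/bin/sh
# Hypothetical wrapper around the three reproduction commands.
# With DRY_RUN set to any non-empty value, the commands are printed
# instead of executed.
reproduce_data_root_loss() {
    data_root="$1"
    run="${DRY_RUN:+echo}"
    case "$data_root" in
        /?*) ;;  # absolute path with at least one character after the leading slash
        *)
            echo "refusing to delete '$data_root'" >&2
            return 1
            ;;
    esac
    $run sudo systemctl stop docker &&
    $run sudo rm -rf -- "$data_root" &&
    $run sudo systemctl start docker
}
```

Running it first with `DRY_RUN=1` shows the exact commands before anything is removed.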

OpenPAI Environment:

  • OpenPAI version:
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Hardware (e.g. core number, memory size, storage size, GPU type etc.):
  • Others:

Anything else we need to know:

@scarlett2018 scarlett2018 changed the title from "Don't set docker's data-root to a temp path." to "docker's data-root will be lost on Azure Node restart" on Apr 3, 2019
@fanyangCS
Contributor

@ydye, do you have time to fix this?

@hzy46 hzy46 mentioned this issue Jul 26, 2019
44 tasks
@ydye
Contributor Author

ydye commented Jul 26, 2019

@fanyangCS Yes, if you accept the solution of starting kubelet with systemd instead of docker.
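
As an illustration of the proposal only (the thread does not show the actual change, so everything here is an assumption), a systemd-managed kubelet could be described by a unit like the one this sketch prints. The function name `render_kubelet_unit`, the default binary path /usr/bin/kubelet, and the absence of kubelet flags are all placeholders. The presumed benefit is that systemd can restart kubelet as a host process without depending on anything stored under docker's `data-root`.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal systemd unit for kubelet to stdout.
# The binary path is an assumption, and real deployments need kubelet
# flags that are not specified in this thread.
render_kubelet_unit() {
    kubelet_bin="${1:-/usr/bin/kubelet}"
    cat <<EOF
[Unit]
Description=kubelet (sketch, managed by systemd instead of a docker container)
After=docker.service
Wants=docker.service

[Service]
ExecStart=${kubelet_bin}
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF
}
```

The output could be written to /etc/systemd/system/kubelet.service, followed by `sudo systemctl daemon-reload` and `sudo systemctl enable --now kubelet`.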

@fanyangCS
Contributor

fanyangCS commented Jul 26, 2019

@ydye, I agree with your solution. Please make sure users can upgrade smoothly from the old solution to the new one.

@ydye
Contributor Author

ydye commented Jul 26, 2019

@fanyangCS Obviously, it will be a breaking upgrade. Users will have to stop all services and restart all k8s nodes.

@fanyangCS
Contributor

That’s ok.

@scarlett2018
Member

@ydye - could you provide an estimate for this task? We are considering putting it in the Aug release.

@ydye
Contributor Author

ydye commented Aug 2, 2019

#3307

@ydye
Contributor Author

ydye commented Aug 5, 2019

Code Merged.

@ydye ydye closed this as completed Aug 5, 2019
@fanyangCS
Contributor

Could you deploy it to the INT bed?

3 participants