runc fails to start: Container with id exists upon running runc #39

Closed
rajasec opened this issue Jun 26, 2015 · 14 comments

Comments

@rajasec
Contributor

rajasec commented Jun 26, 2015

I was able to run runc successfully.

Later, after the container exited, I could not launch any of my container images (even though rootfs and container.json were properly in place):

Code: Id already in use
Message: Container with id exists: runc

I've made sure that no earlier runc process is still running:
ps -eaf |grep -i runc
root 10077 9934 0 21:53 pts/31 00:00:00 grep --color=auto -i runc

Thanks
Rajasec

@rajasec
Contributor Author

rajasec commented Jun 26, 2015

Steps to reproduce the problem
$HOME/go/runc.
This spawned the new container and dropped me into a shell.

To test whether killing runc also kills the container, I killed the runc process:
kill -9 <pid of runc>

After that, I could not launch any container with the runc command.

@crosbymichael
Member

If you SIGKILL runc, it cannot clean up its state on disk, so it thinks that a container is already running. You have to manually remove the directory in /var/run/ocf/<container> to start it again.

@rajasec
Contributor Author

rajasec commented Jun 27, 2015

Thanks, it worked..

@CameronNemo

Maybe actual file locks should be used so they are automatically unlocked when the process dies.

@rajasec
Contributor Author

rajasec commented Jul 3, 2015

@crosbymichael

Does the functionality below need to be implemented in runc so that it cleans up its state? In Docker, this was handled by the execdriver:
func (d *driver) cleanContainer(id string) error {
	d.Lock()
	delete(d.activeContainers, id)
	d.Unlock()
	return os.RemoveAll(filepath.Join(d.root, id))
}

@rajasec
Contributor Author

rajasec commented Jul 4, 2015

In runc, the destroy function in run.go is not called when runc is terminated by SIGKILL:
defer destroy(container)
Because this signal cannot be caught, some other mechanism is needed to clean up its state in the future.

Rajasec

@rajasec
Contributor Author

rajasec commented Jul 4, 2015

Do we need a fix for runc similar to "Cleanup libcontainer container state #11507", so that the state is cleaned up before runc starts?

@crosbymichael
Member

Maybe we need to implement runc kill or runc signal so that it can send a SIGKILL to the container process and still successfully clean up state after it dies.

@rajasec
Contributor Author

rajasec commented Jul 12, 2015

@crosbymichael

To test whether signals reach the process inside the container, I wrote a small shell script that uses trap to install a
handler for SIGINT SIGHUP SIGTERM SIGQUIT SIGKILL

I started the container, and my container process waited for a signal, as expected:
root 4183 2555 0 17:46 pts/1 00:00:00 ./runc
root 4188 4183 0 17:46 pts/20 00:00:00 /bin/sh /bin/shelltrap

From the host, I sent a signal to my container process:
kill -1 4188
My container process received this signal (SIGHUP).

After that, I ran kill -9 4183 (the pid of runc).
It looks like the signal is not passed down to the container process (the child process). I had thought that killing the parent process with SIGKILL would send the signal to the child process (it looks like it forcefully killed the container too).

~Rajasec

@jhjeong-kr
Contributor

As far as I know, Go cannot catch SIGKILL, so nobody can install a handler for it.
Therefore, I propose an alternative.

Proposal

  1. on creation
    check the container path, [/var/run/ocf/
      - if it does not exist:
          - create a file "task" in [/var/run/ocf/] which contains the pid of the container
          - continue with the creation steps
      - if it exists: read "task", extract the pid, and check whether the process is alive
          - if the process does not exist: yield a user message: "sudo rm -rf /var/run/ocf/
          - if the process exists: yield a user message: "container is already running"
  2. on destroying
    delete "task"
    delete path

@CameronNemo

@jhjeong-kr runc should just delete the directory itself if the PID is dead and continue (maybe with a warning). That is standard practice with PID files.

@jhjeong-kr
Contributor

@CameronNemo Yes, right, but consider the moment when runc finds the lock path (/var/run/ocf/). Currently, runc just prints a fatal message, which is not user-friendly. Also, the message (Id is in use) is not true when runc was terminated by SIGKILL. Therefore, we have to distinguish between abnormal termination (such as SIGKILL) and "in use" when runc finds the lock path.

jhjeong-kr added a commit to jhjeong-kr/runc that referenced this issue Jul 14, 2015
…cf/container is exist, and 2) yield user-friendly msg

Signed-off-by: Jin-Hwan Jeong <jhjeong.kr@gmail.com>
jhjeong-kr added a commit to jhjeong-kr/runc that referenced this issue Jul 14, 2015
	runc always shows "container in use" if /var/run/ocf/container exists
	However, there are two cases
		1) case 1: "container in use"
		2) case 2: /var/run/ocf/container still exists after runc was terminated by SIGKILL or abnormal crash
	For case 2, runc should yield "delete the lock dir" instead of "container in use"
	This patch is for this issue using "pid" file in /var/run/ocf/container/task

Signed-off-by: Jin-Hwan Jeong <jhjeong.kr@gmail.com>
jhjeong-kr added a commit to jhjeong-kr/runc that referenced this issue Jul 16, 2015
	runc always shows "container in use" if /var/run/ocf/container exists
	However, there are two cases
		1) case 1: "container in use"
		2) case 2: /var/run/ocf/container still exists after runc was terminated by SIGKILL or abnormal crash
	For case 2, runc should yield "delete the lock dir" instead of "container in use"
	This patch is for this issue using "pid" file in /var/run/ocf/container/task

Signed-off-by: Jin-Hwan Jeong <jhjeong.kr@gmail.com>

    patch for issue opencontainers#39:
        runc always shows "container in use" if /var/run/ocf/container exists
        However, there are two cases
                1) case 1: "container in use"
                2) case 2: /var/run/ocf/container still exists after runc was terminated by SIGKILL or abnormal crash
        For case 2, runc should yield "delete the lock dir" instead of "container in use"
        This patch is for this issue using "pid" file in /var/run/ocf/container/task
	Also, I refined indentation and added code for slice length check

Signed-off-by: Jin-Hwan Jeong <jhjeong.kr@gmail.com>
@rajasec
Contributor Author

rajasec commented Jul 18, 2015

@crosbymichael
As per your previous comment, I opened a pull request last week for runc kill, which sends a signal to the container.

jhjeong-kr added a commit to jhjeong-kr/runc that referenced this issue Jul 19, 2015
     runc always shows "container in use" if /var/run/ocf/container exists
     However, there are two cases
         1) case 1: "container in use"
         2) case 2: /var/run/ocf/container still exists after runc was terminated by SIGKILL or abnormal crash
     For case 2, runc should yield "delete the lock dir" instead of "container in use"
     This patch is for this issue using "pid" file in /var/run/ocf/container/task
     minor revision opencontainers#1: error indentation, slice length check for "exeName"
     minor revision opencontainers#2: use "filepath.join" instead of "fmt.Sprintf"

Signed-off-by: Jin-Hwan Jeong <jhjeong.kr@gmail.com>
@mrunalp
Contributor

mrunalp commented Aug 18, 2015

Closing as kill support was added in #178
