bug: process start time mismatch #1393

Closed

crosbymichael opened this issue Mar 27, 2017 · 14 comments
@crosbymichael
Member

This was reported in a Docker issue, and another user was able to reproduce the error.

moby/moby#29794

I added some code to pinpoint the exact place causing this, and confirmed that on a CentOS system the start time that runc records does not match what subsequent stat calls return.

diff --git a/libcontainer/container_linux.go b/libcontainer/container_linux.go
index 82c6d8e..8bd0edf 100644
--- a/libcontainer/container_linux.go
+++ b/libcontainer/container_linux.go
@@ -1132,7 +1132,7 @@ func (c *linuxContainer) doesInitProcessExist(initPid int) (bool, error) {
 		return false, newSystemErrorWithCausef(err, "getting init process %d start time", initPid)
 	}
 	if c.initProcessStartTime != startTime {
-		return false, nil
+		return false, newSystemError(fmt.Errorf("process start time and recorded start time do not match"))
 	}
 	return true, nil
 }
@@ -1186,11 +1186,15 @@ func (c *linuxContainer) currentState() (*State, error) {
 	var (
 		startTime           string
 		externalDescriptors []string
+		err                 error
 		pid                 = -1
 	)
 	if c.initProcess != nil {
 		pid = c.initProcess.pid()
-		startTime, _ = c.initProcess.startTime()
+		startTime, err = c.initProcess.startTime()
+		if err != nil {
+			return nil, err
+		}
 		externalDescriptors = c.initProcess.externalDescriptors()
 	}
 	state := &State{
diff --git a/libcontainer/system/proc_test.go b/libcontainer/system/proc_test.go
index ba91029..864a458 100644
--- a/libcontainer/system/proc_test.go
+++ b/libcontainer/system/proc_test.go
@@ -7,6 +7,7 @@ func TestParseStartTime(t *testing.T) {
 		"4902 (gunicorn: maste) S 4885 4902 4902 0 -1 4194560 29683 29929 61 83 78 16 96 17 20 0 1 0 9126532 52965376 1903 18446744073709551615 4194304 7461796 140733928751520 140733928698072 139816984959091 0 0 16781312 137447943 1 0 0 17 3 0 0 9 0 0 9559488 10071156 33050624 140733928758775 140733928758945 140733928758945 140733928759264 0": "9126532",
 		"9534 (cat) R 9323 9534 9323 34828 9534 4194304 95 0 0 0 0 0 0 0 20 0 1 0 9214966 7626752 168 18446744073709551615 4194304 4240332 140732237651568 140732237650920 140570710391216 0 0 0 0 0 0 0 17 1 0 0 0 0 0 6340112 6341364 21553152 140732237653865 140732237653885 140732237653885 140732237656047 0":                                      "9214966",
 		"24767 (irq/44-mei_me) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 -51 0 1 0 8722075 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 1 50 1 0 0 0 0 0 0 0 0 0 0 0":                                                                                                                                                                         "8722075",
+		"22997 (rethinkdb) S 22986 22997 22997 0 -1 1077944320 13867 438 8 0 20 7 0 0 20 0 69 0 27005578 100900864 8449 18446744073709551615 1 1 0 0 0 0 0 4096 17474 18446744073709551615 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0":                                                                                                                           "27005578",
 	}
 	for line, startTime := range data {
 		st, err := parseStartTime(line)

The error we see with my patch applied is: exec failed: process start time and recorded start time do not match

How is this possible? Why CentOS? Why!

uname -a
Linux ip-10-0-0-86 3.10.0-327.10.1.el7.x86_64 #1 SMP Tue Feb 16 17:03:50 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

I've been trying to debug this but have no clue what is going on. Does anyone else have any ideas how this mismatch is possible? It looks like the same type of error that was fixed in #1136, but I have confirmed we are running a build that contains that commit.
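For context, the value being compared here is field 22 (starttime, in clock ticks since boot) of /proc/<pid>/stat, which is what parseStartTime extracts; the stat lines in the test above are real examples. A minimal sketch of that parsing, illustrative rather than runc's exact code (the subtlety is that field 2, the comm, can itself contain spaces, as in (gunicorn: maste) above):

package system

import (
	"fmt"
	"strings"
)

// parseStartTime extracts field 22 (starttime) from a /proc/<pid>/stat
// line. Field 2 is the command name wrapped in parentheses and may
// contain spaces, so split after the last ')' instead of on raw whitespace.
func parseStartTime(stat string) (string, error) {
	end := strings.LastIndex(stat, ")")
	if end == -1 {
		return "", fmt.Errorf("invalid stat data: %q", stat)
	}
	// Everything after the ')' starts at field 3, so starttime
	// (field 22) is index 19 of this slice.
	fields := strings.Fields(stat[end+1:])
	if len(fields) < 20 {
		return "", fmt.Errorf("invalid stat data: %q", stat)
	}
	return fields[19], nil
}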

@crosbymichael
Member Author

SELinux is set to Enforcing, but I'm not seeing any EPERM denials or anything related in the audit logs.

@rhatdan
Contributor

rhatdan commented Mar 27, 2017

I doubt it is an SELinux issue.

@crosbymichael
Member Author

@rhatdan Yeah, me too. I was just adding it as a data point.

@cyphar
Member

cyphar commented Mar 27, 2017

It looks like the CentOS kernel is missing this kernel patch: torvalds/linux@266b7a0. Basically, when a multithreaded process does an execve(2), the thread that ends up performing the exec is effectively arbitrary, so tsk->start_time and tsk->real_start_time need to be fixed up to match the old thread-group leader. /proc/<pid>/stat shows us real_start_time, but it appears that real_start_time was not updated until the linked patch in 2013 (it had been broken since 2007).
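A rough way to see this on an affected kernel is a self-exec'ing Go program. This is only illustrative: a Go program is always multithreaded, so syscall.Exec here is a multithreaded execve(2), but whether the exec happens on a non-leader thread is up to the Go scheduler, so it is not a reliable reproducer:

package main

import (
	"fmt"
	"os"
	"strings"
	"syscall"
)

// startTimeSelf reads field 22 (starttime) of /proc/self/stat.
// No bounds checking here; this is only a sketch.
func startTimeSelf() string {
	data, err := os.ReadFile("/proc/self/stat")
	if err != nil {
		panic(err)
	}
	s := string(data)
	return strings.Fields(s[strings.LastIndex(s, ")")+1:])[19]
}

func main() {
	if len(os.Args) > 1 {
		// Post-exec: same PID, so on a correct kernel this matches
		// the value printed before the exec.
		fmt.Println("after exec: ", startTimeSelf())
		return
	}
	fmt.Println("before exec:", startTimeSelf())
	// Go programs are multithreaded, so this is a multithreaded
	// execve(2); on an affected kernel starttime can change across it.
	if err := syscall.Exec("/proc/self/exe", []string{os.Args[0], "again"}, os.Environ()); err != nil {
		panic(err)
	}
}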

@crosbymichael
Member Author

I'll submit my changes to make the error messages returned from runc more explicit.

@cyphar
Member

cyphar commented Mar 28, 2017

My worry with this patch is that it converts our silent PID-reuse detection into an error condition, which means that if you have a lot of process churn, with containers being started and stopped, you'll start getting errors in cases runC could otherwise handle.

Maybe we should just log to stderr rather than actually bail in this case?

@crosbymichael
Member Author

@cyphar It's always been an error condition; that's why it was reported.

How can we take care of it? The only alternative is to remove the check and just perform the actions.

@cyphar
Member

cyphar commented Mar 29, 2017

@crosbymichael But it's not an error condition (unless things are broken, as we've seen in this issue). We check the start time to guard against PID recycling attacks: if a container's init dies and a new process is later started with the same PID, the next runC command should still update the state correctly, which requires making sure the start time is the same. But if we turn a differing start time into an error in doesInitProcessExist, that won't work anymore AFAICS.

To be honest, I'm not really sure how we can deal with kernels where the start time is wrong, given that it's a kernel bug that was fixed back in 2013. Maybe we could come up with another way of disambiguating processes, but the kernel actually guarantees that start_time stays the same no matter how many times the process execs.
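Put differently, the check answers "is the process at this PID still our init?", and a mismatch should mean "no", not "fail". A minimal sketch of that logic, assuming a helper like system.GetProcessStartTime from libcontainer/system (the names are otherwise illustrative, not runc's actual API):

package main

import "github.com/opencontainers/runc/libcontainer/system"

// initProcessAlive sketches the disambiguation described above: a PID
// alone does not identify a process, but (PID, starttime) does on a
// correct kernel.
func initProcessAlive(pid int, recordedStartTime string) (bool, error) {
	currentStartTime, err := system.GetProcessStartTime(pid)
	if err != nil {
		// Simplification: treat any failure to read /proc/<pid>/stat as
		// "the init process no longer exists", so the container can be
		// marked stopped.
		return false, nil
	}
	// Same PID but a different start time means the PID was recycled by
	// an unrelated process. Report "not alive" rather than erroring, so
	// the state update (container -> stopped) keeps working.
	return currentStartTime == recordedStartTime, nil
}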

@dqminh
Contributor

dqminh commented Mar 29, 2017

I agree with @cyphar that this is not an error condition; when we detect that the PID has been recycled, we just mark the container state as stopped and continue.

The error here comes from the fact that the kernel is buggy. Either detecting that the system is running a buggy kernel so we can bail early, or finding another way to disambiguate processes, would be nice, but so far I don't have a good idea of how to achieve either in runc.
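For the first option, a version gate could look like the sketch below, though distro backports make any pure version comparison a heuristic at best; the known-bad prefix used here is just the kernel from this report, not a vetted cutoff:

package main

import (
	"fmt"
	"strings"
	"syscall"
)

// kernelRelease returns the running kernel's release string,
// e.g. "3.10.0-327.10.1.el7.x86_64" from the report above.
func kernelRelease() (string, error) {
	var uts syscall.Utsname
	if err := syscall.Uname(&uts); err != nil {
		return "", err
	}
	buf := make([]byte, 0, len(uts.Release))
	for _, c := range uts.Release {
		if c == 0 {
			break
		}
		buf = append(buf, byte(c))
	}
	return string(buf), nil
}

func main() {
	rel, err := kernelRelease()
	if err != nil {
		panic(err)
	}
	// Placeholder heuristic: flag the known-bad kernel from this report.
	// Real detection would need a per-distro table of fixed versions,
	// since the upstream fix was backported at different points.
	if strings.HasPrefix(rel, "3.10.0-327") {
		fmt.Println("warning: kernel", rel, "may report wrong start times after execve")
	}
}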

@crosbymichael
Member Author

@dqminh The problem is that once we mark the container as stopped, certain operations are no longer allowed, and that is what fails: you cannot exec into a stopped container, etc.

@crosbymichael
Member Author

@cyphar @dqminh So do you think we should just remove the check, since it blocks some operations, or should we keep it as-is and suggest that people upgrade their kernels?

Is there a better way to check that the pid is our pid, or is this not really an issue?

@cyphar
Member

cyphar commented Apr 16, 2017

So do you think we should just remove the check, since it blocks some operations, or should we keep it as-is and suggest that people upgrade their kernels?

People should upgrade their kernels IMO -- this is a buggy kernel issue that was fixed several years ago.

Is there a better way to check that the pid is our pid, or is this not really an issue?

My proposal in #1224 is another possible way of doing it, but it has the downside that you have to futz with the host's mount table. And it will break rootless. IMO the "right way" is the way we're doing it now -- unfortunately there was a kernel bug with this technique in the past, but it was fixed a while ago.

@evantorrie

What version of the CentOS kernel were you testing?

It appears that the kernel fix referred to in #1393 (comment) is in the 3.10.0-514.16.1 kernel released as part of CentOS 7.3.1611.

@cyphar
Member

cyphar commented May 25, 2017

@evantorrie At the time I wrote that comment I was checking the latest sources from CentOS. I talked to some Red Hat folks and they patched their kernel, so it should work now (AFAIK). Closing.
