Skip to content
This repository has been archived by the owner on Sep 9, 2020. It is now read-only.

cmd.CombinedOutput race condition #1194

Closed
jmank88 opened this issue Sep 21, 2017 · 6 comments
Closed

cmd.CombinedOutput race condition #1194

jmank88 opened this issue Sep 21, 2017 · 6 comments

Comments

@jmank88
Copy link
Collaborator

jmank88 commented Sep 21, 2017

A PR CI job failed due to a preexisting race: https://travis-ci.org/golang/dep/jobs/278159807

It appears to be cmd.CombinedOutput related.

I can circle back to this, but pinging @tamird since he touched that code recently via: #1180

@jmank88
Copy link
Collaborator Author

jmank88 commented Sep 21, 2017

Ugh, restarting that job gave me a #2270.2 but replaced 278159807, so now I can't get back to the #2270.1 log.

@tamird
Copy link
Contributor

tamird commented Sep 21, 2017 via email

@tamird
Copy link
Contributor

tamird commented Sep 21, 2017

So this is a real bummer - without at least knowing which test failed, it's hard to stress this in a reasonable way. Some of these tests manipulate the file system, and that's just slow and generally no fun.

@jmank88
Copy link
Collaborator Author

jmank88 commented Sep 22, 2017

Phooey. From what I can remember, it was regarding the CombinedOutput goroutine that uses time.AfterFunc. Could it have just been racing to set t? What do you think about this simplification?

-var t *time.Timer
-defer func() {
-	if t != nil {
-		t.Stop()
-	}
-}()
// Adapted from (*os/exec.Cmd).Start
waitDone := make(chan struct{})
defer close(waitDone)
go func() {
	select {
	case <-c.ctx.Done():
		if err := c.Cmd.Process.Signal(os.Interrupt); err != nil {
			// If an error comes back from attempting to signal, proceed
			// immediately to hard kill.
			c.cancel()
		} else {
-			t = time.AfterFunc(time.Minute, c.cancel)
+			t := time.NewTimer(time.Minute)
+			select {
+			case <-t.C:
+				c.cancel()
+			case <-waitDone:
+				t.Stop()
+			}
		}
	case <-waitDone:
	}
}()

if err := c.Cmd.Wait(); err != nil {
	return nil, err
}
return b.Bytes(), nil

@tamird
Copy link
Contributor

tamird commented Sep 22, 2017

Ah, that's definitely a race! That simplification is close to what I'd write, but I'd still use AfterFunc; I'll send a PR.

@jmank88
Copy link
Collaborator Author

jmank88 commented Sep 22, 2017

The only other time.AfterFunc is in (sm *SourceMgr) HandleSignals. This seems more likely to be the problem. IIRC the stack indicated that the triggered func body was racing, and that first one is only calling a cancelFunc.

zknill pushed a commit to zknill/dep that referenced this issue Oct 6, 2017
Instead, we use the existing `waitDone` channel to wait for the
command to return and then clean up the timer unconditionally. This
may hold the goroutine open for longer than strictly necessary, but
that seems harmless.

Closes golang#1194.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants