Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Read all data from pipe in StartInitialization #508

Closed

Conversation

csfrancis
Copy link

This reads all data from the parent pipe before closing. Long story short is the Go JSON encoder adds a newline at the end of its generated JSON blob. If the length of the incoming JSON minus the newline is exactly 512, 512+1024, 512+1024+2048, 512+1024+2048+4096, ... bytes, the final newline will not be read and will remain in the pipe. This causes the parent read to fail with an ECONNRESET because there is still data to be read in the pipe.

IMO this is really just a workaround - the real fix would be to have the Go JSON decoder read the final newline.

See moby/moby#14203 (comment) for more debugging details.

Signed-off-by: Scott Francis <scott.francis@shopify.com>
@csfrancis csfrancis force-pushed the fix_rare_econnreset branch from bf98a79 to 34abd45 Compare January 23, 2016 19:28
Signed-off-by: Scott Francis <scott.francis@shopify.com>
@cyphar
Copy link
Member

cyphar commented Jan 25, 2016

@csfrancis You may find that the new pids cgroup changes (#446) may have changed your test cases, since the initialisation is slightly more complicated (we have to apply cgroups as late as possible).

@hqhq
Copy link
Contributor

hqhq commented Jan 25, 2016

@csfrancis Can you test moby/moby#14203 (comment) ?
You PR seems a bit complicated for just cleaning up the pipe.

@csfrancis
Copy link
Author

@hqhq I had initially tried the approach you mentioned, but it didn't work. I don't have the strace logs handy, but I believe the pipe was being closed before the defer function was called.

The PR is complicated because of problems with the integration tests. My initial fix was just to add a ioutil.ReadAll(pipe). But one of the tests, TestCheckpoint, ended up blocking and eventually timing out. I suspect because the pipe wasn't closed on the other side and is in blocking mode. I didn't want to put the pipe into non-blocking mode, so the next best thing IMO was to use select to determine if there was any unread data in the pipe.

I'd really love to have an integration test for this, but I don't know the codebase well enough to create one. In order to reproduce the original problem, you'd need initProcess.start to send exactly 1537, 3585, or 7681 bytes of serialized JSON across the pipe.

@cyphar
Copy link
Member

cyphar commented Jan 25, 2016

@csfrancis I'm fairly sure this is a Go stdlib bug (given the fact that the decoder doesn't require a newline to finish parsing, but outputs one when encoding).

I suspect because the pipe wasn't closed on the other side and is in blocking mode. I didn't want to put the pipe into non-blocking mode, so the next best thing IMO was to use select to determine if there was any unread data in the pipe.

I just checked the code in libcontainer/process_linux.go, and it looks like we do syscall.Shutdown(...) on the write end of the right side (no pun intended) of the pipe after all of the synchronisation is done. Doesn't that mean that ioutil.ReadAll should work (it'll read until EOF)?

As for the actual patch, I'm a bit confused where the number 256 comes from here:

+               syscall.Read(pipefd, make([]byte, 256))

I'm also not entirely convinced that we need to use select semantics for something as simple as this (mainly because the select syscall semantics are awful to work with in a language other than C). After the configuration is sent by the parent, the parent waits for the child to send some synchronisation stuff, there's a back-and-forth and the parent closes the write end of the pipe and waits for the process to finish. There's probably somewhere in there that's missing an ioutil.ReadAll.

I'd really love to have an integration test for this, but I don't know the codebase well enough to create one. In order to reproduce the original problem, you'd need initProcess.start to send exactly 1537, 3585, or 7681 bytes of serialized JSON across the pipe.

Thanks for this, I'm going to try to create a configuration that is independent from Docker that causes this bug. Then we can more quickly iterate on ways of solving it.

It would also be great if you could try using the master version of runC, which has had the above changes (I remember there being some weird things I fixed when doing the synchronisation stuff). But there are currently a bunch of issues using the master version of runC with Docker, which we're working out.

@csfrancis
Copy link
Author

I'm fairly sure this is a Go stdlib bug (given the fact that the decoder doesn't require a newline to finish parsing, but outputs one when encoding).

I agree 100% here.

I just checked the code in libcontainer/process_linux.go, and it looks like we do syscall.Shutdown(...) on the write end of the right side (no pun intended) of the pipe after all of the synchronisation is done. Doesn't that mean that ioutil.ReadAll should work (it'll read until EOF)?

The problem I had was that putting ioutil.ReadAll in the defer function was executed after the socket was closed (it didn't fix the problem). I unfortunately don't have the strace logs handy to confirm this - I'll work on reproducing this again. It doesn't matter that the write side calls shutdown. The problem is the read side closes the pipe without having read all of the data. Here's a snippet of Ruby that demonstrates the problem:

irb(main):002:0> require 'socket'
=> true
irb(main):003:0> a, b = UNIXSocket.pair
=> [#<UNIXSocket:fd 9>, #<UNIXSocket:fd 10>]
irb(main):004:0> a.write("foo")
=> 3
irb(main):005:0> a.shutdown(Socket::SHUT_WR)
=> 0
irb(main):006:0> b.recvfrom(2)
=> ["fo", ["AF_UNIX", ""]]
irb(main):007:0> b.close
=> nil
irb(main):008:0> a.read
Errno::ECONNRESET: Connection reset by peer
        from (irb):8:in `read'
        from (irb):8
        from /usr/lib/shopify-ruby/2.1.6-shopify2/bin/irb:11:in `<main>'

Like I said above, the only reason for the use of select was to get the tests to work. csfrancis@34abd45 works fine for us in production, but fails the runC integration tests.

It would also be great if you could try using the master version of runC, which has had the above changes (I remember there being some weird things I fixed when doing the synchronisation stuff).

Unfortunately, I don't really have time to do this right now.

Ultimately, I don't really mind if you guys come up with an alternate method of fixing this - my primary goal was to bring the issue to your attention. We've already forked Docker to use this fix, as this is causing us a ton of pain in production. I think once there is a reproducible test case (thanks for taking this on), it will be a lot easier to understand what's really going on and to come up with a fix that everyone can agree with.

@tiborvass
Copy link
Contributor

@csfrancis As far as I can tell, I don't think this is a Go bug, it's an expected behavior (but please correct me if you think I'm wrong). I think it is so that you can read a stream of JSON objects:

package main

import (
    "encoding/json"
    "fmt"
    "log"
    "strings"
)

func main() {
    r := strings.NewReader(`{"foo":"bar"}{"hello":"world"}`)
    d := json.NewDecoder(r)
    var x interface{}
    if err := d.Decode(&x); err != nil {
        log.Fatal(err)
    }
    fmt.Println(x)
    if err := d.Decode(&x); err != nil {
        log.Fatal(err)
    }
    fmt.Println(x)
}

outputs

map[foo:bar]
map[hello:world]

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 25, 2016

There is explanation in Go code why encoder puts newline:

    // Terminate each value with a newline.
    // This makes the output look a little nicer
    // when debugging, and some kind of space
    // is required if the encoded value was a number,
    // so that the reader knows there aren't more
    // digits coming.
    e.WriteByte('\n')

I think it might be enough to just replace Encode with Marshal + Write. WDYT, guys?

@dqminh
Copy link
Contributor

dqminh commented Jan 25, 2016

I think it might be enough to just replace Encode with Marshal + Write. WDYT, guys?

I think so. There's no reason to transmit that empty line.

@crosbymichael
Copy link
Member

This is a good start to fix the issue. We may need to add this ReadAll back but out of the if block as this is the case we are hitting.

103853e#diff-b67e6a36842983c6aa85e4560b03f7dcL235

@tiborvass
Copy link
Contributor

@crosbymichael isn't that what @hqhq suggested in moby/moby#14203 (comment) ? If that's the case, @csfrancis said that didn't fix their issue: #508 (comment)

@crosbymichael
Copy link
Member

@tiborvass yes but it did fix the issue. there was a second issue with the libcontainer tests we can look into and see why they are blocking. It will probably be best to look into the ReadAll as the fix and see what is wrong in the tests as the main execution workflow does not block.

@mrunalp
Copy link
Contributor

mrunalp commented Jan 25, 2016

Agree with @crosbymichael. Seems like the best option and hopefully reduces chances of introducing any new bugs.

@tiborvass
Copy link
Contributor

@crosbymichael @mrunalp is one of you able to create a testcase for this issue?

@crosbymichael
Copy link
Member

i'll look at it this afternoon

@LK4D4
Copy link
Contributor

LK4D4 commented Jan 25, 2016

That commit wasn't in 1.9.1 release of docker and can't be problem which @burke and @csfrancis saw.

@crosbymichael
Copy link
Member

@LK4D4 yes, the bug is in Go most likely but the fix would be to move the readall out of the if block and into the defer before the pipe is closed. It does not help that the change made things worse by removing the readall all together.

@csfrancis
Copy link
Author

Sorry, I've been away for most of the day.

@tiborvass yes but it did fix the issue.

No, this does not fix the issue. You cannot put the ioutil.ReadAll call in the defer function because it will never be called on success. The last thing func (*linuxStandardInit) Init() does is call execve, which will never return. This is why my fix clears the pipe before the call to Init().

The pipe itself is created with SOCK_CLOEXEC, which causes execve to close the file descriptor:

By default, file descriptors remain open across an execve(). File descriptors that are marked close-on-exec are closed; see the description of FD_CLOEXEC in fcntl(2).

@dqminh
Copy link
Contributor

dqminh commented Jan 26, 2016

csfrancis@34abd45 should have work ( the better place for it is probably right after we read from the pipe in newContainerInit though i think ), and it seems like a better fix. we probably just need to look at the test more closely.

@csfrancis
Copy link
Author

csfrancis/runc@34abd45 should have work

I've been taking a look at this this evening. I think the reason the tests time out is because of 103853e#diff-e6cd3268ea1b2c0ee41532976972b36fL280, which does not shutdown the pipe for writing. As a result, when the child does the ioutil.ReadAll(pipe), it blocks.

}

if fd_isset(rfds, pipefd) {
syscall.Read(pipefd, make([]byte, 256))
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really, this should probably just read 1 byte, since we're only potentially expecting to see 1 additional byte (the newline) in the pipe.

@crosbymichael
Copy link
Member

Ok, from looking at this today my fix is in #515. The basic idea is that the json.NewEncoder is the thing writing the \n to the pipe so lets not use that and just marshal the raw objects to the pipe without the chance for something to write a newline. This removes the need to have the complex read all logic to flush the pipe, if we only read what we write we are fine.

@csfrancis do you think you could test this out and see if it fixes your issues as well? I can rebase this on the docker branch if it makes it easier for you. Just let me know what you need from me and I'll help.

@csfrancis
Copy link
Author

If you can rebase this on a branch that's compatible with docker 1.9.1 that would be helpful. Otherwise I can hack your logic into my docker build. I'll try to work on this tomorrow.

@crosbymichael
Copy link
Member

@csfrancis here is the patch that you can apply to commit: f36b00a

This is bashed off the current branch.

diff --git a/libcontainer/container_linux.go b/libcontainer/container_linux.go
index de98e97..916511e 100644
--- a/libcontainer/container_linux.go
+++ b/libcontainer/container_linux.go
@@ -21,6 +21,7 @@ import (
    "github.com/opencontainers/runc/libcontainer/cgroups"
    "github.com/opencontainers/runc/libcontainer/configs"
    "github.com/opencontainers/runc/libcontainer/criurpc"
+   "github.com/opencontainers/runc/libcontainer/utils"
    "github.com/vishvananda/netlink/nl"
 )

@@ -968,7 +969,7 @@ func (c *linuxContainer) updateState(process parentProcess) error {
    }
    defer f.Close()
    os.Remove(filepath.Join(c.root, "checkpoint"))
-   return json.NewEncoder(f).Encode(state)
+   return utils.WriteJSON(f, state)
 }

 func (c *linuxContainer) currentStatus() (Status, error) {
diff --git a/libcontainer/factory_linux.go b/libcontainer/factory_linux.go
index 70513f7..d03ce86 100644
--- a/libcontainer/factory_linux.go
+++ b/libcontainer/factory_linux.go
@@ -5,7 +5,6 @@ package libcontainer
 import (
    "encoding/json"
    "fmt"
-   "io/ioutil"
    "os"
    "os/exec"
    "path/filepath"
@@ -19,6 +18,7 @@ import (
    "github.com/opencontainers/runc/libcontainer/cgroups/systemd"
    "github.com/opencontainers/runc/libcontainer/configs"
    "github.com/opencontainers/runc/libcontainer/configs/validate"
+   "github.com/opencontainers/runc/libcontainer/utils"
 )

 const (
@@ -226,10 +226,7 @@ func (l *LinuxFactory) StartInitialization() (err error) {
        // if we have an error during the initialization of the container's init then send it back to the
        // parent process in the form of an initError.
        if err != nil {
-           // ensure that any data sent from the parent is consumed so it doesn't
-           // receive ECONNRESET when the child writes to the pipe.
-           ioutil.ReadAll(pipe)
-           if err := json.NewEncoder(pipe).Encode(newSystemError(err)); err != nil {
+           if err := utils.WriteJSON(pipe, newSystemError(err)); err != nil {
                panic(err)
            }
        }
diff --git a/libcontainer/factory_linux_test.go b/libcontainer/factory_linux_test.go
index dccc8db..b0c0f49 100644
--- a/libcontainer/factory_linux_test.go
+++ b/libcontainer/factory_linux_test.go
@@ -3,7 +3,6 @@
 package libcontainer

 import (
-   "encoding/json"
    "io/ioutil"
    "os"
    "path/filepath"
@@ -12,6 +11,7 @@ import (

    "github.com/docker/docker/pkg/mount"
    "github.com/opencontainers/runc/libcontainer/configs"
+   "github.com/opencontainers/runc/libcontainer/utils"
 )

 func newTestRoot() (string, error) {
@@ -179,5 +179,5 @@ func marshal(path string, v interface{}) error {
        return err
    }
    defer f.Close()
-   return json.NewEncoder(f).Encode(v)
+   return utils.WriteJSON(f, v)
 }
diff --git a/libcontainer/process_linux.go b/libcontainer/process_linux.go
index 114c71b..ee64736 100644
--- a/libcontainer/process_linux.go
+++ b/libcontainer/process_linux.go
@@ -15,6 +15,7 @@ import (
    "github.com/opencontainers/runc/libcontainer/cgroups"
    "github.com/opencontainers/runc/libcontainer/configs"
    "github.com/opencontainers/runc/libcontainer/system"
+   "github.com/opencontainers/runc/libcontainer/utils"
 )

 type parentProcess interface {
@@ -83,7 +84,7 @@ func (p *setnsProcess) start() (err error) {
            return newSystemError(err)
        }
    }
-   if err := json.NewEncoder(p.parentPipe).Encode(p.config); err != nil {
+   if err := utils.WriteJSON(p.parentPipe, p.config); err != nil {
        return newSystemError(err)
    }
    if err := syscall.Shutdown(int(p.parentPipe.Fd()), syscall.SHUT_WR); err != nil {
@@ -270,7 +271,7 @@ func (p *initProcess) startTime() (string, error) {

 func (p *initProcess) sendConfig() error {
    // send the state to the container's init process then shutdown writes for the parent
-   if err := json.NewEncoder(p.parentPipe).Encode(p.config); err != nil {
+   if err := utils.WriteJSON(p.parentPipe, p.config); err != nil {
        return err
    }
    // shutdown writes for the parent side of the pipe
diff --git a/libcontainer/utils/utils.go b/libcontainer/utils/utils.go
index 86cf1d6..5e03f0a 100644
--- a/libcontainer/utils/utils.go
+++ b/libcontainer/utils/utils.go
@@ -3,6 +3,7 @@ package utils
 import (
    "crypto/rand"
    "encoding/hex"
+   "encoding/json"
    "io"
    "path/filepath"
    "syscall"
@@ -43,3 +44,13 @@ func ExitStatus(status syscall.WaitStatus) int {
    }
    return status.ExitStatus()
 }
+
+// WriteJSON writes the provided struct v to w using standard json marshaling
+func WriteJSON(w io.Writer, v interface{}) error {
+   data, err := json.Marshal(v)
+   if err != nil {
+       return err
+   }
+   _, err = w.Write(data)
+   return err
+}

@csfrancis
Copy link
Author

@crosbymichael tested your patch and it resolves my issue. I think your PR is a better solution that what I've proposed here.

Thanks for fixing this!

@csfrancis csfrancis closed this Jan 26, 2016
@tiborvass
Copy link
Contributor

🎉 !

@csfrancis csfrancis deleted the fix_rare_econnreset branch January 26, 2016 22:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants