-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
net/http: readLoopPeekFailLocked: <nil>
from client when server sends response immediately
#31259
Comments
readLoopPeekFailLocked: <nil>
from client when server sends response immediately
CC @bradfitz |
Go version
BehaviorRunning the following code with package race
import (
"bytes"
"context"
"net"
h "net/http"
ht "net/http/httptest"
"testing"
"time"
)
func TestDataRace(t *testing.T) {
d := &dialer{content: "HTTP/1.1 502 Bad Gateway\r\n\r\n"}
tr := &h.Transport{
DialContext: d.dial,
}
req := ht.NewRequest(h.MethodGet, "http://example.com", nil)
resp, e := tr.RoundTrip(req)
if e != nil {
t.Error(e)
} else if resp.StatusCode != h.StatusBadGateway {
t.Error("status:", resp.Status)
}
}
type dialer struct {
content string
}
func (d *dialer) dial(ctx context.Context, network,
addr string) (n net.Conn, e error) {
n = newMockConn(d.content)
return
}
type mockConn struct {
read *bytes.Buffer
write *bytes.Buffer
}
func newMockConn(content string) (m *mockConn) {
m = &mockConn{
read: bytes.NewBufferString(content),
write: new(bytes.Buffer),
}
return
}
func (m *mockConn) Read(p []byte) (n int, e error) {
n, e = m.read.Read(p)
return
}
func (m *mockConn) Write(p []byte) (n int, e error) {
n, e = m.write.Write(p)
return
}
func (m *mockConn) Close() (e error) { return }
func (m *mockConn) LocalAddr() (a net.Addr) { return }
func (m *mockConn) RemoteAddr() (a net.Addr) { return }
func (m *mockConn) SetDeadline(t time.Time) (e error) { return }
func (m *mockConn) SetReadDeadline(t time.Time) (e error) {
return
}
func (m *mockConn) SetWriteDeadline(t time.Time) (e error) {
return
} Output when failing
|
Defining in the previous code type mockConn struct {
read *bytes.Buffer
write *bytes.Buffer
writeOk chan bool
}
func newMockConn(content string) (m *mockConn) {
m = &mockConn{
read: bytes.NewBufferString(content),
write: new(bytes.Buffer),
writeOk: make(chan bool, 0),
}
return
}
func (m *mockConn) Read(p []byte) (n int, e error) {
<-m.writeOk
n, e = m.read.Read(p)
return
}
func (m *mockConn) Write(p []byte) (n int, e error) {
n, e = m.write.Write(p)
m.writeOk <- true
return
} |
Maybe related #34066 |
I have exactly the same with golang 1.13 and a fake http server like that: |
@logrusorgru seeing this happen as well. Can you reproduce it on go 1.11 as well or just 1.12++ ? |
@cyx, note that per the release policy, neither Go 1.11 nor 1.12 is supported at this point. (@sebglon's report of a repro using Go 1.13 is helpful, though.) |
@cyx, I just have checked out I don't think this issue is a real problem, but anyone is working with HTTP "manually" may run into it. I just use (let's say it's app.go) var isTesting bool
// [...]
// handler function
if isTesting {
time.Sleep(100*time.Microsecond)
} (let's say it's app_test.go) var _ = func() struct{} {
isTesting = true
return struct{}{}
}() To get my tests 100% stable. Because of the 100µs, I think it's about goroutines scheduling. |
Thanks! That makes sense. I'm seeing this in production with the clients
primarily being node http clients -- although the interesting thing is that
I only see it on higher throughput environments
|
I can report that we're seeing this issue in production at Let's Encrypt, using https://github.com/letsencrypt/boulder/. Specifically, there are some "hasty" HTTP servers out there that will send an HTTP response as soon as the TCP connection completes. When we make multiple concurrent requests to such a server, we get The reason we'd like to see this fixed: In general we try to route all log output through our logging system, and this is the one place in
The output is like below (the number of errors varies from run to run, but there's almost always at least one error):
Expected behavior: No errors output. Additional detail: This test case also reproduces when I set I've tried to figure out whether hasty responses are allowed or disallowed by RFC 7230. The closest language I can find is:
Which seems to implicitly allow them. At any rate it's not clear that it would be possible to consistently reject hasty responses, since there's an unavoidable network race between writing a request and reading a response. Given that, I think it probably makes sense to make the current behavior (accepting hasty responses) more consistent. |
I can confirm that this patch, which starts persistConns with numExpectedResponses = 1, causes the repro code above to run without errors. 0001-Start-persistConns-with-numExpectedResponses-1.patch.txt |
move --- net/http/transport.go.bk 2020-04-24 10:06:11.253338231 +0800
+++ net/http/transport.go 2020-04-24 10:07:03.676671394 +0800
@@ -1893,6 +1893,7 @@
alive := true
for alive {
+ rc := <-pc.reqch
pc.readLimit = pc.maxHeaderResponseSize()
_, err := pc.br.Peek(1)
@@ -1904,7 +1905,6 @@
}
pc.mu.Unlock()
- rc := <-pc.reqch
trace := httptrace.ContextClientTrace(rc.req.Context())
var resp *Response
|
hi i'm receiving the same error. is there a way I can try this patch out without having to rebuild a fork fo go? |
Yep, it's still actual with |
@jsha Thanks for your repro code! As far as I understand, I think the race you mentioned above may also happen for "non-hasty" HTTP servers, in which case However, I can not reproduce the same error by replacing package main
import (
"fmt"
"log"
"net"
"net/http"
"sync"
"time"
)
const port = ":8081"
var wg sync.WaitGroup
func init() {
tr := http.DefaultTransport.(*http.Transport)
tr.DisableKeepAlives = true
}
func main() {
//go hastyHTTPServer()
go normalHTTPServer()
time.Sleep(100 * time.Millisecond)
for i := 0; i < 2; i++ {
wg.Add(1)
go fetchLoop()
}
wg.Wait()
}
func fetchLoop() {
for i := 0; i < 1e3; i++ {
resp, err := http.Get("http://localhost" + port)
if err != nil {
log.Println(err)
continue
}
resp.Body.Close()
}
wg.Done()
}
// hastyHTTPServer runs an HTTP server that sends a response and closes the
// connection as soon as a client makes a TCP connection, rather than waiting
// for the client to actually make an HTTP request.
// A real-world example of this type of server can be seen at 8849777.com
func hastyHTTPServer() {
ln, err := net.Listen("tcp", port)
if err != nil {
log.Fatal(err)
}
for {
conn, err := ln.Accept()
if err != nil {
log.Fatal(err)
}
go func() {
conn.Write([]byte("HTTP/1.0 200 OK\r\n\r\nQuick response!"))
conn.Close()
}()
}
}
func normalHTTPServer() {
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
fmt.Fprintf(w, "HTTP/1.0 200 OK\r\n\r\nQuick response!")
})
log.Fatal(http.ListenAndServe(port, nil))
} UPDATE: Sorry for my misunderstanding! In fact, the following line will block until the HTTP response is available (or until the Client.Timeout is exceeded): Line 2087 in f5bdbf3
Therefore, in a typical HTTP interaction (i.e. the response follows the request), |
For those who are interested, I have finally figured out the reason for the TL;DR: An abnormal 204 response, which includes a body, will cause the BackgroundWe have deployed two types of backends behind Caddy (a web server in Go):
But only HTTP requests in the 2nd case (i.e. Caddy <-> Erlang Backend) are encountering the Root CauseThe Erlang Backend is using Cowboy 2.7.0, which unfortunately allows developers to construct a 204 response with a body. And the fact is that our Erlang developer does do this! After correcting these abnormal 204 responses, all NOTE: Cowboy 2.8.0 has fixed this issue (commit). Reproducing Codepackage main
import (
"log"
"net/http"
"time"
)
const port = ":8081"
func main() {
go abnormalHTTPServer()
time.Sleep(100 * time.Millisecond)
fetchLoop()
}
func fetchLoop() {
client := &http.Client{Timeout: 2 * time.Second}
for i := 0; i < 2; i++ {
resp, err := client.Get("http://localhost" + port)
if err != nil {
log.Println(err)
continue
}
resp.Body.Close()
}
}
// abnormalHTTPServer runs an HTTP server that sends a 204 response with a body.
func abnormalHTTPServer() {
http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
hj, ok := w.(http.Hijacker)
if !ok {
panic("impossible for HTTP/1.x")
}
conn, _, err := hj.Hijack()
if err != nil {
panic(err)
}
// Mimic the abnormal 204 response from Cowboy 2.7.0.
conn.Write([]byte("HTTP/1.1 204 No Content\r\n\r\n{}"))
//conn.Close()
})
log.Fatal(http.ListenAndServe(port, nil))
} |
The same error still can be reproduced with #31259 (comment) on go1.19.5 linux/amd64 |
The delay causes a race condition in the go transport that results in a 502 Bad Gateway with: `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`. This happens because the transport peeks the first few bytes on the connection and gets some data even though it doesn't expect any. This causes it to go into an error state even though there is no error resulting in the formatting directive to break. This commit removes the delay and adds a note why we can't do this for now. This will reduce the amount of requests we can retry because the client will send data before we know that the connection is good. After we sent _some_ data we can't be sure that the server hasn't started processing, hence no retry in such cases. See: https://cloudfoundry.slack.com/archives/C033ALST37V/p1680888356483179 See: golang/go#31259 Resolves: cloudfoundry/routing-release#316
The delay causes a race condition in the go transport that results in a 502 Bad Gateway with: `endpoint_failure (readLoopPeekFailLocked: %!w(<nil>))`. This happens because the transport peeks the first few bytes on the connection and gets some data even though it doesn't expect any. This causes it to go into an error state even though there is no error resulting in the formatting directive to break. This commit removes the delay and adds a note why we can't do this for now. This will reduce the amount of requests we can retry because the client will send data before we know that the connection is good. After we sent _some_ data we can't be sure that the server hasn't started processing, hence no retry in such cases. See: https://cloudfoundry.slack.com/archives/C033ALST37V/p1680888356483179 See: golang/go#31259 Resolves: cloudfoundry/routing-release#316
Related to #59310 error still exists in Go 1.20 |
FYI, a new RFC 9112 was published in June 2022 adding more clarity to unsolicited responses. This spec supersedes RFC 7230 in that regard:
|
That's very interesting @domdom82 ! Thanks for sharing. Strangely, even though RFC 9112 seems to make this more clear, I think it winds up being less clear - and to some extent, introduces a race condition into protocol handling. Traditional single-threaded HTTP implementations would do something like:
If you do that, a response from a "hasty" HTTP server will look normal - it will look as if it was a response to the request. However, my understanding is that Go winds up in a different situation:
So depending on the speed of the server and the speed of the network, a "hasty" response could arrive before or after the request is written. So the RFC 9112 requirement to reject responses on idle channels seems to require the current racy behavior - but only if the client happens to be reading concurrently. Perhaps a more useful fix on the HTTP RFC side would have been to specify that servers MUST NOT be hasty. That would make it somewhat easier to say that the client is free to do whatever it likes (including logging an error to stdout) when a hasty server is encountered. |
Using this as a https "server" (trying to repro/fix another issue): echo -n -e "HTTP/1.1 200 Ok\r\nContent-Length:0\r\n\r\n" | openssl s_server -accept 8777 -cert server.crt -key server.key I get err like (that %!w if anything is a bug, right?) |
What version of Go are you using (
go version
)?Does this issue reproduce with the latest release?
Yep. 1.14.1
What operating system and processor architecture are you using (
go env
)?go env
OutputWhat did you do?
100% reproducible on Linux x64, and play.golang.org. The behavior is unpredictable and the code (see below) to reproduce should be executed few times to got the error and the log.
The error is:
The log is:
What happens?
HTTP server sends response without reading request and without a delay. This approach is good for proxy servers where a client determined by his IP (i.e. using white-list). Thus, we haven't to read request, if the user is not white-listed.
Reproducing
(try few times to got error)
play.golang.org
Also, playing with the N constant I've got this error for N=10, N=100 and N=1000. For N=1000 this error occurs every time. For N=10 rarely. Seems, it depends on goroutines scheduling.
What did you expect to see?
Got this error every time or never.
What did you see instead?
It's unpredictable. That makes testing a bit harder.
Also
Probably related issues #15253, #15446
The text was updated successfully, but these errors were encountered: