Replication Stops After 0 byte reads #155
Comments
The 0-byte read just seems to indicate EOF; I see these in normal circumstances as well. The stream is supposed to send a separate callback for that, but I think because this is going through the CFHTTPStream layer it might behave differently. There are known issues where network intermediaries will kill the connection; the -handleInvalidResponse method attempts to work around this by detecting situations we've run into before. But in your situation it sounds like the stream is closed before any response body is received at all. How is the device reaching the server: through what sort of network, proxies, etc.? It looks as though some intermediary is aggressively killing idle socket connections.
FYI, I took out the "read 0 bytes" warning a few weeks ago because it's harmless (commit 9f24e35).
Thanks, I've changed the title of the issue to reflect that the warning is no longer there. We're using nginx for SSL and to proxy sync_gateway as described briefly above. In our case the consequences are far from harmless as replication stops until the user kills the app and restarts it or backgrounds the app and resumes it. Here's a trace created less than a week ago using commit d577c26.
Is the device connecting over WiFi or cell? If the latter, what carrier? Something is killing the socket after a minute of idle time. Is there some setting in nginx that could be doing that?
The device is connected over WiFi. The same problem occurs in the simulator as well as on a device. We do configure ssl and proxies but little else. Our original nginx config can be found here. I did find that I can control the behavior in nginx by adding
This will work, but I question the behavior where CBL silently stops the replication when it encounters this condition. No doubt others will run into the same problem. It seems to me that CBL should either (a) restart the connection or (b) report an error that an app can monitor, so that the app can restart the connection itself.
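Options (a) and (b) can be sketched roughly as follows. This is an illustration in Python, not CBL's Objective-C code and not the patch mentioned in this thread; `StreamClosedEarly`, `run_with_restart`, `open_feed`, and `on_error` are hypothetical names, and the retry count and delays are arbitrary assumptions.

```python
import time


class StreamClosedEarly(Exception):
    """Hypothetical error: the connection ended before any response body arrived."""


def run_with_restart(open_feed, on_error, max_retries=3, base_delay=2.0,
                     sleep=time.sleep):
    """Call open_feed(); restart it on an early close, then report failure.

    open_feed: callable that returns the feed result or raises StreamClosedEarly.
    on_error:  callable invoked once with the last error when retries run out,
               so the app can observe the failure instead of a silent stop.
    """
    attempt = 0
    while True:
        try:
            return open_feed()
        except StreamClosedEarly as err:
            attempt += 1
            if attempt > max_retries:
                # Option (b): surface the error to the application.
                on_error(err)
                return None
            # Option (a): restart, waiting a little longer after each failure.
            sleep(base_delay * (2 ** (attempt - 1)))
```

An app could pass a callback as `on_error` that flips a "replication stopped" flag in its UI; with `max_retries=0` the function only notifies and never restarts.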
If you change that setting, does it make this problem go away? I can work on making the replicator recover better, but I'd still like to know where this socket disconnect came from...
Actually, something else must be going wrong if the client doesn't receive any response data before the socket is closed. Both CouchDB and the Sync Gateway will immediately send back the opening of the JSON response — tl;dr: From your logs it looks like the client reads the initial changes feed fine, but then when it makes another request to wait for future changes, the socket never connects properly, never receives the expected initial data, and gets killed after a minute. Does the Sync Gateway log the second |
Yes. Here's a run I made just now with the default nginx timeout in place. The run was made using the stable branch with the patch above in place.
OK, I finally think I have an idea what's going on here.
Hypothesis: your nginx proxy is handling an HTTP chunked-mode response by buffering up the data until it's complete, and only then sending it on. This approach doesn't work well with an open-ended response like a longpoll: the proxy won't pass on the beginning of the response until an indefinite amount of time later, when the upstream server finally sends the changes. (I've run into this behavior twice now in different HTTP parsers, one of them being Apple's NSURLConnection.)
Experiment: Try bypassing nginx and making the iOS device connect directly to the Sync Gateway. The problem should go away.
Conclusions (assuming the hypothesis is validated):
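The buffering hypothesis can be illustrated with a toy timing model. This is only a sketch, not how nginx is implemented; the function names are made up for this example, the 300-second wait for a change is an invented figure, and the 60-second cutoff stands in for the "killed after a minute" behavior reported in this thread.

```python
def first_byte_delay(chunk_times, buffered):
    """Seconds until the client sees its first byte of the response.

    chunk_times: times (in seconds) at which the upstream server writes each
                 chunk; the latest one completes the response.
    buffered:    True if the intermediary holds everything until the response
                 is complete, False if it forwards each chunk as it arrives.
    """
    if not chunk_times:
        return None
    return max(chunk_times) if buffered else min(chunk_times)


def sees_data_before_timeout(chunk_times, buffered, idle_timeout=60.0):
    """True if the first byte reaches the client before the idle cutoff."""
    delay = first_byte_delay(chunk_times, buffered)
    return delay is not None and delay < idle_timeout


# A longpoll that writes the opening of its response at once (t=0) and the
# first change five minutes later (t=300).
longpoll = [0.0, 300.0]
streaming_ok = sees_data_before_timeout(longpoll, buffered=False)
buffering_ok = sees_data_before_timeout(longpoll, buffered=True)
```

In this model the streaming path delivers the opening bytes immediately, while the buffering path delivers nothing until the change arrives, which is long after a one-minute idle cutoff.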
Heartbeat support is couchbase/sync_gateway#164.
I think your hypothesis is correct. Here are the results going directly against sync_gateway:
I also added heartbeat support to the gateway. If you can update your gateway, please try with nginx and the connections shouldn't be dropped anymore, because the server will be sending a CRLF every 30 seconds.
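Why a periodic CRLF helps can be sketched in a few lines. This is a simplified model under stated assumptions, not Sync Gateway or CBL code: the function names are hypothetical, the sample body and its `last_seq` value are invented, and the client is assumed to parse the whole longpoll body as one JSON document.

```python
import json

HEARTBEAT = "\r\n"  # the CRLF sent while no changes are pending


def longest_silence(wait_seconds, heartbeat_interval=None):
    """Longest stretch with no bytes on the wire while waiting for a change."""
    if heartbeat_interval is None or heartbeat_interval <= 0:
        return wait_seconds
    return min(wait_seconds, heartbeat_interval)


def parse_longpoll_body(body):
    """Heartbeats are plain whitespace, so a JSON parser skips over them."""
    return json.loads(body)


# Three heartbeats arrive before the (made-up) response itself.
body = HEARTBEAT * 3 + '{"results": [], "last_seq": 12}'
parsed = parse_longpoll_body(body)
```

With a 30-second heartbeat the connection is never silent for longer than 30 seconds, which stays under a one-minute idle cutoff; without it, the silence lasts as long as the wait for the next change.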
Just got back to this, sorry for the delay. For reference, I'm using CBL 4f82865 and sync_gateway 27d9611146251fa58b32839b552b26be8874a43a. The sync_gateway fix (f9d4aa5ba40d143815f07008580b5145e3dcde21) works great as long as the |
In our app, if we start the app and there's no data to replicate to or from the device and the app remains in the foreground for more than a minute, ChangeTracker will warn about 0 byte reads. After that, no data is replicated until the app is suspended and resumed.
We've seen this problem in both the stable and master branches (commit d577c26). The only change we made to CBL was to change #if 0 to #if 1 on line 73 of CBLAuthorizer.m. In our configuration, nginx is used to handle SSL and to proxy sync_gateway.
Here's what we're seeing with logging turned on for ChangeTracker using the stable branch:
If we suspend and resume the app, the replications restart. If data is replicated to/from the device during the first minute, the problem doesn't occur.
I made a quick hack to CBLChangeTracker to restart the replication when this condition occurs.