Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure To Reconnect After A Socket Closed Error #3588

Closed
blyork opened this issue Nov 18, 2015 · 41 comments
Closed

Failure To Reconnect After A Socket Closed Error #3588

blyork opened this issue Nov 18, 2015 · 41 comments

Comments

@blyork
Copy link

blyork commented Nov 18, 2015

I am having a issue were the database connection does not seem to be recovering after socket close errors:

MongoError: server foobar.mongolab.com:30748 sockets closed
   at null.<anonymous> (C:\sensorfact\node_modules\@sensorfact\sensorfact-pclx-api\node_modules\mongoose\node_modules\mongodb\node_modules\mongodb-core\lib\topologies\server.js:325:47)
   at g (events.js:199:16)
   at emit (events.js:110:17)
   at null.<anonymous> (C:\sensorfact\node_modules\@sensorfact\sensorfact-pclx-api\node_modules\mongoose\node_modules\mongodb\node_modules\mongodb-core\lib\connection\pool.js:101:12)
   at g (events.js:199:16)
   at emit (events.js:110:17)
   at Socket.<anonymous> (C:\sensorfact\node_modules\@sensorfact\sensorfact-pclx-api\node_modules\mongoose\node_modules\mongodb\node_modules\mongodb-core\lib\connection\connection.js:142:12)
   at Socket.g (events.js:199:16)
   at Socket.emit (events.js:107:17)
   at TCP.close (net.js:485:12)

My configuration options look like below:

{
  "db": {
    "safe": true
  },
  "server": {
    "auto_reconnect": true,
    "socketOptions": {
      "keepAlive": 1
    }
  },
  "replset": {
    "rs_name": "foobar",
    "socketOptions": {
      "keepAlive": 1
    }
  }
}

And the mongoose dependency tree looks like below:

├─┬ mongoose@4.2.6
│ ├── async@0.9.0
│ ├── bson@0.4.19
│ ├── hooks-fixed@1.1.0
│ ├── kareem@1.0.1
│ ├─┬ mongodb@2.0.48
│ │ ├── es6-promise@2.1.1
│ │ ├─┬ kerberos@0.0.17
│ │ │ └── nan@2.0.9
│ │ ├── mongodb-core@1.2.21
│ │ └─┬ readable-stream@1.0.31
│ │   ├── core-util-is@1.0.1
│ │   ├── inherits@2.0.1
│ │   ├── isarray@0.0.1
│ │   └── string_decoder@0.10.31
@vkarpov15
Copy link
Collaborator

Hmm how are you closing the socket? Are you closing it in mongoose or is this a hiccup in your connection to mongolab?

@blyork
Copy link
Author

blyork commented Nov 18, 2015

I am not explicitly closing the connection anywhere. The hiccup happened and our node server stopped responding to requests.

@vkarpov15
Copy link
Collaborator

How long was this hiccup? By default, MongoDB will give up trying to reconnect after 30 seconds.

@blyork
Copy link
Author

blyork commented Nov 19, 2015

I am not sure. Is there a configuration option for making it attempt reconnection indefinitely?

@vkarpov15
Copy link
Collaborator

Set the server.reconnectTries and server.reconnectInterval connection options to something really high: http://mongodb.github.io/node-mongodb-native/2.0/api/Server.html . I don't know of a mechanism for which it can attempt reconnection indefinitely, but you can set reconnectTries to something sufficiently high and it'll be close enough.

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Dec 6, 2015
* Every couple of days we get disconnected and the below refs offer a possible suggestion
* Doubling the default `server.reconnectTries` and quadrupling the wait time for `reconnectInterval` *(4 seconds shouldn't be too bad instead of the default of 1)*

Refs:
* http://mongodb.github.io/node-mongodb-native/2.0/api/Server.html
* Automattic/mongoose#3588
@Martii
Copy link

Martii commented Dec 6, 2015

@vkarpov15

something sufficiently high

How high is high?

@blyork
Has your situation resolved?

@vkarpov15 and @blyork
Please forgive my inexperience with mongoose and it's tree deps and please excuse the attaching onto this issue as we seem to have the same issue... but the link at http://mongodb.github.io/node-mongodb-native/2.0/api/Server.html says autoReconnect under socketOptions...

What is the difference between the property @blyork mentioned with:

"auto_reconnect": true,

??

How long was this hiccup?

We are unable to tell in our usage but our system has been guestimated at "lagging" for 2 hours 2 days ago and 4 hours today... had to restart production (both times). If the pattern holds steady we'll know in about two days if the above commit helped... and presumably in about 4 days we'll know if the doubling helped... not sure though.

We also get this in node@4.latest:

(node) warning: possible EventEmitter memory leak detected. 11 connected listeners added. Use emitter.setMaxListeners() to increase limit.
Trace
    at MongoStore.addListener (events.js:239:17)
    at MongoStore.once (events.js:265:8)
    at MongoStore.getCollection (~/OpenUserJS.org/node_modules/connect-mongo/lib/connect-mongo.js:219:16)
    at MongoStore.get (~/OpenUserJS.org/node_modules/connect-mongo/lib/connect-mongo.js:285:10)
    at session (~/OpenUserJS.org/node_modules/express-session/index.js:408:11)
    at Layer.handle [as handle_request] (~/OpenUserJS.org/node_modules/express/lib/router/layer.js:95:5)
    at trim_prefix (~/OpenUserJS.org/node_modules/express/lib/router/index.js:312:13)
    at ~/OpenUserJS.org/node_modules/express/lib/router/index.js:280:7
    at Function.process_params (~/OpenUserJS.org/node_modules/express/lib/router/index.js:330:12)
    at next (~/OpenUserJS.org/node_modules/express/lib/router/index.js:271:10)
    at initialize (~/OpenUserJS.org/node_modules/passport/lib/middleware/initialize.js:53:5)
    at Layer.handle [as handle_request] (~/OpenUserJS.org/node_modules/express/lib/router/layer.js:95:5)
    at trim_prefix (~/OpenUserJS.org/node_modules/express/lib/router/index.js:312:13)
    at ~/OpenUserJS.org/node_modules/express/lib/router/index.js:280:7
    at Function.process_params (~/OpenUserJS.org/node_modules/express/lib/router/index.js:330:12)
    at next (~/OpenUserJS.org2/node_modules/express/lib/router/index.js:271:10)

... which I don't have a clue how to resolve yet... but this usually follows after the three (3) assumed retries/fails that we normally get by default.

Refs:

@vkarpov15
Copy link
Collaborator

@Martii depends on your MongoDB version, network latency, and topology. Those two options are equivalent, just 2 names for the same thing.

Also, take a look at the stack trace, it has nothing to do with mongoose and everything with connect-mongo. FWIW, connect-mongo ranges from quirky to blatantly broken, which is why I wrote connect-mongodb-session.

@Martii
Copy link

Martii commented Dec 7, 2015

@vkarpov15

Well I'll have to ask the establishing owner that manages the MongoDB version for an exact version.

take a look at the stack trace

MongoError: server *server_clipped* sockets closed
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/topologies/server.js:331:47)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:101:12)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at Socket.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:142:12)
    at Socket.g (events.js:260:16)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at TCP._onclose (net.js:469:12)

... so even though the stack trace is similar to the one above this isn't mongoose ?? This is the one where it is 3 times then the final one that I mentioned here.

I have been reading these as the first line is typically (related to) the issue which is currently:

at null. (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/topologies/server.js:331:47)

@vkarpov15
Copy link
Collaborator

I was referring to the stack trace you provided in #3588 (comment) . That's a separate issue.

@Martii
Copy link

Martii commented Dec 7, 2015

That's a separate issue.

Correct. I figured you didn't need to reread the same stack trace twice and it saved some input... but guess that didn't help.

However... my apologies if I was unclear with the topic of:

Failure To Reconnect After A Socket Closed Error

and my statement of:

... but this usually follows after the three (3) assumed retries/fails that we normally get by default.


depends on your MongoDB version

It looks like we are running MongoDB v2.6.11 and from MongoLabs just like @blyork is. I'll see if I can get the latency in the VPS soon.

Thanks.

@Martii
Copy link

Martii commented Dec 7, 2015

I'll see if I can get the latency in the VPS soon.

This is abridged for security concerns:

$ ping ourMongoLabDBserverAddress
... icmp_seq=1 ttl=40 time=7.74 ms
... icmp_seq=2 ttl=40 time=7.65 ms
... icmp_seq=3 ttl=40 time=7.67 ms
... icmp_seq=4 ttl=40 time=7.89 ms

@Martii
Copy link

Martii commented Dec 7, 2015

and topology.

I don't quite understand what you are asking for by this.

@Martii
Copy link

Martii commented Dec 7, 2015

FWIW, connect-mongo ranges from quirky to blatantly broken, which is why I wrote connect-mongodb-session.

OOH! I'll take a look at that... thanks for the post edit. (Good thing I reread things a few times) :)

@vkarpov15
Copy link
Collaborator

Re: #3588 (comment), are you running a standalone mongod, a replica set, or a sharded cluster?

@Martii
Copy link

Martii commented Dec 8, 2015

standalone mongod, a replica set, or a sharded cluster?

All three I think... please remember I'm not the primary maintainer of that portion and I'm still a bit "green" with MongoDB's... but the production connection string has a replicaSet in it and the establishing owner said it was sharded to me in private. (so now add some confusion for me based off your question ;) Our development appears to be standalone so we would need a balance with a trend towards sharded/replica. Is this what you mean by topology?

@Martii
Copy link

Martii commented Dec 8, 2015

btw development MongoDB is v3.0.7 ... for some reason unknown, but acknowledged, our dev is running a newer version of MongoDB... I'll have to figure out why when I understand this a little more with our VPS and mongolabs.

@chrisckchang
Copy link
Contributor

Hi folks,

I've been working with a few different users that are seeing "sockets closed" errors. One observation I've made is that a single-node connection appears to be more robust than a replica set connection (unclear why). When users got a "sockets closed" error from a replica set connection and changed it to a single-node connection, they no longer got the error entirely.

To make a replica set connection, you'd use a MongoDB URI that follows this format:

mongodb://<dbuser>:<dbpassword>@ds012345-a0.mongolab.com:56789,ds012345-a1.mongolab.com:56790/<dbname>?replicaSet=rs-ds012345

To make a single-node connection (in this example a0 is the PRIMARY server), you can use this format:

mongodb://<dbuser>:<dbpassword>@ds012345-a0.mongolab.com:56789/<dbname>

@Martii and @blyork can you please try to change your replica set connection to a single-node connection and see if it results in fewer or no errors? This is a temporary solution because your application can no longer handle failovers, but may provide a clue towards a fix.

@Martii
Copy link

Martii commented Dec 8, 2015

can you please try to change your replica set connection to a single-node connection and see if it results in fewer or no errors?

Well I can go back in the error logs since we've been on the VPS and compare it to when our establishing owner changed it to "sharded/replica"... but I literally am "green" on doing this nor do I think I have that access... I can ask him though if he wants to do that on production for a while... but I have to wait and see if the bump test is helping at all... which means I shouldn't take pro down for anything, including recent commits, until mid-Thursday EST.

I can say that the times that I've run our development for long periods of time, which is single, I've never seen the sockets closed message. We have subtle differences between dev/testing and production and dev can be different platforms too depending on contributor.

@Martii
Copy link

Martii commented Dec 8, 2015

If I recall correctly though he said he's seen this error before and when we were on nodejitsu we had the issue and he thought he fixed it... but it would appear to be back... I wasn't the active maintainer early on when this was done initially with a single and that host provider wasn't very revealing on errors but I was notified when when we went "sharded/replica" on the VPS... so I should have those logs.

@Martii
Copy link

Martii commented Dec 8, 2015

Nothing with grep 'close' in error logs from May 2015 to August 2015 before the "sharding/replica" change... but the private response from the establishing owner stated he's seen it before which seems to me that it may have been present with single... IDK for sure... I think that's too far back to ask him to recall from memory though.

@blyork
Copy link
Author

blyork commented Dec 10, 2015

@Martii

I changed my configuration to something like this:

{
    "connect": "mongodb://user:password@foobar1.mongolab.com:30736,foobar2.mongolab.com:30737/db?replicaSet=foobar",
    "options": {
        "db": {
            "safe": true
        },
        "server": {
            "auto_reconnect": true,
            "poolSize": 10,
            "reconnectTries": 86400,
            "reconnectInterval": 1000,
            "socketOptions": {
                "noDelay": true,
                "connectTimeoutMS": 30000,
                "keepAlive": 1,
                "socketTimeoutMS": 0
            }
        },
        "replset": {
            "poolSize": 10,
            "socketOptions": {
                "noDelay": true,
                "connectTimeoutMS": 30000,
                "keepAlive": 1,
                "socketTimeoutMS": 0
            }
        }
    }
}

It worked for a little while but now I am getting this:

MongoError: server foobar1.mongolab.com:30736 received an error {"name":"MongoError","message":"read ECONNRESET"}
    at null.<anonymous> (C:\sensorfact\node_modules\@sensorfact\sensorfact-pclx-api\node_modules\mongoose\node_modules\mongodb\node_modules\mongodb-core\lib\topologies\server.js:259:47)
    at g (events.js:199:16)
    at emit (events.js:110:17)
    at null.<anonymous> (C:\sensorfact\node_modules\@sensorfact\sensorfact-pclx-api\node_modules\mongoose\node_modules\mongodb\node_modules\mongodb-core\lib\connection\pool.js:77:12)
    at g (events.js:199:16)
    at emit (events.js:110:17)
    at Socket.<anonymous> (C:\sensorfact\node_modules\@sensorfact\sensorfact-pclx-api\node_modules\mongoose\node_modules\mongodb\node_modules\mongodb-core\lib\connection\connection.js:121:49)
    at Socket.g (events.js:199:16)
    at Socket.emit (events.js:107:17)
    at net.js:459:14
    at process._tickCallback (node.js:355:11)

We are using MongoLab with mongodb version 3.0.6 in the AZURE EASTUS cloud.

@chrisckchang

I am a uncomfortable switching to the single-instance mode as it is our production environment. That said, we have another environment with a single-instance outside of MongoLab. We have yet to receive these errors there. Of course, we don't have the same load there.

@Martii
Copy link

Martii commented Dec 10, 2015

@blyork

Thanks so much... a few more hours to go and I'll know if the test bump was okay... although the:

It worked for a little while ...

... and ...

Of course, we don't have the same load there.

... is going to be a long term observation for us too.

I'm not totallly thrilled about a 4 second delay for ours but I am guessing that MongoLabs and/or our VPS is having connection issues (hopefully not the latter)... the odd thing is that we get it 3 times in a row which implies that it's trying to autoReconnect and we don't have that set. :\

but now I am getting this:

Maybe this answer could help if you haven't already seen it? We got these on the previous host provider and was one of the deciding factors of switching providers... hopefully they don't recur. I presume we have the error trap here that SO is talking about.

@vkarpov15

Those two options are equivalent, just 2 names for the same thing.

Took me three days to put two and two together on what you meant by this... oy ;)... would be nice if the MongoDB API had easily found documentation on auto_reconnect along with autoReconnect. Most of the Stack Overflow questions show the former and your reference shows the latter. Anyhow... thank you for clarifying that. :)

@Martii
Copy link

Martii commented Dec 10, 2015

@blyork
One more thing from my TCP/IP experience the socketTimeoutMS default of zero might be a bit short e.g. most likely a clock cycle is used (hopefully)... but if it "disables" the check with zero that might be an issue.

@Martii
Copy link

Martii commented Dec 14, 2015

Followup stderr/stdout logs for this issue:

2015-12-11 16:31:23.531 +00:00:
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.541 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.543 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.545 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.546 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.547 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.549 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.550 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.552 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.566 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.618 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

2015-12-11 16:31:23.632 +00:00: 
{ [MongoError: *server_clipped* sockets closed]
  name: 'MongoError',
  message: '*server_clipped* sockets closed' }

... had a little shakiness here and it did seem to recover with the new values (even though ~1 to 2 seconds isn't what I put in... 4 seconds is quadrupled and 1 to 2 seconds between the log stamping isn't) since there was no related server restarts with our process manager and node. No other instances in the last ~3 days.

This is still implying that we are autoReconnecting here which is even more baffling since we definitely haven't set that option.

Later tomorrow I'll update mongoose to the latest and restart our server and continue long term observation... apologies if this observation is a bit noisy but I did wait 3 days... perhaps it will assist in debugging somewhere in the MongoDB tree deps including mongoose.

@mkw
Copy link

mkw commented Dec 14, 2015

I'm getting the stack trace that started this issue, too. It started when I upgraded from 4.0.6 to 4.3.0, and 4.3.1 did not resolve it.

I have a batch process that causes it very quickly and very consistently after the upgrade (the same code worked on 3.x fine, too). It invokes many, many "ensureIndex()" operations on startup, then 9 query operations simultaneously to cache some data before doing the bulk of the processing. It always fails on the 8th query for some reason, which is a simple find() that returns select fields from every object in a collection (3 of the others are relatively simple aggregations). Our pool size was 5, and I tried increasing that, but it did not change anything. I'm adding the auto_reconnect, reconnectTimeout and reconnectInterval options to see if that helps.

Locally, we use a single MongoDB instance on OS X, but we use clusters in our test environments and production (hosted by MongoLab), and it only happens when clustered. I've not tried recreating locally with a cluster yet, but I will if it would be helpful. I can't really release the test case that I have, but if there's some logging that I could add or a custom build of Mongoose that might be helpful, I'd be happy to try either.

@mkw
Copy link

mkw commented Dec 14, 2015

I just created a single-node "replica set" locally (DB annoyingly big to make a "real" one locally), and the problem did not happen. It's only happening on MongoLab, but it happens 100% of the time connecting to the DB there (whether I connect from my OS X box or use a Heroku process), even after adding the auto_reconnect, reconnectTimeout, and reconnectInterval options. I suspect that those options would help Mongoose recover the app, though let the transaction in progress fail.

@chrisckchang
Copy link
Contributor

@mkw from my testing, i noticed if you make a single-node connection (#3588 (comment)) to a MongoLab deployment the issue should go away. can you see if this is true for you?

@mkw
Copy link

mkw commented Dec 15, 2015

Same version, but no auth locally; I can add auth and re-test.

But, only the mongoose version changed, not anything about the db itself.

Sent from my iPhone

On Dec 14, 2015, at 5:09 PM, Christopher Chang notifications@github.com wrote:

@mkw is your local replica set also running with authentication and the same mongodb version as your MongoLab deployment?


Reply to this email directly or view it on GitHub.

@mkw
Copy link

mkw commented Dec 15, 2015

I can test that, but it won't work for production. We need the reliability of a replica set connection.

I turned on debug logging in Mongoose, and unfortunately, the problem mysteriously goes away when I do that, as well. So, it's almost certainly a timing issue. As for what kind of timing issue, I haven't the first clue.

@mkw
Copy link

mkw commented Dec 15, 2015

As expected, switching to a single connection works fine.

@mkw
Copy link

mkw commented Dec 15, 2015

After a bunch of debugging and many, many runs, I discovered that my repeatable failure occassionaly works. This is good because I managed to add a bunch of debug messages via breakpoints and am now I'm attaching a zip file two logs named success.log and failure.log. Both have breakpoint-generated log messages that I hope are useful in figuring out the source of this error:

  • The "Connection" prefix log messages are from event handlers in mongo-core/lib/connection/connection.js
  • The artificial stack trace error is generated by a breakpoint log message from Connection.prototype.destroy() in mongo-core/lib/connection/connection.js.
  • The "Pool" prefix log messages are from event handlers in mongo-core/lib/connection/pool.js.
  • The "NativeConnection" prefix log messages are from event handlers in mongoose/lib/drivers/node-mongodb-native/connection.js

Additionally, I used the "longjohn" module to get stack traces including the source of the callback for the final failure.

What is particularly strange to me is that, if I'm reading it right, when a single connection gets a socket close, all of the connections in the pool appear to have destroy() via pool.js#closeHandler() calling self.destroy(). Shouldn't a connection pool just be destroying the connection with the error and replacing it? I might be misunderstanding this, but it certainly seems suspicious at first glance.

This is a major issue for us right now. I'm going to test various older versions of Mongoose to see if the error is still there. But right now, I'm completely at a loss as to how to work around it without giving up the ability to failover between our primary and secondary replicas.

logs.zip

@vkarpov15
Copy link
Collaborator

@mkw et. all, would recommend taking a look at my colleague Christian's comments here: https://github.com/christkv/mongodb-core/issues/66#issuecomment-165052045 . Nobody knows the internals of the mongodb driver better than Christian :)

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Dec 16, 2015
* Set some DB options to their assumed defaults from http://mongodb.github.io/node-mongodb-native/2.0/api/Server.html ... relates to https://github.com/christkv/mongodb-core/issues/66#issuecomment-165052045
* Set some replset defaults to their assumed defaults from http://mongodb.github.io/node-mongodb-native/2.0/api/ReplSet.html ... relates to Automattic/mongoose#3588 (comment)
* Move/change a stderr message to debug mode of *node*... clear up the logs a bit since script source may not be found just yet. This only happens on pro so far and probably deals with streams not ready yet. Applies to OpenUserJS#430
* Move S3 source not found stderr to debug mode of *node*... known issue of OpenUserJS#486. Applies to OpenUserJS#430

**NOTES**
* This will be a test whereas unless it's a huge bug I can't take pro down for at least 3 days, give or take to see if this helps or hinders from OpenUserJS#845 backout.
@Martii
Copy link

Martii commented Dec 16, 2015

@vkarpov15
I am a little confused on his statement on the event listeners... db.serverConfig is undefined on our end... is this buried in mongoose ?

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Dec 16, 2015
* Gracefully disconnect from MongoDB during restarts
* Add some console messages in for tracking
* Remove the `bind` on error trap

Applies to OpenUserJS#845, OpenUserJS#851, Automattic/mongoose#3588 and loosely christkv/mongodb-core#66
@mkw
Copy link

mkw commented Dec 16, 2015

@vkarpov15 Thanks -- his comments were helpful in figuring out what to do next to debug. If I manage to learn anything new, I will post in that issue.

@chrisckchang
Copy link
Contributor

We have reliably reproduced the issue and are in the process of opening a ticket with MongoDB to better understand the behavior behind this issue.

From our experimenting, we have had success eliminating these errors by increasing the connectTimeoutMS setting to a very large value such as 60*1000. We recommend the following configuration as a temporary workaround while we try to understand more about the driver behavior:

MongoClient.connect("mongodb://<dbuser>:<dbpassword>@host1:port1,host2:port2/<dbname>?replicaSet=replicaSetName", {
  replset: {
    socketOptions: {
      connectTimeoutMS: 60*1000
    }
  }
}, function(err, db) {
});

Note that if you are connecting to a replica set, the "server" level socketOptions will not be used. Use "replset" instead. Also, I suspect that if you are running very long-running operations against your database deployment, you may need to increase your connectTimeoutMS option to an even larger value.

@vkarpov15
Copy link
Collaborator

Thanks for clarifying @chrisckchang , that's very helpful.

@Martii
Copy link

Martii commented Dec 18, 2015

Also, I suspect that if you are running very long-running operations against your database deployment, you may need to increase your connectTimeoutMS option to an even larger value.

If this ends up being the case most likely we'll have a timed closure of the DB and then reopen it... perhaps daily. Trying to guess on when a persistent connection fails isn't exactly easy depending on how active a site is... we'll just have to have some downtime until it's reconnected programmatically.

@Martii
Copy link

Martii commented Dec 18, 2015

Now this is interesting:

2015-12-17 17:59:11.895 +00:00:
{ [MongoError: server *server_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_clipped* sockets closed' }

2015-12-17 17:59:11.944 +00:00:
{ [MongoError: server *server_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_clipped* sockets closed' }

2015-12-17 17:59:12.686 +00:00:
MongoDB connection is reconnected

... only one of the event traps I put in triggered and that was on reconnect. I would think the disconnected and error would at least work... disconnected does with a SIGINT (Ctrl + c). open and connected work on application start too.

This is all with the OS levels used with the defaults from the Docs for MongoDB referenced above.


Did not trip a restart with our process manager.

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Dec 18, 2015
* Change logic to only use `replset` on production ... as per Automattic/mongoose#3588 (comment) from @chrisckchang
* Default to single connection on development with `server` ... as per Automattic/mongoose#3588 (comment) from @chrisckchang
* Change `connectTimeoutMS` to 60 seconds instead of presumed inherited OS value from https://github.com/christkv/mongodb-core/issues/66#issuecomment-165052045 by @christkv and overridden with @chrisckchang recommendation previously mentioned in commit summary.
* Added NOTE on non-standard keepAlive value that was in before I started twiddling with this... this was on dev and pro pre replica/sharding set.

Applies to OpenUserJS#845, OpenUserJS#851, OpenUserJS#852, Automattic/mongoose#3588 and loosely christkv/mongodb-core#66

Refs:
* Automattic/mongoose#3588 (comment)
@Martii
Copy link

Martii commented Dec 19, 2015

Still have them with ccef32/app.js#L49-L74 with ccef32/package.json#L31. No server trip for restarts, which is very good, but I'm wondering about our data integrity of the DB for those transactions. :\

stderr log (snipped):

2015-12-19 04:33:09.257 +00:00: Starting application...
2015-12-19 04:33:13.184 +00:00: (node) Buffer.write(string, encoding, offset, length) is deprecated. Use write(string[, offset[, length]][, encoding]) instead.
2015-12-19 15:25:50.282 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.325 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.385 +00:00: MongoError: server *server_at_mongolab_clipped* sockets closed
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/topologies/server.js:331:47)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:101:12)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at Socket.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:142:12)
    at Socket.g (events.js:260:16)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at TCP._onclose (net.js:469:12)
2015-12-19 15:25:50.390 +00:00: MongoError: server *server_at_mongolab_clipped* sockets closed
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/topologies/server.js:331:47)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:101:12)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at Socket.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:142:12)
    at Socket.g (events.js:260:16)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at TCP._onclose (net.js:469:12)
2015-12-19 15:25:50.390 +00:00: MongoError: server *server_at_mongolab_clipped* sockets closed
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/topologies/server.js:331:47)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:101:12)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at Socket.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:142:12)
    at Socket.g (events.js:260:16)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at TCP._onclose (net.js:469:12)
2015-12-19 15:25:50.420 +00:00: MongoError: server *server_at_mongolab_clipped* sockets closed
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/topologies/server.js:331:47)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at null.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/pool.js:101:12)
    at g (events.js:260:16)
    at emitTwo (events.js:87:13)
    at emit (events.js:172:7)
    at Socket.<anonymous> (~/OpenUserJS.org/node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/connection/connection.js:142:12)
    at Socket.g (events.js:260:16)
    at emitOne (events.js:77:13)
    at Socket.emit (events.js:169:7)
    at TCP._onclose (net.js:469:12)
2015-12-19 15:25:50.939 +00:00: MongoDB connection is reconnected

... and stdout ...

2015-12-19 15:25:50.233 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.243 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.251 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.252 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.257 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.260 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.262 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.263 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.264 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.265 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.272 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.274 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.275 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.307 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.324 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.347 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }
2015-12-19 15:25:50.349 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }

... approximate 11 hours into the connection this round and with the recommended value from above. :( Kind of unusual that they are showing up in stdout and stderr.


Have a few distro updates today and a newer kernel... starting retest period again with those updates. (No closures in three days since)

Martii pushed a commit to Martii/OpenUserJS.org that referenced this issue Dec 24, 2015
* Sub dep *mongodb-core* updated in all related MongoDB deps Ref: https://github.com/christkv/mongodb-core/issues/66#issuecomment-167133267
* Delete ops retested

**NOTE**:
* We haven't gotten these except for [that one one initial time](Automattic/mongoose#3588 (comment)) but I've also restarted the server a few times after the initial ~3 day test period and could still be lurking.
@mkw
Copy link

mkw commented Dec 28, 2015

Updating to Mongoose 4.3.4 fixed this for me. It upgraded Mongo to 2.1.2, which was referenced in christkv/mongodb-core#66 (opened when trying to figure this one out).

@Martii
Copy link

Martii commented Jan 13, 2016

stderr:

2016-01-13 11:06:36.336 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }

... (24 more of these)...

2016-01-13 11:06:36.421 +00:00: { [MongoError: server *server_at_mongolab_clipped* sockets closed]
  name: 'MongoError',
  message: 'server *server_at_mongolab_clipped* sockets closed' }

stdout:

2016-01-13 07:14:04.840 +00:00: Starting application...
2016-01-13 07:14:09.370 +00:00: (node) Buffer.write(string, encoding, offset, length) is deprecated. Use write(string[, offset[, length]][, encoding]) instead.
2016-01-13 11:06:46.501 +00:00: MongoDB connection is reconnected

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants