Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default uWebSocket Engine in 2.0.1 causing 100% CPU spin on server #2956

Closed
1 of 2 tasks
robtweed opened this issue May 30, 2017 · 25 comments
Closed
1 of 2 tasks

Default uWebSocket Engine in 2.0.1 causing 100% CPU spin on server #2956

robtweed opened this issue May 30, 2017 · 25 comments
Milestone

Comments

@robtweed
Copy link

I want to:

  • report a bug
  • request a feature

Current behaviour

Version 2.0.1 exhibits a problem for me when using the default engine: uWebSockets, pushing the Node.js server process into 100% CPU

Steps to reproduce (if the current behaviour is a bug)

I get the problem consistently if I have an application running in the browser with a web-socket connection in place. I then reload the application in the browser (which forces the old socket to be removed and a new one put in place). At this point the server goes into a spinning loop, eating 100% of the CPU.

Replacing the engine with "ws" fixes the issue.

Setup

  • OS: Ubuntu 16.04
  • browser: Chrome 57
  • socket.io version: 2.0.1

Other information (e.g. stacktraces, related issues, suggestions how to fix)

Here's what node debug reports when I pause during 100% CPU utilisation:
ubuntu@ip-172-30-1-57:~/qewd$ node debug localhost:7000
connecting to localhost:7000 ... ok
debug> pause
break in node_modules/uws/uws.js:8
6 const DEFAULT_PAYLOAD_LIMIT = 16777216;
7

8 function noop() {}
9
10 function abortConnection(socket, code, name) {
debug>

It appears to be stuck in a loop calling this function

@robtweed
Copy link
Author

Just a follow-up - I've tried to get some further diagnostic info on what's actually happening, but no joy. All I can tell you is that it does seem to be related to a socket disconnection event and when using the uWebSockets engine. The noop() function in uws.js starts being fired repeatedly - I've been unable to find out what's calling it: noop.caller just returns null, and there's no stack trace if you force an error within noop(). I don't believe I'm doing anything unusual within my app the way I use web-sockets, and switching to "ws" as the engine returns it to normality. If you can give me any suggestions to try out to help diagnose the problem, do let me know, and I'll see what I can find out.

@darrachequesne
Copy link
Member

@robtweed I am not able to reproduce (same OS, same browser, same socket.io version). Can you reproduce consistently? Could you provide some code please?

@robtweed
Copy link
Author

robtweed commented Jun 1, 2017

OK I eventually tracked it down. It happens if you've forked a child process, and the process environment of the master process is inherited by the child. If socket.io is loaded in the master process environment at the time the child process is forked, then reloading a socket client throws the master process into the noop() spin.

Initially everything is fine - as soon as you reload the browser, causing the original socket to be closed and a new one created, the master process goes berserk - for me it happens 100% of the time.

For example, I've managed to strip it back to this, so hopefully it's replicable:

var express = require('express');
var app = express();
server = app.listen(8080);
var io = require('socket.io')(server);
app.use('/', express.static(__dirname + '/www/'));
var cp = require('child_process');

var workerStarted = false;

function startWorker() {
  var workerProcess = cp.fork(__dirname + '/workerKeepAlive.js', null, { env: process.env });
  workerStarted = true;
}

io.on('connection', function (socket) {
    console.log('socket connected: ' + socket.id);
    var handleMessage = function(data) {
      console.log(socket.id + ' handling message: ' + JSON.stringify(data));

      if (!workerStarted) startWorker();  // start up the child process

      var type = data.type; 
      var responseObj;
      if (type === 'ewd-register' || type === 'ewd-reregister') {
        console.log('ewd-register ' + socket.id);
        data.socketId = socket.id;
        data.ipAddress = socket.request.connection.remoteAddress;
        responseObj = {"type":"ewd-register","finished":true,"message":{"token":"3fc58f5f-9ed3-4165-bdd2-650eca1ed9b6"}}
      }
      else {
        responseObj = {
          type: type,
          finished: true,
          message: {
            hello: 'world'
          }
        };
      }

      socket.emit('ewdjs', responseObj);
    };

    socket.on('ewdjs', handleMessage);

    socket.on('disconnect', function() {
      console.log('socket ' + socket.id + ' disconnected');
    });
});

Here's the very simple child process keepalive script:

console.log('child process started: ' + process.pid);
setTimeout(function() {
  console.log('child process ' + process.pid + ' terminating');
  process.exit();
}, 20000);

Note: if you start the child process before socket.io is loaded, it behaves OK. Start it once socket.io is loaded and the problem occurs.

I'd hazard a guess that it's an IPC-related problem, with the child process having a handle to the socket.io instance. Switch the socket.io engine to "ws" and the problem goes away.

For my situation, I need to be able to start child processes after socket.io has started, and the run-time environment of the child process needs to be the same as the parent process (although I don't need the socket.io environment in the worker)

@robtweed
Copy link
Author

robtweed commented Jun 1, 2017

Just a quick follow-up - it turns out the process.env issue is a red herring - it even fails with this:

cp.fork(__dirname + '/workerKeepAlive.js');

The critical thing is when the child process is started - start it inside the
io.on('connection', function (socket) {..}
closure and you get the CPU runaway happening when you reload the browser

Start it before this and it's OK

@valette
Copy link

valette commented Jun 15, 2017

Hi! I'm experiencing a similar issue, with slightly different conditions :

1- the client connects via socket.io
2- the server spawns a child process with same env via a require('child_process').exec() call
3- the client diconnects before the child proces exits

Then the server CPU goes 100%.
If the child process ends before disconnection, nothing happens.
As reported above, the option {wsEngine: 'ws'} fixes the issue.

@robtweed
Copy link
Author

robtweed commented Aug 4, 2017

Any resolution to this sorted out? It would be good to be able to switch to uws as the engine

@Haocen
Copy link

Haocen commented Aug 31, 2017

I also run into this problem. I set to use polling only for now.

@l3tnun
Copy link

l3tnun commented Sep 5, 2017

Me too.
This problem is happen in Linux. (Debian stretch, Ubuntu 16.04)
However, that does not happen in macOS.

@valette
Copy link

valette commented Oct 2, 2017

Hi, It seems that this bug has disappeared on my side with recent updates.

@Haocen
Copy link

Haocen commented Oct 2, 2017

@valette could you please elaborate? I just checked npm and the latest version was published 4 months ago.

Are you referring to the latest develop branch here in this repo?

@valette
Copy link

valette commented Oct 2, 2017

no, I do not know exactly, but something changed and the bug is no longer visible (maybe an update in core node?). I am not using any development branch, only released packages.

@Haocen
Copy link

Haocen commented Oct 2, 2017

I load tested the library with wsEngine set to ws, after only 500~ worker kill/fork the 100% cpu problem comes back and keep haunting me, this is basically a no go sign for me now.

I'm putting together demo code and will try to host it somewhere for others to play with.

@ghost
Copy link

ghost commented Oct 18, 2017

Just so you know, i'm having the same issue with a 30 worker environment. Same patern. At least now that I have found your issue, i'm know i'm not crazy. I'll keep making some tests and post anything I can find.

@juergengunz
Copy link

same issue for me,
same OS and I am using React Native on client side

@juergengunz
Copy link

wsEngine: 'ws'
does not fix the issue for me...
any other ideas?

@michaeljho
Copy link

michaeljho commented Dec 20, 2017

Hi,

I saw symptoms similar to the ones described above, namely:

  1. socket handler code generating a child-process fork
  2. socket disconnect occurs before fork exits
  3. node thrashing repeatedly trying to reap the socket somewhere in Socket.onclose()
  4. event loop does not fully block because the reaper fails, and retries, fails, and retries in a loop until the fork is killed. CPU appears pegged at 100% but in reality the event loop is just getting clogged with these retries. Node server still able to respond to requests but super slow (and gets slower with more volume of course)
  5. kill the fork via cmd line kill signal or via eventual code process.exit and boom, socket cleanup finishes and node stops thrashing.

I was able to work around this by handling the Socket "disconnecting" event (new in 2.x?) and cleaning up associated forks before the rest of the onclose() executes. See below for the onclose() impl to see why this works. EventEmitter emits disconnecting before anything else is done, so you can safely kill all relevant child processes in an event handler here.

Socket.prototype.onclose = function(reason){
  if (!this.connected) return this;
  debug('closing socket - reason %s', reason);
  this.emit('disconnecting', reason);
  this.leaveAll();
  this.nsp.remove(this);
  this.client.remove(this);
  this.connected = false;
  this.disconnected = true;
  delete this.nsp.connected[this.id];
  this.emit('disconnect', reason);
};

Hope this helps anyone else having this issue currently.

@darrachequesne I'd recommend a closer look at the onclose flow and see if there is a place where retry logic can be handled better during socket cleanup. Maybe an increasing retry interval would be acceptable, followed by a thrown Error after n retries? At least then it wouldn't blow up the event loop with constant retries.

*** Just in case this helps, I wanted to note that this issue does not exist in 1.3.5. I didn't notice anything in the diffs between 1.3.5 and 2.0.4 that could be the culprit, but could be inside a dependency, as many of these were upgraded in 2.x

DETAILS: Node.js 8.4.0, socket-io 2.0.4 running on Debian 8.9

darrachequesne added a commit to socketio/engine.io that referenced this issue Dec 29, 2017
@giantss
Copy link

giantss commented Mar 5, 2018

@juergengunz Hello, Can it be used in the react native project? I plan to use it. Please advise.

@darrachequesne
Copy link
Member

Note: ws will be restored as default wsEngine starting version 2.1.0

@gaurav21r
Copy link

I am having the same issue, even without using child_process. This only happens after quite a long time of the server being active!

@DonavanMartin
Copy link

DonavanMartin commented Dec 12, 2018

Is still an issue with node >= 11.1.0 ? After a chrome-devtools inspect... It's look like a nodejs infinite loop.

@ipetrovic11
Copy link

Looks like we are having the same issue still in 2020.
Socket.io version 2.2.0
Node.js version 12.15.0

As I can see this is now default { "wsEngine": "ws" } so not sure if it is related to the same issue.
One specific thing is that we are using socket.io-redis as adapter for running multiple instances.

@cesarlima-hotmart
Copy link

cesarlima-hotmart commented Mar 10, 2020

Same issue here. I'm also using redis adapter. Thinking of use ws instead.

@DancingCorp
Copy link

Same issue. I'm also using redis adapter.

@dogeow
Copy link

dogeow commented Feb 11, 2021

WX20210211-005908@2x

WX20210211-085308@2x

@darrachequesne
Copy link
Member

For future readers:

  • ws was reverted as the default WebSocket server implementation in socket.io@2.1.0 due to stability issues with uws
  • uws is now deprecated (its maintainer now works on uWebSockets.js, but it does not expose the same API as ws)
  • you can replace it by eiows, which is a fork of uws

Documentation: https://socket.io/docs/v3/server-installation/#Other-WebSocket-server-implementations

I'm closing this, as I don't think there is much we can do in Socket.IO itself.

@darrachequesne darrachequesne added this to the 2.1.0 milestone Feb 11, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests