Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

_stream_writable.js - doWrite() runs in an endless loop #1992

Closed
cookiengineer opened this issue Jun 16, 2015 · 21 comments
Closed

_stream_writable.js - doWrite() runs in an endless loop #1992

cookiengineer opened this issue Jun 16, 2015 · 21 comments
Labels
stream Issues and PRs related to the stream subsystem.

Comments

@cookiengineer
Copy link

Sometimes (on weird clients connecting) the same error happens again. The call stack size exceeds in _stream_writeable.js' doWrite() method.

  • Why is this the case?
  • What different scenarios could trigger this?
  • How to debug / how to fix it?
_stream_writable.js:287
function doWrite(stream, state, writev, len, chunk, encoding, cb) {
                ^
RangeError: Maximum call stack size exceeded
    at doWrite (_stream_writable.js:287:17)
    at writeOrBuffer (_stream_writable.js:282:5)
    at Socket.Writable.write (_stream_writable.js:210:11)
    at Socket.write (net.js:601:40)

Update / TL;DR

net.Socket.write() might trigger an error. If the protocol implementation still tries to send a close code (e.g. by WebSockets spec), this leads to an endless loop.

@brendanashworth brendanashworth added the stream Issues and PRs related to the stream subsystem. label Jun 16, 2015
@mscdex
Copy link
Contributor

mscdex commented Jun 16, 2015

What is your code doing? HTTP? Plain TCP? Something else?

@cookiengineer
Copy link
Author

It's implementing raw WebSockets via net.Socket and I think this bug is related to the close/end event that is triggered on disconnect.

I'm unsure about it, because it only happens when some blackhat scripts are trying to DDoS our servers just before. So I guess the events might be fired in false order due to their falsy raw socket connections or similar.

The close() method in the implementation is calling socket.write() with the proper websocket close code and then has as in the callback the socket.end() and socket.destroy() call.

So the close() method looks something like this (reduced):

lychee.net.protocol.WS.prototype.close = function(status) {
    // (...) prepare buffer with String.fromCharCode() stuff

    this.socket.write(buffer, function(err) {
        this.end();
        this.destroy();
    }.bind(this.socket));
};

I'm unsure if I'm using the events API correctly as there's no up2date docs available that show how the event flow is actually like :-/ Here's what the event bindings look like, before filing the issue, the "end" event was also bound. I'm chiming in back here if the error still occurs and is related to another event (or if I can trace down my falsy API usage).

        this.socket.on('error', function() {
            that.close(Class.STATUS.protocol_error);
        });

        this.socket.on('timeout', function() {
            that.close(Class.STATUS.going_away);
        });

        // this.socket.on('end', function() {
        //  that.close(Class.STATUS.normal_closure);
        // });

        this.socket.on('close', function() {
            that.close(Class.STATUS.normal_closure);
        });

@trevnorris
Copy link
Contributor

See how cb is passed to doWrite()? That callback is called synchronously from Socket#_writeGeneric() if the write queue can be flushed immediately by the kernel (via uv_try_write()). So if you're able to quickly process many small packets it's possible to catch this in a synchronous loop.

I believe this is a related issue: nodejs/node-v0.x-archive#6623

@cookiengineer
Copy link
Author

I think I traced the bug.

Does net.Socket.write() trigger an error event? That would cause an endless loop in my implementation, because on error, the implementation still tries to close it properly according to the WebSocket spec.

@trevnorris
Copy link
Contributor

Yes. There are two places it can error. First is if you're attempting to write to a closed socket, the other is if the actual write failed (i.e. uv_write()).

@sam-github
Copy link
Contributor

I'd expect an error to be unrecoverable, trying to write a ws spec error packet to tell the other side you got a write failure when writing to the other side isn't going to work. socket.destroy() is probably the right choice on error.

@saghul
Copy link
Member

saghul commented Jun 24, 2015

the other is if the actual write failed (i.e. uv_write()).

FTR, if the actual write(2) fails inside uv_write the error is deferred to the callback. From Node/iojs prespective, any error returned by uv_write is an internal error and you may as well abort(). (Note that when multiple (> 4) buffers are passed this may fail with ENOMEM as well)

@trevnorris
Copy link
Contributor

@saghul Thanks for the information. So basically the extra logic here: https://github.com/nodejs/io.js/blob/v2.3.1/lib/net.js#L668-L672 is totally pointless. About ENOMEM, does that depend on the number of buffers only, or the total length of buffers?

@bnoordhuis So instead should we just FatalError() from StreamWrap::DoWrite() if an error code is returned? An exception might be the ENOMEM case.

@bnoordhuis
Copy link
Member

There's also EBADF. If you get that error it's almost certainly an application bug (something closed the file descriptor from underneath the handle) but a hard abort isn't the nicest way to report that.

@trevnorris
Copy link
Contributor

@bnoordhuis Think it would be reasonable for us to filter the error codes, return those that are application bugs and abort on internal errors?

@saghul
Copy link
Member

saghul commented Jun 24, 2015

About ENOMEM, does that depend on the number of buffers only, or the total length of buffers?

No, we malloc and copy the structures, not the buffer content.

There's also EBADF. If you get that error it's almost certainly an application bug (something closed the file descriptor from underneath the handle) but a hard abort isn't the nicest way to report that.

True that. Not sure what the policy is in Node, but I guess this should never reach the user, it kind of implies an internal error in Node, IMHO.

@bnoordhuis
Copy link
Member

Not sure what the policy is in Node, but I guess this should never reach the user, it kind of implies an internal error in Node, IMHO.

It could be a bug in the user's code: fs.close(wrongFd)

@saghul
Copy link
Member

saghul commented Jun 24, 2015

Doh, indeed. Sorry for the noise then.

@trevnorris
Copy link
Contributor

Just to verify, then there aren't any errors we'd want to abort on?

@Fishrock123
Copy link
Contributor

So... is this a bug, or?

@cookiengineer
Copy link
Author

From my side, it's still unclear what I should do in order to support a correct WebSocket behaviour. As @sam-github mentioned, the destroy event should be the one to use for a websocket close code. For me it's unclear how to detect if the socket was closed properly from the foreign side or if the server closes the socket because of a real spec-defined error.

Is there an XOR condition between socket destroy and socket end?

Or can the end event lead to a destroy event (or vice versa)?
If so, that might not be wanted from the implementor's side for most TCP based protocols as pretty much all have some kind of status codes for socket conditions and it wouldn't make sense to have an error in the web browser if the user just reloaded the page and the socket got destructed.

@jasnell
Copy link
Member

jasnell commented Mar 9, 2016

@trevnorris @saghul ... is there reason to keep this one open?

@jasnell
Copy link
Member

jasnell commented Apr 2, 2016

Closing given the lack of further activity.

@jasnell jasnell closed this as completed Apr 2, 2016
@cookiengineer
Copy link
Author

Really? I don't know, but I think this is a security issue as it can crash a node server implementation, given that the attacker sends the right tcp events in the right order. If this isn't an issue, then at least provide better docs about which event can potentially fire which kind of error event. Otherwise pretty any implementation that handles TCP close frames will be broken.

@jasnell
Copy link
Member

jasnell commented Apr 2, 2016

If someone can confirm that this a bug that needs fixing (@saghul @trevnorris @bnoordhuis ?) we can definitely reopen. Just closed because there's been no further discussion or investigation on it since last July.

@jasnell
Copy link
Member

jasnell commented Apr 2, 2016

Can you verify whether this is still an issue in master?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stream Issues and PRs related to the stream subsystem.
Projects
None yet
Development

No branches or pull requests

9 participants