Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeOrm: This socket has been ended by the other party #1664

Closed
tommarien opened this issue Oct 28, 2022 · 9 comments · Fixed by #1666
Closed

TypeOrm: This socket has been ended by the other party #1664

tommarien opened this issue Oct 28, 2022 · 9 comments · Fixed by #1666

Comments

@tommarien
Copy link
Contributor

tommarien commented Oct 28, 2022

I've seen there a lot of open issues considering this, i may have stumbled upon something interesting while investigating the very same error.

Our context

We are using an AWS serverless architecture where our mysql server is behind an RDS Proxy.

First i was thinking we were missing the configuration to keep our connections in the pool alive, see this issue for more info.

Although that seems functional it did not have any effect on our particular situation.

We have our idleclientTimeout for the RDS proxy configured around 120 secs, so the issue is really a big issue with us. Although we could up the timeout to something more reasonable we would keep finding ourselves in the same situation after lets say 30 minutes.

Because of the specific sleepstate of lamda's on aws, the keepalive makes no difference, because no user codes executes anymore after running the function.

I've seen we did some handling in connection.js of the connection's close event.

Findings

The socket has been ended happens when the connection is used by a query, so i changed something in the connection.js class and added a listener on the end event

    this.stream.on('end', () => {
      console.warn('connection recycle');
    });

    // below is existing code ;)
    this.stream.on('close', () => {
      // we need to set this flag everywhere where we want connection to close
      if (this._closing) {
        return;
      }
      if (!this._protocolError) {
        // no particular error message before disconnect
        this._protocolError = new Error('Connection lost: The server closed the connection.');
        this._protocolError.fatal = true;
        this._protocolError.code = 'PROTOCOL_CONNECTION_LOST';
      }
      this._notifyError(this._protocolError);
    });

With that change, i could see that when starting the function the log contained a 'connection recycle' item, just before the function fired. And off course failed with the 'socket has been ended shizzle'

From the time is changed that function to contain the same logic as the `on('close`` handler the problem of the socket has been ended disappears and everything works is it should.

    this.stream.on('end', () => {
      console.warn('connection recycle');
      // we need to set this flag everywhere where we want connection to close
      if (this._closing) {
        return;
      }
      if (!this._protocolError) {
        // no particular error message before disconnect
        this._protocolError = new Error('Connection lost: The server closed the connection.');
        this._protocolError.fatal = true;
        this._protocolError.code = 'PROTOCOL_CONNECTION_LOST';
      }
      this._notifyError(this._protocolError);
    });

Please advise, i don't mind creating a pull for this if you think this is useful :)

i don't think this is the actual code change that is needed but i just seems to solve the issue, it has something to do with the setting of the _fatalError.

Update

when i compare the handling with the mysql package i see them handling the end event too, so i guess we need to do something there

@sidorares
Copy link
Owner

So what you are saying is when you lambda is suspended the code don't see close event, but later when its run again underlying tcp connection no longer alive, the end event is fired ( but no close event ). Is that correct?

I can't remember exactly why the handler lives in 'close' event. Can you try running tests with your change ( everything moved to 'end' handler )?

from the docs:

The 'close' event is emitted when the stream and any of its underlying resources (a file descriptor, for example) have been closed. The event indicates that no more events will be emitted, and no further computation will occur.

The 'end' event is emitted when there is no more data to be consumed from the stream.

The 'finish' event is from the stream.Writable class. The 'finish' event is emitted after stream.end() is called and all chunks have been processed by stream._transform(). In the case of an error, 'finish' should not be emitted.

@tommarien
Copy link
Contributor Author

You are correct, it looks like the end event is fired first in the start of the lambda. It does not make sense I know. I'll try to dig in deeper to see what just goes on under hood. But it's tricky because of the abstraction that typeorm gives over it.

Thanks for the quick response. I am sure that moving everything to the end event would have other drawbacks when your are opening an closing connections yourself.

Do you have a clue why copying the code to the end event would close the connection? So that the next query recreates one ? Could not find that in the code

@sidorares
Copy link
Owner

So that the next query recreates one

Individual connections never reuse underlying transport stream - if it ( tcp connection for example ) is closed that means you need to create a new connection ( as opposed to the pool where it tracks how many open connection it allows to keep, when existing connection is closed for any reason that decreases number of open connections and allows to open a new one if previously pool was already at a limit )

@tommarien
Copy link
Contributor Author

Ok, but I am in a pool otherwise the connection would have closed by the previous iteration. I just ment I don't see in code where the connection is released from the pool by just setting the connection as if iet had a fatal error

@sidorares
Copy link
Owner

Not sure I fully follow

From the pool point of view both 'end' and 'close' probably mean that connection is not available for reuse, maybe we should make a table of known scenarios and order when they ( or one of them ) are fired, e.i "server is killed", "server closed connection because of other connection kill threadId query", "network cable unplugged / wifi turned off", "node process is suspended / resumed" etc

@tommarien
Copy link
Contributor Author

I'll try to put some more useful logs to see I can make sense of what the order of events are

@tommarien
Copy link
Contributor Author

tommarien commented Oct 31, 2022

Here is the requested added logging, i've used the internalId inside of connection.js to better illustrate where what is happening

First Run

  • Nest starts
  • typeorm executing SELECT VERSION() AS version
  • PoolConnection 0 is started
  • Connection 0 has received query SELECT VERSION() AS version
  • Nest app creation finished
  • execution method inside lambda
  • 2 application specific queries on Connection 0
  • end of method execution inside lambda

Second invoke within 120 seconds threshold (so rds did not terminate the socket)

  • execution method inside lambda
  • 2 application specific queries on Connection 0
  • end of method execution inside lambda

Third invoke outside of the 120 seconds threshold

  • Connection 0 has received end event (added myself inside connection js)
  • execution method inside lambda
  • 1st application specific query
  • PoolConnection 0 has received error event
  • QueryFailedError: This socket has been ended by the other party

What is weird about this is that the PoolConnection never received an end event, then i realised its because the connection is never emitting the end event in this particular case so i've added it as follows inside the the connection.js file

    this.stream.on('end', () => {
      console.log(`Connection ${this._internalId} has received end event`);
      this.emit('end'); // This ensure the PoolConnection receives the end event
    });

From here on, everything seems to behave accordingly

First Run

  • Nest starts
  • typeorm executing SELECT VERSION() AS version
  • PoolConnection 0 is started
  • Connection 0 has received query SELECT VERSION() AS version
  • Nest app creation finished
  • execution method inside lambda
  • 2 application specific queries on Connection 0
  • end of method execution inside lambda

Second invoke within 120 seconds threshold (so rds did not terminated the socket)

  • execution method inside lambda
  • 2 application specific queries on Connection 0
  • end of method execution inside lambda

Third invoke outside of the 120 seconds threshold

  • Connection 0 has received end event
  • PoolConnection 0 has received end event
  • Connection 0 has received close event
  • PoolConnection 0 has received error event
  • execution method inside lambda
  • PoolConnection 1 has started
  • 2 application specific queries on Connection 1
  • end of method execution inside lambda

Thanks for your help again, please advise on what would be a next step 🙇

@tommarien
Copy link
Contributor Author

@sidorares is it ok if i create a small pr for this change, just so that connection emits 'end' events from the underlying stream ?

@sidorares
Copy link
Owner

Go for it @tommarien

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants