-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NRE occurred in ReaderThread #29
Comments
Added IrcMac, which is an abstraction layer for writing/reading from the TcpClient. This will prevent NREs from happening since we never set this to null. Our ReceiverThread will also return for ANY Exception, instead of swallowing it and causing an NRE (or now, an ODE) from happening over and over again..
Improved Irc disconnection logic. - ReaderThread will exit on ANY unhandled Exception now (instead of just Socket, and ODE). The watchdog will then reconnect when we don't get a response from the server. - On Disconnection, the ReaderThread will be terminated. Connect() will recreate it. This will prevent multiple reader threads from appearing (who knows what kind of havoc that would create).
So... a couple issues happened with this one: The core of this issue happens when the ReaderThread gets an exception we did not expect. In the most recent case (today) it got an AggregateException when the SSLStream failed for some reason. This caused thousands of these exceptions to occur each time the ReaderThread tried to read from the stream. Since the ReaderThread no longer could read from the stream, so our watch dog would attempt to reconnect. When this happened, the ircReader would be null, and then we'd get a thousand of NullReferenceExceptions (NREs). Notice in all of the 'Known' Exception cases below (SocketException and IOException), we break out of the ReaderThread by calling 'return'. We do not do that in the unknown exception case (where we catch Exception). So, we will keep spinning and throwing Exceptions until the ReaderThread exits (more on that later). while( this.KeepReading )
{
try
{
// ReadLine blocks until we call Close() on the underlying stream.
string s = this.ircReader.ReadLine();
if( ( string.IsNullOrWhiteSpace( s ) == false ) && ( string.IsNullOrEmpty( s ) == false ) )
{
// If KeepReading is set to false, we want this thread to exit.
// Do nothing.
if( this.KeepReading )
{
this.OnReadLine( s );
}
}
}
catch( SocketException err )
{
StaticLogger.Log.WriteLine( "IRC Connection closed: " + err.Message );
if( this.KeepReading )
{
StaticLogger.Log.ErrorWriteLine(
"WARNING IRC connection closed, but we weren't terminating. Wait for watchdog to reconnect..."
);
return;
}
else
{
// We requested the irc connection be closed since KeepReading is false.
// Return, which will terminate this thread.
return;
}
}
catch( IOException err )
{
StaticLogger.Log.WriteLine( "IRC Connection closed: " + err.Message );
if( this.KeepReading )
{
StaticLogger.Log.ErrorWriteLine(
"WARNING IRC connection closed, but we weren't terminating. Wait for watchdog to reconnect..."
);
return;
}
else
{
// We requested the irc connection be closed since KeepReading is false.
// Return, which will terminate this thread.
return;
}
}
catch( Exception err )
{
// Unexpected exception occurred. The connection probably dropped.
// Nothing we can do now except to attempt to try again.
StaticLogger.Log.ErrorWriteLine(
"IRC Reader Thread caught unexpected exception:" + Environment.NewLine + err + Environment.NewLine + "Wait for watchdog to reconnect..."
);
// The bug is here, we do not break out of the loop!
}
} // End While So now to add insult to injury... The watchdog will call AttemptReconnect, which calls Connect(). Connect creates a brand new reader thread... which means now we have 2 reader threads happening at the same time... both trying to read from the socket at the same time. If the watchdog goes again, now we have a third thread, then a fourth. // Start Reading.
this.KeepReading = true;
this.readerThread = new Thread( ReaderThread );
this.readerThread.Name = this.Config.Server + " IRC reader thread";
this.readerThread.Start(); I'm not sure what happens when 2 threads are trying to read from the same stream, but I can imagine it is not good. This might also sometimes explain why the bot sometimes gets in a connect/disconnect loop... if multiple threads are reading from the same stream, I could imagine data would get scrambled, and if we never get a watchdog response, we will always reconnect. |
Tl;DR: The fix is to break out of the ReaderThread on ANY unhandled exception. Then, let the watchdog call Disconnect() which will teardown the ReaderThread, and then Connect() will reconnect it. |
We can see by code-inspection that if we get an unhandled Exception, we leave the ReaderThread: catch( Exception err )
{
// Unexpected exception occurred. The connection probably dropped.
// Nothing we can do now except to wait for the watch dog to trigger a reconnect..
StaticLogger.Log.ErrorWriteLine(
"IRC Reader Thread caught unexpected exception:" + Environment.NewLine + err + Environment.NewLine + "Wait for watchdog to reconnect..."
);
return; // <-- Added in
} This should never happen again. However, before closing, perhaps we should explore adding some unit tests? |
We have unit tests to test the case that caused this issue, and we have fixed the issue. I am fine with closing this issue as resolved. |
Found on our linux test machine. This happens over and over and over again. IRC bot doesn't crash, but NREs are always bad.
2017-10-31T00:24:07.7921880-04:00 ERROR> IRC Reader Thread caught unexpected exception:
System.NullReferenceException: Object reference not set to an instance of an object
at ChaskisCore.IrcConnection.ReaderThread () [0x00006] in <8a88ebfc1baa4ee2a635e86939e0ea4c>:0
Wait for watchdog to reconnect...
2017-10-31T00:24:07.7922670-04:00 ERROR> IRC Reader Thread caught unexpected exception:
System.NullReferenceException: Object reference not set to an instance of an object
at ChaskisCore.IrcConnection.ReaderThread () [0x00006] in <8a88ebfc1baa4ee2a635e86939e0ea4c>:0
Wait for watchdog to reconnect...
The text was updated successfully, but these errors were encountered: