-
Notifications
You must be signed in to change notification settings - Fork 865
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[core] Fixed too early closed caller socket in background. #1750
[core] Fixed too early closed caller socket in background. #1750
Conversation
This PR changes the behavior of a socket that failed to connect due to connection timeout or due to a rejected connection. |
The behaviour described above should be mentioned somewhere in the documentation. Possible places:
|
Let me clarify one thing. This doesn't "change behavior" towards the version 1.4.1 of SRT, but restores the old behavior that was spoilt in the latest version. The behavior since always and up to version 1.4.1 was such that a socket that failed to connect remains in the CONNECTING state and the only sign of error is in epoll ERR flag. During the works on fixing problems found on Makito there was added a PR that was doing consistently the same thing in case of connection timeout report, connection rejection and also connection broken, including on reception of SHUTDOWN. This however has changed the behavior of an async socket connection. |
i->u->m_bConnecting = false; | ||
i->u->updateBrokenConnection(); | ||
|
||
// DO NOT close the socket here because in this case it might be | ||
// unable to get status from at the right moment. Also only member | ||
// sockets should be taken care of internally - single sockets should | ||
// be normally closed by the application, after it is done with them. | ||
|
||
// app can call any UDT API to learn the connection_broken error | ||
CUDT::s_UDTUnited.m_EPoll.update_events(i->u->m_SocketID, i->u->m_sPollID, SRT_EPOLL_IN | SRT_EPOLL_OUT | SRT_EPOLL_ERR, true); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... a socket that failed to connect remains in the CONNECTING state and the only sign of error is in epoll ERR flag.
In the current version and in this PR it does not remain in the connecting state.
Does it change the behavior towards 1.4.1 then?
i->u->m_bConnecting = false;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the code that existed in 1.4.1. This had to be removed in order to move the closing activities outside the lock.
if (CTimer::getTime() >= i->m_ullTTL)
{
HLOGC(mglog.Debug,
log << "RendezvousQueue: EXPIRED (" << (i->m_ullTTL ? "enforced on FAILURE" : "passed TTL")
<< ". removing from queue");
// connection timer expired, acknowledge app via epoll
i->m_pUDT->m_bConnecting = false;
CUDT::s_UDTUnited.m_EPoll.update_events(i->m_iID, i->m_pUDT->m_sPollID, UDT_EPOLL_ERR, true);
/*
* Setting m_bConnecting to false but keeping socket in rendezvous queue is not a good idea.
* Next CUDT::close will not remove it from rendezvous queue (because !m_bConnecting)
* and may crash here on next pass.
*/
if (AF_INET == i->m_iIPversion)
delete (sockaddr_in *)i->m_pPeerAddr;
else
delete (sockaddr_in6 *)i->m_pPeerAddr;
// i_next was preincremented, but this is guaranteed to point to
// the element next to erased one.
i_next = m_lRendezvousID.erase(i);
continue;
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it does not remain in the CONNECTING state, does it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changing these flags have nothing to do with the "socket state". The socket state for a connection-timedout socket is still SRTS_CONNECTING
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So internally it will have m_Status == SRTS_CONNECTING
, but the status from the API seems to be "BROKEN" when not connecting and not connected.
A bit confusing, but the motivation seems to be to have a socket in a BROKEN state, but not scheduled for closing by GC.
SRT_SOCKSTATUS CUDTSocket::getStatus()
{
// TTL in CRendezvousQueue::updateConnStatus() will set m_bConnecting to false.
// Although m_Status is still SRTS_CONNECTING, the connection is in fact to be closed due to TTL expiry.
// In this case m_bConnected is also false. Both checks are required to avoid hitting
// a regular state transition from CONNECTING to CONNECTED.
if (m_pUDT->m_bBroken)
return SRTS_BROKEN;
// Connecting timed out
if ((m_Status == SRTS_CONNECTING) && !m_pUDT->m_bConnecting && !m_pUDT->m_bConnected)
return SRTS_BROKEN;
return m_Status;
}
|
I am investigating a change of behavior with EnforcedEncryption on Makito (ref: NKE-1042): |
The test matrix can be found here. |
Found it while you replied and updated my previous comment NonBlocking B.4. |
Test
|
Proper fix for #1736.
This reverses a change that was done in order to properly update groups when a caller socket has failed to connect. There was added a request to perform close-alike things (with setting flags and releasing thread-related resources) that caused the socket to be actually prematurely closed. This caused inability to properly deliver closure information to the caller callback and possibly too early execution of the GC on that socket that deleted the socket before it could be properly reported in the receiver queue thread.
This PR also contains reversal of the changes in the tests that were failing because of this problem.
Changes from #1736 could be also incorporated here as a sanity check.