Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fixed async event reporting and mutex lock ordering for groups #1639

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
36 commits
Select commit Hold shift + click to select a range
bbd9ac6
[BUG] Fixed async event reporting for conn/break for groups
Nov 6, 2020
9c482d8
CodeFactor
Nov 6, 2020
debe537
[core] Fixes for proper locking order and consist for group access
Nov 6, 2020
ea7f936
CodeFactor
Nov 6, 2020
378cc00
Fixed proper bonding conditional
Nov 6, 2020
9d3301f
Fixed: restored the old behavior of idle continuing on rejected calle…
Nov 6, 2020
666a6c4
Added an mxptool-like connect callback for group
Nov 9, 2020
ee42a9c
Fixed group iterator local var that survived unlock-lock cycle
Nov 9, 2020
8674b45
Fixed incorrect socket operations when deleted while connecting
Nov 9, 2020
6c8e50b
Changed order of modification for group-bind data. To make sure iter …
Nov 9, 2020
e747cd6
Updated to latest master
Nov 9, 2020
b3aeb5d
Enabled non-blocking group test
Nov 9, 2020
6e0d6df
Withdrawn testing changes (moved to a separate PR)
Nov 9, 2020
5c20fc8
Fixed alignment in core. Completed _LOCKED lacking markers
Nov 9, 2020
48f7a08
Restored old rules for closing a socket. Added removal from group whe…
Nov 10, 2020
020008a
Fixed correct epoll update on error. Added some logs and comments. Ad…
Nov 10, 2020
ece0721
Ensured correct removal of a broken socket
Nov 10, 2020
f35555e
Fixed unguarded m_sPollID in closeInternal
Nov 10, 2020
3fa5b49
Moved group and callback update for failed sockets out of lock
ethouris Nov 11, 2020
8237eae
Small reverting change
Nov 11, 2020
42058da
Upgraded test_bonding with accepting group from listener
ethouris Nov 12, 2020
b5810e7
Fixed bug: CUDTUnited::close could interrupt group closing
ethouris Nov 12, 2020
74135f7
Fixed problems that could be caused by parallelly closed group
ethouris Nov 13, 2020
d9bea62
Removed unnecessary locks that may cause a deadlock
Nov 13, 2020
5f7a375
Auto-deletion of epoll subscribers from a closed group. Fixed exc han…
Nov 13, 2020
12572a0
Added more logs
Nov 13, 2020
6caec08
Added lacking NULL-check for closed member sockets. Added paranoid ch…
Nov 16, 2020
981099e
Fixed: Added mistakenly unconditional deleteGroup. Added group tortur…
Nov 16, 2020
c84bf73
CodeFactor
Nov 16, 2020
430b3dc
Fixed incorrectly used C++11 syntax for C++03 file
Nov 16, 2020
9323568
Blocked (experimental) connect callback as called in wrong place
Nov 16, 2020
7ec47bf
Changed the method of closing a pending socket
Nov 16, 2020
da7ffcb
Moved mutex ordering information to a doc document
Nov 20, 2020
540cd4d
Merge branch 'master' into dev-pr-fix-group-async-event-reporting
Nov 20, 2020
9aa49cb
Updated doc information about connect callback
Nov 20, 2020
2f3cc3b
Fixed clearing epoll events at the beginning of groupConnect
Nov 20, 2020
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
22 changes: 12 additions & 10 deletions docs/API-functions.md
Original file line number Diff line number Diff line change
Expand Up @@ -631,15 +631,15 @@ different families (that is, both `local_name` and `remote_name` must be `AF_INE

### srt_connect_callback
```
int srt_connect_callback(SRTSOCKET clr, srt_connect_callback_fn* hook_fn, void* hook_opaque);
int srt_connect_callback(SRTSOCKET u, srt_connect_callback_fn* hook_fn, void* hook_opaque);
```

This call installs a callback hook, which will be executed on a given `clr`
socket just after the pending connection situation in the background has been
resolved (that is, when the connection succeeded or failed). Note that this
function is not guaranteed to be called if the `clr` socket is set to blocking
mode (`SRTO_RCVSYN` option set to true). It is guaranteed to be called when
a socket is in non-blocking mode, or when you use a group.
This call installs a callback hook, which will be executed on a given `u`
socket or all member sockets of `u` group, just after the pending connection
situation in the background has been resolved and the connection has failed.
Note that this function is not guaranteed to be called if the `u` socket is set
to blocking mode (`SRTO_RCVSYN` option set to true). It is guaranteed to be
called when a socket is in non-blocking mode, or when you use a group.

This function is mainly intended to be used with group connections. Note that
even if you use a group connection in blocking mode, after the group is considered
Expand All @@ -650,13 +650,15 @@ or all fail - in such a case those failures also happen only in the background,
while the connecting function blocks until all connections are resolved.
When all links fail, you will only get a general error code for the group.
This mechanism allows you to get individual errors for particular member
connections.
connection failures.

You can also use this mechanism as an alternative method for a single-socket
connection in non-blocking mode to trigger an action when the connection
process is finished.
process is finished. It is recommended, however, that you use this callback
only to collect failure information, as the call will happen in one of the
internal SRT threads.

* `clr`: Socket that will be used for connecting and for which the hook is installed
* `u`: Socket or group that will be used for connecting and for which the hook is installed
* `hook_fn`: The callback hook function pointer
* `hook_opaque`: The pointer value that will be passed to the callback function

Expand Down
239 changes: 239 additions & 0 deletions docs/LowLevelInfo.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,239 @@
# Low Level Information for the SRT project

## Introduction

This documnent will contain loose information for various topis referring to
the SRT source code describing some cross-source analysis that would be
unobvious for a source code reviewer. It's not a complete documentation of
anything, rather a collection of various kind of information retrieved during
development and even reverse engineering.

## Mutex locking

This analysis is a result of detected lots of cascade mutex locking in the
SRT code. A more detailed analysis would be required as to which mutex is
going to protect what kind of data later.

Here is the info collected so far:

### Data structures

The overall structure of the object database, involving sockets and groups
is as follows (pseudo-language):

```
CUDTUnited (singleton) {
CONTAINER<CUDTSocket> m_Sockets;
CONTAINER<CUDTSocket> m_ClosedSockets;
CONTAINER<CUDTGroup> m_Groups;
CONTAINER<CUDTGroup> m_ClosedGroups;
}

CUDTGroup {
type SocketData { CUDTSocket* ps; SRTSOCKET id; int state; ... }
CONTAINER<SocketData> m_Group;
}
```

Dead sockets (either closed manually or broken after connection) are
moved first from `m_Sockets` to `m_ClosedSockets`. The GC thread will take
care to delete them physically after making sure all inside facilities
do not contain any remaining data of interest.

Groups may only be manually closed, however a closed group is moved
to `m_ClosedGroups`. The GC thread will take care to delete them, as long
as their usage counter is 0. Every call to an API function (as well as
TSBPD thread) increases the usage counter in the beginning and decreases
upon exit. A group may be closed in one thread and still being used in
another. The group will persist up to the time when the current API function
using it exits and decreases the usage counter back to 0.

Containers and contents guarded by mutex:

`CUDTUnited::m_GlobControlLock` - guards all containers in CUDTUnited.

`CUDTSocket::m_ControlLock` - guards internal operation performed on particular
socket, with its existence assumed (this is because a socket will always exist
until it's deleted while being in `m_ClosedSockets`, and when the socket is in
`m_ClosedSockets` it will not be deleted until it's free from any operation,
while the socket is assumed nonexistent for any newly called API function even
if it exists physically, but is moved to `m_ClosedSockets`).

`CUDTGroup::m_GroupLock` - guards the `m_Group` container inside a group that
collects member sockets.

There are unfortunately many situations when multiple locks have to be applied
at a time. This is then the hierarchy of the mutexes that must be preserved
everywhere in the code.

As mutexes cannot be really ordered unanimously, below are two trees, with also
some possible branches inside. The mutex marked with (T) is terminal, that is,
no other locks shall be allowed in the section where this mutex is locked.

### Mutex ordering information

Note that the list isn't exactly complete, but it should contain all
mutexes for which the locking order must be preserved.

```

- CUDTSocket::m_ControlLock

- CUDT::m_ConnectionLock

- CRendezvousQueue::m_RIDVectorLock

- CUDTUnited::m_GlobControlLock

- CUDTGroup::m_GroupLock

- CUDT::m_RecvAckLock || CEPoll::m_EPollLock(T)

----------------
- CUDTUnited::m_GlobControlLock

- CUDTGroup::m_GroupLock || CSndUList::m_ListLock(T)

- CUDT::m_ConnectionLock

- CRendezvousQueue::m_RIDVectorLock

- CUDT::m_SendLock

- CUDT::m_RecvLock

- CUDT::m_RecvBufferLock

- CUDT::m_RecvAckLock || CUDT::m_SendBlockLock
------------------

ANALYSIS ON: m_ConnectionLock


-- CUDT::startConnect flow

CUDTUnited::connectIn -- > [LOCKED s->m_ControlLock]
CUDT::open -- > [MAYBE_LOCKED m_ConnectionLock, if bind() not called]
CUDT::clearData --> [LOCKED m_StatsLock]
CUDTUnited::updateMux -- > [LOCKED m_GlobControlLock]
{
[SCOPE UNLOCK s->m_ControlLock, if blocking mode]
CUDT::startConnect -- > [LOCKED m_ConnectionLock]
CRcvQueue::registerConnector
CRendezvousQueue::insert --> [LOCKED CRendezvousQueue::m_RIDVectorLock]
}
END.

-- CUDT::groupConnect flow

CUDT::groupConnect (no locks)
CUDT::setOpt [LOCKS m_ConnectionLock, m_SendLock, m_RecvLock]
{ [LOCKS m_GlobControlLock]
CUDTGroup::add [LOCKS m_GroupLock]
}
CUDT::connectIn --> continue with startConnect flow

-- CUDTUnited::listen (API function)

CUDTUnited::listen
CUDTUnited::locateSocket [LOCKS m_GlobControlLock]
{
[SCOPE LOCK s->m_ControlLock]
CUDT::setListenState -- > [LOCKED m_ConnectionLock]
CRcvQueue::setListener -- > [LOCKED m_LSLock]
}

-- CUDT::processAsyncConnectRequest

CRcvQueue::worker ->
...
CRcvQueue::worker_TryAsyncRend_OrStore
CUDT::processAsyncConnectResponse -- > [LOCKED m_ConnectionLock]
CUDT::processConnectResponse
CUDT::postConnect
CUDT::interpretSrtHandshake ->
[IF group extension found]
CUDT::interpretGroup
{
[SCOPE LOCK m_GlobControlLock]
[IF Responder]
{
CUDT::makeMePeerOf
[LOCKS m_GroupLock]
CUDTGroup::syncWithSocket
CUDTGroup::find --> [LOCKED m_GroupLock]
}
debugGroup -- > [LOCKED m_GroupLock]
}


-- CUDT::acceptAndRespond

CRcvQueue::worker_ProcessConnectionRequest
{
[SCOPE LOCK m_LSLock]
CUDT::processConnectRequest
CUDTUnited::newConnection
locateSocket -- > [LOCKED m_GlobControlLock]
locatePeer -- > [LOCKED m_GlobControlLock]
[IF failure, LOCK m_AcceptLock]
generateSocketID --> [LOCKED m_IDLock]
CUDT::open -- > [LOCKED m_ConnectionLock]
CUDT::updateListenerMux -- > [LOCKED m_GlobControlLock]
CUDT::acceptAndRespond --> [LOCKED m_ConnectionLock]
CUDT::interpretSrtHandshake ->
[IF group extension found]
CUDT::interpretGroup
{
[SCOPE LOCK m_GlobControlLock]
[IF Responder]
{
CUDT::makeMePeerOf
[LOCKS m_GroupLock]
CUDTGroup::syncWithSocket
CUDTGroup::find --> [LOCKED m_GroupLock]
}
debugGroup -- > [LOCKED m_GroupLock]
}
{
[SCOPE LOCK m_GlobControlLock]
CUDT::synchronizeWithGroup -- > [LOCKED m_GroupLock]
}
CRcvQueue::setNewEntry -- > [LOCKED CRcvQueue::m_IDLock]
{
[SCOPE LOCK m_GlobControlLock]
{
[SCOPE LOCK m_GroupLock]
}
}
{
[SCOPE LOCK m_AcceptLock]
CEPoll::update_events
}
[IF Rollback]
CUDT::closeInternal
[LOCKING m_EPollLock]
{
[SCOPE LOCK m_ConnectionLock]
[SCOPE LOCK m_SendLock]
[SCOPE LOCK m_RecvLock]
[LOCKING m_RcvBufferLock]
}
{
[SCOPE LOCK m_GlobControlLock]
CUDT::removeFromGroup --> [LOCKED m_GroupLock]
}

CEPoll::update_events
}

-- CUDT::bstats: TRT-LOCKED m_ConnectionLock

-- CUDT::packData

CSndQueue::worker
CSndUList::pop -- > [LOCKED m_ListLock]
CUDT::packData

```

Loading