Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Fixed async event reporting and mutex lock ordering for groups #1639

Merged

Conversation

ethouris
Copy link
Collaborator

@ethouris ethouris commented Nov 6, 2020

Fixes #1631, #1625, #1605, #1603, #1612, #1599, #1585

This PR introduces a common function that is run in case of a broken connection. It takes care of correct error reporting to the application using asynchronous connection methods and sets properly epoll readiness flags for the group.

@ethouris ethouris added the Type: Bug Indicates an unexpected problem or unintended behavior label Nov 6, 2020
@maxsharabayko maxsharabayko added the [core] Area: Changes in SRT library core label Nov 6, 2020
@maxsharabayko maxsharabayko self-assigned this Nov 6, 2020
srtcore/core.cpp Outdated Show resolved Hide resolved
srtcore/queue.cpp Outdated Show resolved Hide resolved
@ethouris ethouris changed the title [BUG] Fixed async event reporting for conn/break for groups [BUG] Fixed async event reporting and mutex lock ordering for groups Nov 6, 2020
Copy link
Collaborator

@maxsharabayko maxsharabayko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • TEST(Bonding, DISABLED_NonBlockingGroupConnect) is still disabled, please enable it as well.

Copy link
Collaborator

@maxsharabayko maxsharabayko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes to srt-test-live should go in a separate PR. This one is already big enough and each review iteration consumes time on looking through changes to apps.
For now I will concentrate on changes to the library.

testing/testmedia.cpp Outdated Show resolved Hide resolved
@maxsharabayko maxsharabayko self-assigned this Nov 9, 2020
srtcore/api.cpp Outdated
Comment on lines 129 to 132
// [[using locked(m_GlobControlLock)]]
void CUDTSocket::makeShutdown()
{
#if ENABLE_EXPERIMENTAL_BONDING
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth using debug assertions here and in other places.

SRT_ASSERT(m_GlobControlLock.try_lock() == false);

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could be, but I'd personally prefer this to be used only in a special thread-tracking mode, in which the lock state would be recorded in a special variable, not just in the debug mode.

srtcore/api.cpp Show resolved Hide resolved
srtcore/api.cpp Show resolved Hide resolved
srtcore/core.cpp Outdated Show resolved Hide resolved
srtcore/core.cpp Outdated Show resolved Hide resolved
srtcore/core.cpp Outdated Show resolved Hide resolved
srtcore/group.cpp Show resolved Hide resolved
@maxsharabayko
Copy link
Collaborator

maxsharabayko commented Nov 11, 2020

TODO

  • Improve connect callback unit test

Take TEST(Bonding, NonBlockingGroupConnect) and modify the ConnectCallback as presented further, and the test will hang up with the current version 3fa5b49 and previous versions as well.

void ConnectCallback(void* /*opaq*/, SRTSOCKET sock, int error, const sockaddr* /*peer*/, int token)
{
    std::cout << "Connect callback. Socket: " << sock
        << ", error: " << error
        << ", token: " << token << '\n';

    srt_close(sock);
}

@Haivision Haivision deleted a comment from codecov bot Nov 12, 2020
@maxsharabayko
Copy link
Collaborator

(For future reference)

Data race acessing m_iTsbPdDelay_ms

Thread A:
  Write of size 4 at 0x7ba80000c134 by thread T6 (mutexes: write M365):
    #0 CUDT::fillSrtHandshake_HSREQ(unsigned int*, unsigned long, int) core.cpp:1633 (test-srt:x86_64+0x10025e4f2)
	   [NO LOCKS, writes to m_iTsbPdDelay_ms]
    #1 CUDT::fillSrtHandshake(unsigned int*, unsigned long, int, int) core.cpp:1610 (test-srt:x86_64+0x10025e141)
	   [NO LOCKS]
    #2 CUDT::createSrtHandshake(int, int, unsigned int const*, unsigned long, CPacket&, CHandShake&) core.cpp:2231 (test-srt:x86_64+0x10026386c)
	   [NO LOCK, but scope-locks m_parent->m_ControlLock and s_UDTUnited.m_GlobControlLock a bit later to fill HS group extension.
    #3 CUDT::processAsyncConnectRequest(EReadStatus, EConnectStatus, CPacket const&, sockaddr_any const&) core.cpp:4287 (test-srt:x86_64+0x1002872bd)
	   [NO LOCK]
    #4 CRendezvousQueue::updateConnStatus(EReadStatus, EConnectStatus, CPacket const&) queue.cpp:1049 (test-srt:x86_64+0x1003ae9e5)
	   [UNDER m_RIDVectorLock LOCK, can also change m_pUDT->m_bConnecting = false
    #5 CRcvQueue::worker(void*) queue.cpp:1295 (test-srt:x86_64+0x1003b2447)
	   [NO LOCK]

Thread B:
  Previous read of size 4 at 0x7ba80000c134 by main thread (mutexes: write M317):
    #0 CUDTGroup::syncWithSocket(CUDT const&, HandshakeSide) group.cpp:936 (test-srt:x86_64+0x1003f6e6d)
    #1 CUDTUnited::groupConnect(CUDTGroup*, SRT_GroupMemberConfig_*, int) api.cpp:1584 (test-srt:x86_64+0x1001d0e40)
    #2 CUDTUnited::singleMemberConnect(CUDTGroup*, SRT_GroupMemberConfig_*) api.cpp:1284 (test-srt:x86_64+0x1001ccc44)
    #3 CUDTUnited::connect(int, sockaddr const*, int, int) api.cpp:1270 (test-srt:x86_64+0x1001cd5ad)
    #4 CUDT::connect(int, sockaddr const*, int, int) api.cpp:3486 (test-srt:x86_64+0x1001e2cc5)
    #5 srt_connect srt_c_api.cpp:83 (test-srt:x86_64+0x1003dc2e5)

srtcore/api.cpp Show resolved Hide resolved
srtcore/api.cpp Show resolved Hide resolved
srtcore/api.cpp Show resolved Hide resolved
srtcore/group.cpp Show resolved Hide resolved
@codecov

This comment has been minimized.

@maxsharabayko
Copy link
Collaborator

maxsharabayko commented Nov 16, 2020

TODO:

  • Update documentation of the connection callback: now only called on failure.

@ethouris
Copy link
Collaborator Author

As per: Data race acessing m_iTsbPdDelay_ms

I think this shouldn't be a problem because makeMePeerOf is a function executed exclusively on the listener side, while groupConnect happens on the caller side. Groups also cannot involve both connecting and accepted sockets. This procedure of setting the latency is also executed exclusively on sockets that are newly being connected and borrow the value from a socket that is already connected.

@maxsharabayko maxsharabayko added this to the v1.4.3 milestone Nov 20, 2020
@maxsharabayko maxsharabayko merged commit 411264f into Haivision:master Nov 20, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] Recursive lock on an attempt to close a group
2 participants