Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix epoll event loss problem #1843

Merged
merged 4 commits into from
Mar 9, 2021
Merged

Conversation

cg82616424
Copy link
Contributor

image
more detailed informations:

  1. user call receive/receivemessage(about line number:6482)
  2. after read/receive, if rcvbuffer is empty, will set SRT_EPOLL_IN event to false
  3. but if we do not do some lock work here, will cause some sync problems between threads:
    (1) user thread: call receive/receivemessage
    (2) user thread: read data
    (3) user thread: no data in rcvbuffer, set SRT_EPOLL_IN event to false
    (4) receive thread: receive data and set SRT_EPOLL_IN to true
    (5) user thread: set SRT_EPOLL_IN to false
  4. so , m_RecvLock must be used here to protect epoll event

@cg82616424 cg82616424 changed the title fix epoll event loss problem fix epoll event loss problem Mar 5, 2021
@cg82616424
Copy link
Contributor Author

I viewed my code, I think continuous-integration/travis-ci/pr should be retry to solve this error, what should i do ?

@cg82616424 cg82616424 closed this Mar 5, 2021
@cg82616424 cg82616424 reopened this Mar 5, 2021
@cg82616424
Copy link
Contributor Author

I viewed my code, I think continuous-integration/travis-ci/pr should be retry to solve this error, what should i do ?

I retied , all successed

@ethouris
Copy link
Collaborator

ethouris commented Mar 5, 2021

Just one more thing before we start seeing to it: Did you read the LowLevelInfo.md document? It describes the already found and confirmed mutex ordering. Before you change anything in the mutex locking, make sure that the ordering there is respected, otherwise there's a risk of introducing a deadlock.

@maxsharabayko maxsharabayko added [core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior labels Mar 5, 2021
@maxsharabayko maxsharabayko added this to the v1.4.3 milestone Mar 5, 2021
@cg82616424
Copy link
Contributor Author

Just one more thing before we start seeing to it: Did you read the LowLevelInfo.md document? It describes the already found and confirmed mutex ordering. Before you change anything in the mutex locking, make sure that the ordering there is respected, otherwise there's a risk of introducing a deadlock.

thanks for your attention, yep, I have read that document, I think I use right mutex to solve that problem

@cg82616424
Copy link
Contributor Author

when will this pull request be merged ?

@ethouris
Copy link
Collaborator

ethouris commented Mar 8, 2021

The review is scheduled, but we need to include it in the plan.

@ethouris
Copy link
Collaborator

ethouris commented Mar 8, 2021

Ok, there are two small problems here.

  1. Did you test this with SRTO_TSBPDMODE set to false? Because this code isn't going to be executed in TSBPD mode (which is default) at all - in this mode ACK action only informs TSBPD that there are potentially new packets to pickup and possibly sign off for reading when the time comes.
  2. Two lines above there's the following call:
                CSync::lock_signal(m_RecvDataCond, m_RecvLock);

which is also locking m_RecvLock; locking them separately would be rather a bad idea.

Instead you should rather use the CSync object and use its signal_locked method. Follow the recvdata_cc symbol to have an example how it is done.

Actually this thing in TSBPD mode is guarded correctly - the lock is applied for a wider range and embraces setting SRT_EPOLL_IN event.

@cg82616424
Copy link
Contributor Author

Ok, there are two small problems here.

  1. Did you test this with SRTO_TSBPDMODE set to false? Because this code isn't going to be executed in TSBPD mode (which is default) at all - in this mode ACK action only informs TSBPD that there are potentially new packets to pickup and possibly sign off for reading when the time comes.
    thanks for your suggestion; yes, I tested in async mode, use file mode and message api test for about 14 hours, no deadlock occured.
  2. Two lines above there's the following call:
                CSync::lock_signal(m_RecvDataCond, m_RecvLock);

which is also locking m_RecvLock; locking them separately would be rather a bad idea.
okay, I will review your soluation and update my pull request tomorrow

Instead you should rather use the CSync object and use its signal_locked method. Follow the recvdata_cc symbol to have an example how it is done.

Actually this thing in TSBPD mode is guarded correctly - the lock is applied for a wider range and embraces setting SRT_EPOLL_IN event.

@ethouris
Copy link
Collaborator

ethouris commented Mar 8, 2021

Ok, this is how you could do it:

{
    UniqueLock rdlock (m_RecvLock);
    CSync      rdcond (m_RecvDataCond, rdlock);

    if (m_config.bSynRecving)
    {
        // signal a waiting "recv" call if there is any data available
        rdcond.signal_locked(rdlock);
    }
    // acknowledge any waiting epolls to read
    // fix SRT_EPOLL_IN event loss but rcvbuffer still have data:
    // 1. user call receive/receivemessage(about line number:6482)
    // 2. after read/receive, if rcvbuffer is empty, will set SRT_EPOLL_IN event to false
    // 3. but if we do not do some lock work here, will cause some sync problems between threads:
    //      (1) user thread: call receive/receivemessage
    //      (2) user thread: read data
    //      (3) user thread: no data in rcvbuffer, set SRT_EPOLL_IN event to false
    //      (4) receive thread: receive data and set SRT_EPOLL_IN to true
    //      (5) user thread: set SRT_EPOLL_IN to false
    // 4. so , m_RecvLock must be used here to protect epoll event
    s_UDTUnited.m_EPoll.update_events(m_SocketID, m_sPollID, SRT_EPOLL_IN, true);
}

@cg82616424
Copy link
Contributor Author

Ok, this is how you could do it:

{
    UniqueLock rdlock (m_RecvLock);
    CSync      rdcond (m_RecvDataCond, rdlock);

    if (m_config.bSynRecving)
    {
        // signal a waiting "recv" call if there is any data available
        rdcond.signal_locked(rdlock);
    }
    // acknowledge any waiting epolls to read
    // fix SRT_EPOLL_IN event loss but rcvbuffer still have data:
    // 1. user call receive/receivemessage(about line number:6482)
    // 2. after read/receive, if rcvbuffer is empty, will set SRT_EPOLL_IN event to false
    // 3. but if we do not do some lock work here, will cause some sync problems between threads:
    //      (1) user thread: call receive/receivemessage
    //      (2) user thread: read data
    //      (3) user thread: no data in rcvbuffer, set SRT_EPOLL_IN event to false
    //      (4) receive thread: receive data and set SRT_EPOLL_IN to true
    //      (5) user thread: set SRT_EPOLL_IN to false
    // 4. so , m_RecvLock must be used here to protect epoll event
    s_UDTUnited.m_EPoll.update_events(m_SocketID, m_sPollID, SRT_EPOLL_IN, true);
}

thanks for your solution, I have modified my pull request. plz review it again

@ethouris ethouris requested a review from maxsharabayko March 9, 2021 08:02
@maxsharabayko maxsharabayko merged commit db3db78 into Haivision:master Mar 9, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants