Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] Added missing lock for isRcvDataReady() #2094

Merged

Conversation

gou4shi1
Copy link
Contributor

Fix crash:

STACKTRACE   3 >>> /home/xxx/remote_server/lib/libpthread.so.0(+0x11390) [0x7fa906ed1390]
0x0000000000011390: __restore_rt at ??:?

STACKTRACE   4 >>> CRcvBuffer::getRcvReadyPacket(int)(+0x54b) </home/xxx/remote_server/lib/libsrt.so.1.4> [0x7fa90783820b]
0x000000000005120b: CRcvBuffer::getRcvReadyPacket(int) at ??:?

STACKTRACE   5 >>> CRcvBuffer::isRcvDataReady(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&, int&, int)(+0x82) </home/xxx/remote_server/
lib/libsrt.so.1.4> [0x7fa907838692]
0x0000000000051692: CRcvBuffer::isRcvDataReady(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > >&, int&, int) at ??:?

STACKTRACE   6 >>> CRcvBuffer::isRcvDataReady()(+0x31) </home/xxx/remote_server/lib/libsrt.so.1.4> [0x7fa9078389f1]
0x00000000000519f1: CRcvBuffer::isRcvDataReady() at ??:?

STACKTRACE   7 >>> srt::CUDTGroup::recv_WaitForReadReady(std::vector<srt::CUDTSocket*, std::allocator<srt::CUDTSocket*> > const&, std::set<srt::CUDTSocket*, std::less<srt::CUDTSocket*>, std::allocator<srt::CUDTS
ocket*> >&)(+0x3bc) </home/xxx/remote_server/lib/libsrt.so.1.4> [0x7fa9079154dc]
0x000000000012e4dc: srt::CUDTGroup::recv_WaitForReadReady(std::vector<srt::CUDTSocket*, std::allocator<srt::CUDTSocket*> > const&, std::set<srt::CUDTSocket*, std::less<srt::CUDTSocket*>, std::allocator<srt::CUDT
Socket*> >&) at ??:?

STACKTRACE   8 >>> srt::CUDTGroup::recv(char*, int, SRT_MsgCtrl_&)(+0x1fa) </home/xxx/remote_server/lib/libsrt.so.1.4> [0x7fa907915cea]
0x000000000012ecea: srt::CUDTGroup::recv(char*, int, SRT_MsgCtrl_&) at ??:?

STACKTRACE   9 >>> srt::CUDT::recvmsg2(int, char*, int, SRT_MsgCtrl_&)(+0x57) </home/xxx/remote_server/lib/libsrt.so.1.4> [0x7fa907816997]
0x000000000002f997: srt::CUDT::recvmsg2(int, char*, int, SRT_MsgCtrl_&) at ??:?

STACKTRACE  10 >>> srt_recvmsg2(+0x1e) </home/xxx/remote_server/lib/libsrt.so.1.4> [0x7fa9078f959e]
0x000000000011259e: srt_recvmsg2 at ??:?

@codecov-commenter

This comment has been minimized.

@gou4shi1 gou4shi1 force-pushed the add-missing-lock-for-isRcvDataReady branch from 2be83e2 to bd11051 Compare August 21, 2021 09:43
@gou4shi1
Copy link
Contributor Author

Is it safe to lock m_RecvLock while holding m_GlobControlLock?

@ethouris
Copy link
Collaborator

ethouris commented Aug 23, 2021

According to this document, m_RecvLock orders after m_GlobControlLock. So yes, it is safe.

@maxsharabayko
Copy link
Collaborator

Hi @gou4shi1,
If I understand correctly, the crash happens because an individual socket (member of a group) has not yet created the receiver buffer, right?
If so, I wonder how does the m_RecvLock help? Shouldn't a check for m_pRcvBuffer != NULL be used there additionally?

Also, the RCV buffer should be protected by m_RcvBufferLock, not m_RecvLock. Unless there is some other intent here.

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Aug 23, 2021

Also, the RCV buffer should be protected by m_RcvBufferLock, not m_RecvLock.

I use m_RecvLock since I noticed that other usages of isRcvDataReady() use m_RecvLock

If I understand correctly, the crash happens because an individual socket (member of a group) has not yet created the receiver buffer, right?

Not sure, from the stacktrace, it crash at getRcvReadyPacket()

Signal  11 received: Segmentation fault - on PID 81, TID 160
Fault address: 0x0

CPU REGISTERS:
RIP=0x00007fa90783820b RFLAGS=0x0000000000010202 TRAPNO=0x000000000000000e CR2=0x0000000000000000
RAX=0x0000000000000000 RBX=0x0000000000000000 RCX=0x00007fa88c000000 RDX=0x0000000000000000
RSI=0x00007fa9079236d0 RDI=0x00007fa8817c6b78 RBP=0x00007fa8a00d6140 RSP=0x00007fa8817c6b50
 R8=0x00007fa88c03e850  R9=0x0000000000000000 R10=0x0000000000000000 R11=0x0000000000000246
R12=0x00000000000014a8 R13=0x00007fa8817c6b60 R14=0x00007fa8817c6b70 R15=0x00007fa8817c7018

Shouldn't a check for m_pRcvBuffer != NULL be used there additionally?

I will have a try.

@maxsharabayko
Copy link
Collaborator

STACKTRACE   6 >>> CRcvBuffer::isRcvDataReady()(+0x31) </home/xxx/remote_server/lib/libsrt.so.1.4> [0x7fa9078389f1]
0x00000000000519f1: CRcvBuffer::isRcvDataReady() at ??:?

I interpreted this so as the RCV buffer has not been created yet. The address is garbage (at ??:?).

@gou4shi1
Copy link
Contributor Author

I interpreted this so as the RCV buffer has not been created yet. The address is garbage (at ??:?).

Great!

I will have a try.

I have tried m_pRcvBuffer != NULL, it seems to work!

How to reproduce: listener enable heavy log, caller broadcast video in a bad network.

@gou4shi1 gou4shi1 force-pushed the add-missing-lock-for-isRcvDataReady branch from bd11051 to afeda31 Compare August 23, 2021 10:13
@gou4shi1 gou4shi1 changed the title [core] Added missing lock for isRcvDataReady() [core] check if m_pRcvBuffer is nullptr in recv_WaitForReadReady() Aug 23, 2021
@maxsharabayko maxsharabayko added this to the v1.4.4 milestone Aug 23, 2021
@maxsharabayko maxsharabayko added [core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior labels Aug 23, 2021
@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Aug 23, 2021

I have tried m_pRcvBuffer != NULL, it seems to work!

Oops...it crashed just now...
Maybe it's still due to lock missing...

@maxsharabayko maxsharabayko mentioned this pull request Aug 23, 2021
9 tasks
@maxsharabayko
Copy link
Collaborator

The CRcvBuffer is created in CUDT::prepareConnectionObjects(..). And it is not protected by relevant mutexes, except for m_ConnectionLock. Checking if the buffer is created or not yet without a lock is not very safe, although given the transition from NULL -> !NULL it is ok-ish.

Ideally, both getting the pointer to the CRcvBuffer and checking its state using isRcvDataReady should happen under the m_RcvBufferLock. Doing so in this PR would require intense testing and would block its inclusion in the v1.4.4 release.
Added a TODO item in #1674.

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Aug 23, 2021

@maxsharabayko Hi, I think it's not caused by m_pRcvBuffer = NULL, I just logged m_pRcvBuffer before calling isRcvDataReady():

21:56:23.502119/SrtL:Recv:LIVE D:SRT.gr: group/recv: m_pRcvBuffer=0x7f51741b6410

Here is the log https://1drv.ms/u/s!AuaosNlJ5ELxjzD_iirN2ZJVwvZv?e=xSZNdl
You can see that the last log of thread Recv:LIVE is m_pRcvBuffer=0x7f51741b6410
I will add more log inside isRcvDataReady() tomorrow.

@gou4shi1
Copy link
Contributor Author

I will add more log inside isRcvDataReady() tomorrow.

@@ -1689,11 +1689,13 @@ CPacket* CRcvBuffer::getRcvReadyPacket(int32_t seqdistance)
     }
     IF_HEAVY_LOGGING(int nskipped = 0);
+    HLOGC(brlog.Debug, log << "getRcvReadyPacket: m_iStartPos=" <<  m_iStartPos << " m_iLastAckPos=" << m_iLastAckPos);
     for (int i = m_iStartPos, n = m_iLastAckPos; i != n; i = shiftFwd(i))
     {
         /*
          * Skip missing packets that did not arrive in time.
          */
+        HLOGC(brlog.Debug, log << "getRcvReadyPacket: i=" <<  i << " m_pUnit[i]=" << m_pUnit[i]);
         if (m_pUnit[i] && m_pUnit[i]->m_iFlag == CUnit::GOOD)

The last several logs before crash are:

11:38:06.530865/SrtL:Recv:LIVE D:SRT.br: getRcvReadyPacket: m_iStartPos=5021 m_iLastAckPos=5029
11:38:06.530884/SRT:TsbPd:@49 D:SRT.br: getRcvReadyMsg: POS=5021 FREED: smaller than base_seq
11:38:06.530909/SrtL:Recv:LIVE D:SRT.br: getRcvReadyPacket: i=5021 m_pUnit[i]=0x7fab90670668

So this should be a data race between getRcvReadyPacket() and getRcvReadyMsg()->freeUnitAt()

@gou4shi1 gou4shi1 force-pushed the add-missing-lock-for-isRcvDataReady branch from afeda31 to bf423ea Compare August 24, 2021 04:03
@gou4shi1 gou4shi1 changed the title [core] check if m_pRcvBuffer is nullptr in recv_WaitForReadReady() [core] Added missing lock for isRcvDataReady() Aug 24, 2021
@maxsharabayko
Copy link
Collaborator

@gou4shi1

So this should be a data race between getRcvReadyPacket() and getRcvReadyMsg()->freeUnitAt()

Great analysis! Thank you!
Then m_RcvBufferLock (as in the current version of this PR) should help.
Waiting for a confirmation that you no longer get the crash.

@gou4shi1
Copy link
Contributor Author

Waiting for a confirmation that you no longer get the crash.

Yes! No longer crash for two days. (Without this PR, it will crash within 30min if heavy log is enabled.)

@maxsharabayko maxsharabayko merged commit cbdd676 into Haivision:master Aug 25, 2021
@gou4shi1 gou4shi1 deleted the add-missing-lock-for-isRcvDataReady branch August 25, 2021 09:04
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants