Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] fix missing m_RcvBufferLock in processCtrlDropReq() #2223

Merged

Conversation

gou4shi1
Copy link
Contributor

@gou4shi1 gou4shi1 commented Jan 11, 2022

Try to fix this crash:

16:21:32.786189819 : STACKTRACE  10 >>> /home/xxx/remote_server/lib/libstdc++.so.6(+0xa5811) [0x7efcceccf811]
16:21:32.786190773 : 0x00000000000a5811: std::terminate() at ??:?
16:21:32.786191903 : STACKTRACE  11 >>> /home/xxx/remote_server/lib/libstdc++.so.6(+0xa5a65) [0x7efcceccfa65]
16:21:32.786193209 : 0x00000000000a5a65: __cxa_throw at ??:?
16:21:32.786194146 : STACKTRACE  12 >>> srt::CRcvBufferNew::getTimespan_ms() const(+0xc8) </home/xxx/remote_server/bin/remote_server> [0x55b19eb78508]
16:21:32.786195049 : 0x0000000000131508: srt::CRcvBufferNew::getTimespan_ms() const at /.../srt/srtcore/buffer_rcv.cpp:559
16:21:32.786197136 : STACKTRACE  13 >>> srt::CRcvBufferNew::updRcvAvgDataSize(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&)(+0x57) </home/xxx/remote_server/bin/remote_server> [0x55b19eb79667]
16:21:32.786198637 : 0x0000000000132667: srt::CRcvBufferNew::getRcvDataSize(int&, int&) const at /.../srt/srtcore/buffer_rcv.cpp:?
16:21:32.786199971 :  (inlined by) srt::CRcvBufferNew::updRcvAvgDataSize(std::chrono::time_point<std::chrono::_V2::steady_clock, std::chrono::duration<long, std::ratio<1l, 1000000000l> > > const&) at /.../srt/srtcore/buffer_rcv.cpp:1047
16:21:32.786200909 : STACKTRACE  14 >>> srt::CUDT::tsbpd(void*)(+0x17f) </home/xxx/remote_server/bin/remote_server> [0x55b19eb1ffcf]
16:21:32.786201792 : 0x00000000000d8fcf: srt::CUDT::tsbpd(void*) at /.../srt/srtcore/core.cpp:5172
16:21:32.786202577 : STACKTRACE  15 >>> /home/xxx/remote_server/lib/libstdc++.so.6(+0xd0a00) [0x7efccecfaa00]
16:21:32.786203389 : 0x00000000000d0a00: std::error_code::default_error_condition() const at ??:?
16:21:32.786204144 : STACKTRACE  16 >>> /home/xxx/remote_server/lib/libpthread.so.0(+0x76ba) [0x7efccf69d6ba]
16:21:32.786204833 : 0x00000000000076ba: start_thread at ??:?
16:21:32.786205599 : STACKTRACE  17 >>> clone(+0x6d) </lib/x86_64-linux-gnu/libc.so.6> [0x7efcce5464dd]
16:21:32.786206431 : 0x00000000001074dd: clone at /build/glibc-e6zv40/glibc-2.23/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:111


The last logs before crash:

16:21:32.007709/SRT:TsbPd!W:SRT.br: @708054504:RCV-DROPPED 15 packet(s). Packet seqno %349510186 delayed for 5711.172 ms
16:21:32.008328/SRT:TsbPd!W:SRT.br: @708054545:RCV-DROPPED 1 packet(s). Packet seqno %256173354 delayed for 7106.567 ms
16:21:32.010957/SRT:TsbPd!W:SRT.br: @708054541:RCV-DROPPED 1 packet(s). Packet seqno %285142564 delayed for 5765.357 ms
16:21:32.012821/SRT:RcvQ:w1!W:SRT.br: CRcvBufferNew.dropMessage(): nothing to drop. Requested [1330757209; 1330757210]. Buffer start 1330757821.
16:21:32.012917/SRT:TsbPd!W:SRT.br: @708054505:RCV-DROPPED 20 packet(s). Packet seqno %1960038793 delayed for 5559.037 ms
16:21:32.013619/SRT:TsbPd!W:SRT.br: @708054513:RCV-DROPPED 57 packet(s). Packet seqno %1330757880 delayed for 5446.461 ms
16:21:32.048338/SRT:TsbPd!W:SRT.br: @708054545:RCV-DROPPED 35 packet(s). Packet seqno %256173483 delayed for 6246.725 ms
16:21:32.052688/SRT:TsbPd!W:SRT.br: @708054504:RCV-DROPPED 1 packet(s). Packet seqno %349510191 delayed for 5658.507 ms
16:21:32.053396/SRT:TsbPd!W:SRT.br: @708054505:RCV-DROPPED 10 packet(s). Packet seqno %1960038808 delayed for 5349.968 ms
16:21:32.092623/SRT:RcvQ:w2.N:SRT.cn: PASSING request from: 39.144.18.154:27152 to agent:708054550
16:21:32.092686/SRT:RcvQ:w2 D:SRT.sm: generateSocketID: : @708049189
16:21:32.092822/SRT:RcvQ:w2.N:SRT.cn: HSREQ/rcv: cmd=1(HSREQ) len=12 vers=0x10404 opts=0xb4 delay=0
16:21:32.092913/SRT:RcvQ:w2.N:SRT.cn: listen ret: -1 - conclusion
16:21:32.092927/SRT:RcvQ:w2.N:SRT.cn: Listener managed the connection request from: 39.144.18.154:27152 result:waveahand
16:21:32.126500/SRT:RcvQ:w1!W:SRT.br: CRcvBufferNew.dropMessage(): nothing to drop. Requested [256172271; 256172276]. Buffer start 256173486.
16:21:32.155599/SRT:RcvQ:w2.N:SRT.cn: PASSING request from: 39.144.18.154:27153 to agent:708054550
16:21:32.155649/SRT:RcvQ:w2.N:SRT.cn: Listener managed the connection request from: 39.144.18.154:27153 result:waveahand

So I believe that it's caused by two threads calling dropMessage() and updRcvAvgDataSize() at the same time.

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Jan 11, 2022

I also found that many other places also only lock m_RecvLock before calling m_pRcvBuffer->xxx(), e.g.

void srt::CUDT::updateSrtRcvSettings()
{
    if (m_bTsbPd || m_bGroupTsbPd)
    {
        enterCS(m_RecvLock);
#if ENABLE_NEW_RCVBUFFER
        m_pRcvBuffer->setTsbPdMode(m_tsRcvPeerStartTime, false, milliseconds_from(m_iTsbPdDelay_ms));
#else
        m_pRcvBuffer->setRcvTsbPdMode(m_tsRcvPeerStartTime, milliseconds_from(m_iTsbPdDelay_ms));
#endif
        leaveCS(m_RecvLock);

Is these by design?

@gou4shi1
Copy link
Contributor Author

gou4shi1 commented Jan 12, 2022

It still crashed even with this patch, and I found the real cause with gdb:

(gdb) p *this
$3 = {m_entries = {m_strIndexErr = 0x556cb9947c8f "FixedArray: invalid index", m_size = 8192, m_entries = 0x7ff264323150}, 
  m_szSize = 8192, m_pUnitQueue = 0x556cba82b938, m_iStartSeqNo = 412067428, m_iStartPos = 8174, m_iFirstNonreadPos = 8174, 
  m_iMaxPosInc = 21, m_iNotch = 0, m_numOutOfOrderPackets = 0, m_iFirstReadableOutOfOrder = -1, m_bPeerRexmitFlag = false, 
  m_bMessageAPI = true, m_tsbpd = {m_iFirstRTT = -1, m_bTsbPdMode = true, m_tdTsbPdDelay = {__r = 120000000}, m_tsTsbPdTimeBase = {__d = {
        __r = 62365575042492}}, m_bTsbPdWrapCheck = false, static TSBPD_WRAP_PERIOD = 60000000, static TSBPD_DRIFT_MAX_VALUE = 5000, 
    static TSBPD_DRIFT_MAX_SAMPLES = 1000, m_DriftTracer = {m_qDrift = 0, m_qOverdrift = 0, m_qDriftSum = 0, m_uDriftSpan = 0}, 
    m_mtxRW = {<std::__mutex_base> = {_M_mutex = {__data = {__lock = 0, __count = 0, __owner = 0, __nusers = 0, __kind = 0, __spins = 0, 
            __elision = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = '\000' <repeats 39 times>, __align = 0}}, <No data fields>}}, 
  m_mavg = {m_tsLastSamplingTime = {__d = {__r = 63360537196029}}, m_dBytesCountMAvg = 1851179.4968897442, 
    m_dCountMAvg = 1460.0026572110521, m_dTimespanMAvg = 0.99008074605184671}, m_BytesCountLock = {<std::__mutex_base> = {_M_mutex = {
        __data = {__lock = 1, __count = 0, __owner = 156, __nusers = 1, __kind = 0, __spins = 0, __elision = 0, __list = {__prev = 0x0, 
            __next = 0x0}}, __size = "\001\000\000\000\000\000\000\000\234\000\000\000\001", '\000' <repeats 26 times>, 
        __align = 1}}, <No data fields>}, m_iBytesCount = 1851176, m_iPktsCount = 1460, m_uAvgPayloadSz = 1217}

It should be fixed by #2224 and #2225

@gou4shi1 gou4shi1 changed the title [core] fixed missing m_RcvBufferLock in processCtrlDropReq() [core] fix missing m_RcvBufferLock in processCtrlDropReq() Jan 12, 2022
@maxsharabayko maxsharabayko merged commit 8518558 into Haivision:master Jan 18, 2022
@maxsharabayko maxsharabayko added [core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior labels Jan 18, 2022
@maxsharabayko maxsharabayko added this to the v1.4.5 milestone Jan 18, 2022
@gou4shi1 gou4shi1 deleted the fix-missing-m_RcvBufferLock branch January 18, 2022 04:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants