[BUG] SIGSEGV while receiving broadcast group stream. #2133
Is it reproducible with v1.4.4.RC0 or v1.4.3? |
Like if another thread has changed the buffer state while this thread is still inside:

    for (int i = m_iStartPos, n = m_iLastAckPos; i != n; i = shiftFwd(i))
    {
        if (m_pUnit[i] && m_pUnit[i]->m_iFlag == CUnit::GOOD)
        {
            HLOGC(brlog.Debug,
                  log << "getRcvReadyPacket: Found next packet seq=%" << m_pUnit[i]->m_Packet.getSeqNo() << " ("
                      << nskipped << " empty cells skipped)");
            return &m_pUnit[i]->m_Packet;
        }
        IF_HEAVY_LOGGING(++nskipped);
    } |
It takes time but it reproduces quite easily. If you know of traces I can enable in libsrt to help, here is what is ON:
|
test restarted with v1.4.4-rc.0 |
In between RC.0 and RC.1 there was PR #2094 merged touching receiver buffer access, but it was intended to improve the protection. Should not be the cause of this crash, but let's see. |
reproduced with 1.4.4-rc.0.
relaunching test with 1.4.3. |
If I see correctly, it points to this condition:

    if (m_pUnit[i] && m_pUnit[i]->m_iFlag == CUnit::GOOD)
    {
        HLOGC(brlog.Debug,
              log << "getRcvReadyPacket: Found next packet seq=%" << m_pUnit[i]->m_Packet.getSeqNo() << " ("
                  << nskipped << " empty cells skipped)");
        return &m_pUnit[i]->m_Packet;
    } |
BTW. @jeandube any chances to run repro with thread sanitizer? |
Right. Same thing in Jean's post above:
It looks like the |
|
@ethouris I am currently running the test with 1.4.3, with no luck up to now. If I retrieve the recipe to build with the thread sanitizer I'll do it on 1.4.4-rc.0. Regarding m_iSize, I printed the whole 'this' to get everything. |
Can you post the heavy log? |
|
@gou4shi1, thanks for your observations. I had this issue reproduced with 1.4.4-rc.1 with -fsanitize=thread.
Here is the output before the crash:
|
I can see that the call to [...] This buffer was originally developed in UDT as lock-free: it didn't need locking because there were only two threads, each operating on a particular end (the receiver worker thread at the new end, the application thread at the old end), and outdated values of the non-atomic positioners could at worst point to unused space. The problem is that in SRT there's a new thread modifying [...] |
It is clear that the TSBPD thread was implemented without a clear understanding of the UDT threading model and the intrinsic protection that comes with it. Who other than me can say this :-) Now it's done. The TSBPD code should probably run in the SRT:RcvQ thread, removing the need for the many mutexes we keep adding to work around this design flaw. |
I was thinking about it, but the problem here is that its role is to trigger the appropriate flags at a strictly specified time, and doing that in a thread that is also doing other things in the meantime may lose some accuracy. It might be that receiver-side dropping should simply be done differently, possibly by allowing the reader to step on empty units when packets are lacking and just skip them when retrieving the data. What is important is that in exactly this place the problem is likely fixable by adding the same mutex that is used in the main thread's call. |
Got the crash again. Adding a dump of the socket:
|
Reproduced while receiving a non-group stream. The problem seems to be having different threads pulling data (srt_rcvmsg2()) and pulling stats (srt_bstats()). srt_bstats() gets the RcvBuffer size and timespan, and the m_pUnit[X] it is handling can get freed by the data-pulling thread, causing a SIGSEGV on a NULL pointer. RcvBuffer is not thread safe. From the public API the exposure seems restricted to srt_bstats() and the like vs. srt_recvXxx, but internally many CRcvBuffer public methods have vulnerable code sections. If not fixed in the SRT lib, this should at least be documented. Again, the RcvBuffer state was not part of the stats in the original UDT design. |
|
I found that public API access to some vulnerable CRcvBuffer methods is also provided via the getOpt() API (SRTO_EVENT, SRTO_RCVDATA). Maybe this issue title should be revised to "SIGSEGV if pulling stats while receiving stream". |
I'd say more: it was never part of the stats to retrieve this value; it was introduced with the instantaneous stats. The receiver buffer wasn't mutex-protected, in accordance with the UDT design, because it was stated to be a queue that in the worst case, in a pull operation, won't pull an element even if one is available, but will never ride over the end of the queue. But this means that only defined operations on it are thread safe, and moreover some of the single operations are mutex-protected. I think the best and safest way to fix it would be to have an extra field that keeps this value (similar to the [...] |
As for vulnerabilities from within getOpt, I can see only two calls. One is under a lock (that's even a cascade of about 3 locks) and one is only calculating number out of integer fields. That first one might require more analysis, but that's actually part of the original UDT (just slightly refactored in SRT). The second one may at worst return a stupid value, that's all. |
@ethouris maintaining the RcvBuffer size while adding/pulling could probably be done at low cost, and better than the apparently simpler add-another-mutex solution. Stats pulling is quite a low-rate event (default 5 s, min 1 s in my case). |
Yeah, but that would be nontrivial. The average value is calculated based on the rare update events, and it picks up the value from [...] I think the least invasive method would be to have extra stats fields in the buffer that would be updated with every operation on the buffer. |
agree. |
Partially addressed in #2146 (v1.4.5-dev).

    if (m_pRcvBuffer)
    {
        perf->byteAvailRcvBuf = getAvailRcvBufferSizeLock() * m_config.iMSS;
        if (instantaneous) // no need for historical API for Rcv side
        {
            perf->pktRcvBuf = m_pRcvBuffer->getRcvDataSize(perf->byteRcvBuf, perf->msRcvBuf);
        }
        else
        {
            perf->pktRcvBuf = m_pRcvBuffer->getRcvAvgDataSize(perf->byteRcvBuf, perf->msRcvBuf);
        }
    }

Sadly, the same can be said about the missing protection of the sender buffer. |
Yes, and adding protection here would then risk introducing deadlocks. The original UDT design simply intended particular functionalities of the buffer to have particular thread affinity. This improves performance and simplifies the design (especially with regard to deadlocks), but it's hell for improvements and extensions. |
@jeandube Please feel free to reopen if the crash would still be there. |
Grab PRs fixing issue Haivision#2133
Describe the bug
SIGSEGV in buffer.cpp while receiving stream over broadcast group. No network impairment added. Gdb session with log trace added.
To Reproduce
aarch64 test app caller/receiver runs on MX4D to receive MX4E listener/sender stream configured with 2 paths on 2 NICs.
mxphub70.D7log.txt
Runs smoothly for ~100 min receiving 1.5 Mbps CNN.
libsrt from branch fix-binding-any of my fork, based on 1.4.4.rc1 with fixes in api.cpp (one uncommitted fix to push).