Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Access to RTCP member variable participants_ is not thread safe, leading to a crash #172

Closed
wowaser opened this issue Oct 25, 2022 · 10 comments
Labels
critical Crash, quality issue or another major bug

Comments

@wowaser
Copy link
Contributor

wowaser commented Oct 25, 2022

Hello,

I've been trying to debug this for a while now, but can't quite get it. All I can do is share the clues that I have gathered.
The problem is with the compound packet. Shortly before the heap corruption, this function gets weird values, especially header.length gets a ridiculous value like 80000 or something.

void uvgrtp::rtcp::read_rtcp_header(const uint8_t* buffer, size_t& read_ptr, uvgrtp::frame::rtcp_header& header)
{
    header.version = (buffer[read_ptr] >> 6) & 0x3;
    header.padding = (buffer[read_ptr] >> 5) & 0x1;

    header.pkt_type = buffer[read_ptr + 1] & 0xff;

    if (header.pkt_type == uvgrtp::frame::RTCP_FT_APP)
    {
        header.pkt_subtype = buffer[read_ptr] & 0x1f;
    }
    else {
        header.count = buffer[read_ptr] & 0x1f;
    }

    header.length = ntohs(*(uint16_t*)& buffer[read_ptr + 2]);

    read_ptr += RTCP_HEADER_SIZE;
}

After that a random set of errors occurs (version is constant 2, so it being wrong points at memory problems):
image

Shortly after the whole program crashes.

The problem might be with sent bytes, because I encountered this once (running on localhost):
image

@wowaser
Copy link
Contributor Author

wowaser commented Oct 25, 2022

I utilized Windbg and ApplicationVerifier for debugging. Quite often it pointed at generate_report function, specifically the construct_sdes_chunk.

After further investigation, I found out that there is an overflow in the generate_report function. This does not happen always, but after the construct_sdes_chunk call, the value of write_ptr is greater than compound_packet_size.

I put printouts in the generate_report function. They are pretty self explanatory. The last one is inside construct_sdes_chunk, right before memcpy.
image

The program crashes on that memcpy line with Windbg and ApplicationVerifier. Otherwise it can run for hours before crashing into a heap corruption.

@jrsnen jrsnen added the bug Something isn't working label Oct 26, 2022
@jrsnen
Copy link
Member

jrsnen commented Oct 26, 2022

Hi.

I'm sorry to hear that you have encountered a bug. Are you setting the SDES items by calling send_sdes_packet()-function? From your description, the problem seems to be that new SDES items appear between the time the compound_packet_size is calculated and the packet itself is constructed. There is a mutex called packet_mutex_ which should prevent all modifications to packet data while they are being sent, so I'm unable to figure out the cause for this error at this time.

BR, Joni

@wowaser
Copy link
Contributor Author

wowaser commented Oct 26, 2022

No, I am not calling send_sdes_packet()

@jrsnen
Copy link
Member

jrsnen commented Nov 1, 2022

Are you recreating or restarting the stream? I was just wondering if there is something wrong with the stopping of the RTCP, which could explain this behavior.

BR, Joni

@wowaser
Copy link
Contributor Author

wowaser commented Nov 7, 2022

I am not sure, I have observed a crash after running a test for a day, and after running for a minute. I cannot say whether stream recreation has something to do with it

@wowaser
Copy link
Contributor Author

wowaser commented Dec 18, 2022

@jrsnen I have found the problem. I don't know how to fix it, hopefully you have some ideas
In uvgrtp::rtcp::generate_report() on line 1612 we calculate the reports var:

    uint8_t reports = 0;
    for (auto& p : participants_)
    {
        if (p.second->stats.received_rtp_packet)
        {
            ++reports;
        }
    }

a little later it is used to allocate memory for packet array:

uint32_t compound_packet_size = size_of_compound_packet(reports, sr_packet, rr_packet, sdes_packet, app_packets_size, bye_packet);

Later in this function, when the memory is already allocated, we check participant stats again to populate the array:

    for (auto& p : participants_)
    {
        // only add report blocks if we have received data from them
        if (p.second->stats.received_rtp_packet)
        {
            uint32_t dropped_packets = p.second->stats.dropped_pkts;

            // TODO: Fraction should be the number of packets lost compared to number of packets expected (see fraction lost in RFC 3550)
            // see https://datatracker.ietf.org/doc/html/rfc3550#appendix-A.3
            
            //uint8_t fraction = dropped_packets ? p.second->stats.received_bytes / dropped_packets : 0;
            uint8_t fraction = 0; // disabled, because it was incorrect

            uint64_t diff = (u_long)uvgrtp::clock::hrc::diff_now(p.second->stats.sr_ts);
            uint32_t dlrs = (uint32_t)uvgrtp::clock::ms_to_jiffies(diff);

            /* calculate delay of last SR only if SR has been received at least once */
            if (p.second->stats.lsr == 0)
            {
                dlrs = 0;
            }

            construct_report_block(frame, write_ptr, p.first, fraction, dropped_packets,
                p.second->stats.cycles, p.second->stats.max_seq, (uint32_t)p.second->stats.jitter, 
                p.second->stats.lsr, dlrs);

            // we only send reports if there is something to report since last report
            p.second->stats.received_rtp_packet = false;
        }
    }

The problem is that the stats of participants can be modified from another thread. So, when allocating the array, received_rtp_packet is false, but it can happen so that when we are populating the array received_rtp_packet becomes true, and the array overflows.

@jrsnen
Copy link
Member

jrsnen commented Dec 19, 2022

@wowaser thanks, based on your description I think I know what is the problem.

The problem seems to be that while modifying the SDES structures is guarded by a mutex, adding participants is not guarded by any mutexes and that leads to this crash. This problem seems to exist also in other functions in RTCP class.

This crash and other possible problems would in my option be best solved by adding a new mutex and locking it around every usage of participants_-variable and in generate_report()-function the locked section should be around both uses so that no modification can take place in between.

BR, Joni

@jrsnen jrsnen changed the title Heap corruption inside RTCP Access to RTCP participants_ member variable is not thread safe, leading to a crash Dec 19, 2022
@jrsnen jrsnen changed the title Access to RTCP participants_ member variable is not thread safe, leading to a crash Access to RTCP member variable participants_ is not thread safe, leading to a crash Dec 19, 2022
@jrsnen jrsnen added critical Crash, quality issue or another major bug and removed bug Something isn't working labels Dec 19, 2022
@wowaser
Copy link
Contributor Author

wowaser commented Jan 9, 2023

@jrsnen Should I add a pull request where I place a couple mutexes around, or do you have more major changes in mind?
For example removing the error-prone array initialization in favor of a dynamic data structure (like a vector) and fetching the size after it's populated?

@jrsnen
Copy link
Member

jrsnen commented Jan 10, 2023

@wowaser A PR would be great! I guess the smart way to do this is to the reverse of what reception does using the same structures, but this does not seem worth the effort at this point.

@jrsnen
Copy link
Member

jrsnen commented Feb 6, 2023

Fixed by #188

@jrsnen jrsnen closed this as completed Feb 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
critical Crash, quality issue or another major bug
Projects
None yet
Development

No branches or pull requests

2 participants