Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[core] fix TTL drop packets #2003

Merged
merged 4 commits into from
May 18, 2021

Conversation

gou4shi1
Copy link
Contributor

@gou4shi1 gou4shi1 commented May 16, 2021

The range pair of UMSG_DROPREQ should be [low, high] instead of [low, high), right?
The previous code always make m_iSndCurrSeqNo bigger than the largest msg sent, then cause other problems.

@ethouris
Copy link
Collaborator

ethouris commented May 17, 2021

Ok, I made a short review of the code and also looked into the original UDT code since when it remains actually untouched, and you are likely right. The CSndBuffer::readData call is starting with msglen set to 1 when found the first record, then it continues until the msgno value stays the same up to the first different msgno, which means that it represents the number of packets in the message. Also, removal from the receiver loss list (on the side that receives UMSG_DROPREQ) it removes the loss INCLUDING the upper range sequence. Looxlike then it didn't ever work properly and no one detected it for such a long time.

But then - just to be sure - would you be able to make a test where it happens (even with your fixed version) with added logs that would display:

  • inside CSndBuffer::readData, up to which packet it has found out that the message is the same, and display the lowest and highest sequence number in the range together with the msglen
  • after the call to CSndBuffer::readData, what's the range of sequence numbers that it is going to pack into the UMSG_DROPREQ message

The logs are there already actually, the need is only to add data being displayed in them. In this, for example:

        HLOGC(qslog.Debug,
              log << "CSndBuffer::readData: due to TTL exceeded, " << w_msglen << " messages to drop, up to " << msgno);

it's just about adding the first found sequence (need to store in a temporary variable) and the last sequence (p->m_iSeqNo).

@gou4shi1
Copy link
Contributor Author

inside CSndBuffer::readData

added

after the call to CSndBuffer::readData

already have this?

            HLOGC(qrlog.Debug, log << "IPE: loss-reported packets not found in SndBuf - requesting DROP: "
                    << "msg=" << MSGNO_SEQ::unwrap(w_packet.m_iMsgNo) << " SEQ:"
                    << seqpair[0] << " - " << seqpair[1] << "(" << (-offset) << " packets)");

@gou4shi1
Copy link
Contributor Author

Should I run a test with the fixed version and then post the heavy log to you?
I have the heavy log in previous version, do you want to take a look?

@ethouris
Copy link
Collaborator

If you have both, then that's even better. About this second log - you might also add the msglen value to it, just to see one and the other. Note that in order to see only desired logs, you can limit the FAs for the logs (for the SRT applications it will be -logfa ~all que-send in this case)

@ethouris
Copy link
Collaborator

Ah, or course, placing them in the log in this PR is ok, but I rather meant to see the results from the running session with these logs, just to be able to see the numbers.

@gou4shi1
Copy link
Contributor Author

https://1drv.ms/u/s!AuaosNlJ5ELxjy0Ld3-GIPtBtipI?e=Eai1pa
Here is the heavy log of previous version, it finally caused CSndBuffer::getMsgNoAt: IPE: offset=0 not found, max offset=-382

@ethouris
Copy link
Collaborator

I think something's wrong with these logs. First, the form of the log:

/SRT:SndQ:w4 D:SRT.qs: CSndBuffer::readData: due to TTL exceeded, 2 messages to drop, up to 3
/SRT:SndQ:w4 D:SRT.qr: IPE: loss-reported packets not found in SndBuf - requesting DROP: msg=3 SEQ:159422586 - 159422588(0 packets)

doesn't match what I can see in the sources:

        HLOGC(qslog.Debug,
              log << "CSndBuffer::readData: due to TTL exceeded, SEQ " << first_seq << " - " << first_seq + w_msglen - 1
                  << " to drop, msgno=" << msgno);

Moreover, first_seq + w_msglen - 1 gives me nothing. I need p->m_iSeqNo exactly in order to prove that the distance that this function measured as w_msglen matches the message length measured as a distance between sequence numbers in the packets. Of course, w_msglen as well.

@gou4shi1
Copy link
Contributor Author

Here is the heavy log of previous version

It's the log without this pr.

@ethouris
Copy link
Collaborator

Yes, but this log doesn't contain any information I need to recognize how exactly it works. I need the log with the extra information, can be also with the fix.

@maxsharabayko maxsharabayko added this to the v1.4.4 milestone May 17, 2021
@maxsharabayko maxsharabayko added [core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior labels May 17, 2021
@gou4shi1
Copy link
Contributor Author

https://1drv.ms/u/s!AuaosNlJ5ELxjy6drMsHmyUob7h9?e=j9TEFe
Here is the log with this pr.

@maxsharabayko
Copy link
Collaborator

Indeed the range of sequences in the sender's drop request was wrong. It didn't result in an obvious issue because the receiver mostly ignores the range of sequences, and drops the packet from its buffer by the message number.

Which is another issue: message number is available if the sender is dropping packets by TLL, but not available if retransmission of packets already dropped by the receiver is requested. In the latter case, the sender knows the range of sequences but does not know for sure the message number of missing packets. Added a note in #1910.

@ethouris
Copy link
Collaborator

I don't understand why it matters. DROPREQ is being sent anyway with the sequence number. It simply informs the receiver that it cannot request packets earlier than this. And the message can be reconstructed at the receiver side only in case when it has all packets in order since the first until the last message ([PB_FIRST] [PB_SUBSEQUENT] .... [PB_SUBSEQUENT] [PB_LAST] or [PB_SOLO] as a single packet). If the message cannot be reconstructed according to these rules, it should be wholly dropped at the receiver side.

Sender doesn't need to know the message numbers, receiver does. Sender may need the numbers to identify what packets belong to a message, but only as long as they are still in the buffer. Note that it is also predicted that ACK cannot mark the position "in half of the message".

@maxsharabayko maxsharabayko merged commit d9150ea into Haivision:master May 18, 2021
@maxsharabayko
Copy link
Collaborator

Two more issues remaining:

  1. processCtrlLossReport(..) does not send a drop request if one packet is reported lost (not a range). See this line of code.
  2. CUDT::packLostData(..)
        const int offset = CSeqNo::seqoff(m_iSndLastDataAck, w_packet.m_iSeqNo);
        if (offset < 0)
        {
            // (...)
            seqpair[1] = m_iSndLastDataAck; // this is the first sequence present in the beffer, should be deseq.

@ethouris
Copy link
Collaborator

I can see a kinda different problem. This line shows sending DROPREQ with sequence numbers, but msgno=0 (as it's unknown). But CRcvBuffer::dropMsg function gets this message number as a good deal and tries to drop messages with that number, which will obviously fail. Looxlike this could be a fix for some problem that was not finished.

@maxsharabayko
Copy link
Collaborator

I can see a kinda different problem. This line shows sending DROPREQ with sequence numbers, but msgno=0 (as it's unknown).

Exactly what is written in the comment above:

Which is another issue: message number is available if the sender is dropping packets by TLL, but not available if retransmission of packets already dropped by the receiver is requested.

@ethouris
Copy link
Collaborator

What I wanted to point out is that the problem isn't with unavailability of the message number, but with the wrong interpretation of this message.

@gou4shi1 gou4shi1 deleted the fix_ttl_drop_packets branch May 18, 2021 16:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[core] Area: Changes in SRT library core Type: Bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants