Message is lost in dlt-daemon. #631

tesaki · 2024-05-14T07:39:16Z

Messages are lost when more than 65536 bytes of data accumulate in the kernel's receive buffer before the dlt-daemon reads the message data.
When the message type is APP_MSG, the dlt-daemon reads up to 65,535 bytes from the kernel's receive buffer in one read (see the dlt_receiver_init_global_buffer() function).
The last message in the data read this way is often incomplete. This incomplete message is then detected as an error and discarded.

The following patch introduced a change to discard incomplete messages:
commit: dlt-daemon: Handle partial message parsing in receiver buffer

Reverting this patch resolves this issue.
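
For illustration, the sketch below shows one way a stream receiver can keep an incomplete trailing message and complete it on the next read instead of discarding it. It is not dlt-daemon code: the 2-byte length-prefix framing, the `receiver_t` type, and `receiver_feed()` are assumptions made up for this example, and the buffer is tiny compared with the daemon's.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical framing, for illustration only: a 2-byte big-endian length
 * prefix followed by that many payload bytes. This is NOT the DLT header. */
#define HDR_SIZE 2u

typedef struct {
    uint8_t buf[256];   /* receive buffer (assumed size for the sketch) */
    size_t  used;       /* bytes currently held, including a partial tail */
    size_t  delivered;  /* number of complete messages consumed so far */
} receiver_t;

/* Append freshly read bytes, consume every complete message, and move an
 * incomplete trailing message to the front so the next read can finish it. */
static void receiver_feed(receiver_t *r, const uint8_t *data, size_t len)
{
    assert(r->used + len <= sizeof r->buf);
    memcpy(r->buf + r->used, data, len);
    r->used += len;

    size_t pos = 0;
    while (r->used - pos >= HDR_SIZE) {
        size_t payload = ((size_t)r->buf[pos] << 8) | r->buf[pos + 1];
        if (r->used - pos < HDR_SIZE + payload)
            break;          /* tail is incomplete: keep it, do not drop it */
        r->delivered++;     /* a real consumer would process the message here */
        pos += HDR_SIZE + payload;
    }

    /* Shift the unconsumed remainder (possibly empty) to the buffer start. */
    memmove(r->buf, r->buf + pos, r->used - pos);
    r->used -= pos;
}
```

With this approach, a message whose bytes arrive split across two reads is delivered once the second read supplies the rest, because the partial tail survives between calls.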

To reproduce this, set DaemonFIFOSize to 1 MiB in dlt.conf and run the following log-sending program.

#include <stdio.h>
#include <stdlib.h>
#include <dlt.h>

DLT_DECLARE_CONTEXT(con_exa1);

int main(void)
{
    DLT_REGISTER_APP("TAPP", "Test App");

    DLT_REGISTER_CONTEXT(con_exa1, "TAPP", "Test Context");

    int i;
    for (i = 0; i < 5000; i++) {
        DLT_LOG(con_exa1, DLT_LOG_INFO, DLT_UINT32(i),
         DLT_STRING("TEST ################################################"));
    }

    DLT_UNREGISTER_CONTEXT(con_exa1);

    DLT_UNREGISTER_APP();

    return 0;
}

tesaki · 2025-02-27T05:59:24Z

I initially proposed increasing the dlt-daemon read size, but I realized this was incorrect. Therefore, I have updated the issue description accordingly.

Hello @minminlittleshrimp and @Suprathik-N
I would like to ask for some clarification regarding the patch "commit: dlt-daemon: Handle partial message parsing in receiver buffer":
This patch description states that partial messages may be received by the daemon and should be ignored. However, our investigation (*1) has indicated that this situation does not occur in a Linux environment.

Could you kindly provide details on the environment in which this patch was tested?
Additionally, could you share some background on how this patch was incorporated?

Based on our findings, this issue does not appear to occur in practice, leading me to believe that the patch may not be necessary.

*1: Investigation Findings
In a Linux environment, we confirmed that writev() and write() do not perform partial writes.

Case 1: Pipe (FIFO)
According to man 7 pipe, for non-blocking mode:
- If the data size ≤ PIPE_BUF, the entire message is written or an error occurs.
- If the data size > PIPE_BUF, a partial write may occur.
On Linux, PIPE_BUF = 4096 bytes. Additionally, the dlt-daemon library limits the message size to 1,424 bytes or less.
→ Since the message size is always within PIPE_BUF, partial messages will never be received when using a pipe.

Case 2: Unix Domain Socket
According to man 2 write, a write may transfer only part of the data. However, an inspection of the Linux kernel code reveals that, for this case, write() either writes all bytes or fails entirely. The dlt-daemon uses a non-blocking, stream-type Unix domain socket, following this call chain:
writev() → ..(snip).. → net/unix/af_unix.c:unix_stream_sendmsg() → net/core/sock.c:sock_alloc_send_pskb()
In sock_alloc_send_pskb(), available space is checked, and memory pages are allocated. When allocating a page, if there is any free space, the implementation ensures the full message size is allocated, regardless of buffer capacity constraints. (Reference: Linux kernel v6.13.3 source)
This behavior has been in place since Linux kernel v4.0 (link).
→ Therefore, even when using a Unix domain socket on Linux, partial messages should never be received by dlt-daemon.
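
The Unix domain socket case can be observed with a similar experiment. The sketch below fills a non-blocking stream socketpair with writev() calls of 1424 bytes each while nothing reads from the peer. The names `uds_stats_t` and `fill_uds_nonblocking()` are made up for this example, and the 4/20/1400 split across three iovec entries is arbitrary, chosen only so that the total is 1424 bytes; it does not mirror the real DLT headers. A zero partial count on a given kernel is an observation, not a proof.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

typedef struct {
    long full;        /* writev() calls that sent all 1424 bytes */
    long partial;     /* writev() calls that sent only part of the message */
    int  last_errno;  /* errno of the call that ended the loop */
} uds_stats_t;

/* Fill a non-blocking AF_UNIX stream socket while the peer never reads. */
static uds_stats_t fill_uds_nonblocking(void)
{
    uds_stats_t st = {0, 0, 0};
    int sv[2];
    char a[4], b[20], c[1400];      /* arbitrary split, total 1424 bytes */
    struct iovec iov[3] = {
        { a, sizeof a }, { b, sizeof b }, { c, sizeof c }
    };
    const ssize_t total = (ssize_t)(sizeof a + sizeof b + sizeof c);

    memset(a, 'A', sizeof a);
    memset(b, 'B', sizeof b);
    memset(c, 'C', sizeof c);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0) {
        st.last_errno = errno;
        return st;
    }
    fcntl(sv[0], F_SETFL, fcntl(sv[0], F_GETFL) | O_NONBLOCK);

    for (;;) {
        ssize_t n = writev(sv[0], iov, 3);
        if (n == total) {
            st.full++;
        } else if (n > 0) {
            st.partial++;           /* a partial message reached the socket */
        } else {
            st.last_errno = errno;  /* send buffer full: EAGAIN/EWOULDBLOCK */
            break;
        }
    }

    close(sv[0]);
    close(sv[1]);
    return st;
}
```

On a kernel that behaves as described above, `partial` should remain 0 when the send buffer fills up.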

minminlittleshrimp · 2025-02-27T11:29:22Z

Hello @lti9hc @Bichdao021195
Kindly have a look.
Thanks

Suprathik-N · 2025-02-27T12:40:25Z

@tesaki @minminlittleshrimp The issue was observed on QNX, and with FIFO as the IPC mechanism.
I'm afraid we overlooked testing with UDS enabled, on Linux.

The error flow is as follows:

- The PIPE_BUF of the FIFO is 4k bytes, and an app is generating messages at high frequency.
- The rate at which space in PIPE_BUF is freed depends on how fast the message data is read into the DLT-Daemon receive buffer.
- Because the maximum message size in DLT is 1424 bytes, at best 2 full messages plus partial data of a 3rd message fit into PIPE_BUF.
- When this happens, the check that compares the message length with the number of bytes written, as returned by writev(), fails.
- The partially written message is put back into the DLT-library ring buffer, and the housekeeper thread resends it next time.
- But the partial data already inside the DLT-Daemon receive buffer was not being handled.
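
To make the arithmetic in this flow concrete, here is a minimal sketch. It assumes, as stated above, that 4096 bytes are available and that every message has the maximum size of 1424 bytes; `fit_t` and `fit_messages()` are names made up for this example.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t whole;     /* messages that fit completely */
    size_t leftover;  /* bytes left over for a further, truncated message */
} fit_t;

/* Split a capacity into whole messages plus the remaining bytes. */
static fit_t fit_messages(size_t capacity, size_t msg_size)
{
    assert(msg_size > 0);
    fit_t f = { capacity / msg_size, capacity % msg_size };
    return f;
}
```

For a capacity of 4096 and a message size of 1424, this yields 2 whole messages and 1248 leftover bytes, so only 1248 of the 1424 bytes of a third message would fit if a partial write were to happen.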

Here's a small snippet that you can add at line 166 of dlt_user_shared.c to reproduce the error scenario:

#if 1
    static int cnt = 0;
    cnt++;
    if (cnt % 5 == 0) {
        // shorten the data length to reproduce the scenario that all data could not be sent
        iov[2].iov_len = len3 / 2;

        // send partial message
        writev(handle, iov, 3);

        // force error state. Will be written into ring buffer and resent from housekeeper thread
        return DLT_RETURN_PIPE_FULL;
    }
#endif

tesaki · 2025-02-28T04:54:06Z

Hello @Suprathik-N

Thank you for your response. I see, the issue occurred on QNX.

However, based on the QNX documentation, it does not seem likely that partial messages would be sent.

The QNX documentation describes the behavior of write() as follows (could this behavior vary depending on the QNX version?):

If the O_NONBLOCK flag is set, write requests are handled differently in the following ways:
- The write() function does not block.
- Write requests for PIPE_BUF bytes or less either succeed completely and return nbytes, or return -1 with errno set to EAGAIN.

Since the FIFO is set to O_NONBLOCK, and 1424 < PIPE_BUF = 4k, the above conditions appear to be satisfied, meaning that partial writes should not occur. As a result, the write of the third message should fail with an error rather than be truncated.

Unless the application packs multiple messages into a single write that fills PIPE_BUF completely, it seems unlikely that a message would be sent partially. How does the application send its messages?

minminlittleshrimp added the bug label Jun 8, 2024

minminlittleshrimp self-assigned this Jun 8, 2024
