Posix socket OS_QueueGet() timeout fails #189

Closed
skliper opened this issue Sep 30, 2019 · 7 comments

skliper commented Sep 30, 2019

We have Linux platforms where the Linux mqueue is not available and we have to use sockets. However, we're seeing a problem when using sockets vs. mqueues. When OS_QueueGet() is called with an actual timeout value (msec), the socket implementation appears to always return without reporting that a message is present. SB messages pile up.

The outreach drone is one such platform, and we also see this in our CentOS Linux VM simulation platform (OSAL configured the same for consistency). (I have an unofficial report from another developer who encountered this as well.) I think I've seen this with the open-source release of OSAL as well as the current development branch.

(Likely the default event filters prevented folks from seeing this if they weren't looking at the SB telemetry directly.)


Steps to reproduce:

  1. create a clean/pristine cFS build. (I used a bootstrap script: https://babelfish.arc.nasa.gov/trac/cfs_tools/ticket/35, but it shouldn't matter.) (I used all the development branches as of 4/26/2016 11:37 Central time.)
  2. source setvars.sh
  3. cd build/cpu1
  4. make config
  5. Remove all SB event filters (to see on command line), as in:
    {{{
    @@ -287,17 +287,17 @@
    ** This filtering applies only to SB events.
    ** These parameters have a lower limit of 0 and an upper limit of 65535.
    */
    -#define CFE_SB_FILTERED_EVENT1 CFE_SB_SEND_NO_SUBS_EID
    -#define CFE_SB_FILTER_MASK1 CFE_EVS_FIRST_4_STOP
    +#define CFE_SB_FILTERED_EVENT1 0
    +#define CFE_SB_FILTER_MASK1 CFE_EVS_NO_FILTER

    -#define CFE_SB_FILTERED_EVENT2 CFE_SB_DUP_SUBSCRIP_EID
    -#define CFE_SB_FILTER_MASK2 CFE_EVS_FIRST_4_STOP
    +#define CFE_SB_FILTERED_EVENT2 0
    +#define CFE_SB_FILTER_MASK2 CFE_EVS_NO_FILTER

    -#define CFE_SB_FILTERED_EVENT3 CFE_SB_MSGID_LIM_ERR_EID
    -#define CFE_SB_FILTER_MASK3 CFE_EVS_FIRST_16_STOP
    +#define CFE_SB_FILTERED_EVENT3 0
    +#define CFE_SB_FILTER_MASK3 CFE_EVS_NO_FILTER

    -#define CFE_SB_FILTERED_EVENT4 CFE_SB_Q_FULL_ERR_EID
    -#define CFE_SB_FILTER_MASK4 CFE_EVS_FIRST_16_STOP
    +#define CFE_SB_FILTERED_EVENT4 0
    +#define CFE_SB_FILTER_MASK4 CFE_EVS_NO_FILTER

    #define CFE_SB_FILTERED_EVENT5 0
    #define CFE_SB_FILTER_MASK5 CFE_EVS_NO_FILTER
    }}}

  6. cd exe and run core-linux.bin & wait for 30-60 seconds
    As expected, running with mqueues by default, there will be no significant event messages after:
    {{{
    ES Startup: CFE_ES_Main entering OPERATIONAL state
    }}}

Now, to switch to sockets and show the problem:
7) Edit build/cpu1/inc/osconfig.h as:
{{{
@@ -132,7 +132,7 @@
 ** This define sets the queue implentation of the Linux port to use sockets
 ** commenting this out makes the Linux port use the POSIX message queues.
 */
-/* #define OSAL_SOCKET_QUEUE */
+#define OSAL_SOCKET_QUEUE

 /*
 ** Module loader/symbol table is optional
}}}
8) make clean;make
9) cd exe and run core-linux.bin & wait for 30-60 seconds, you'll see:
{{{
1980-012-14:03:20.26138 ES Startup: CFE_ES_Main entering OPERATIONAL state
Warning: System Log full, log entry discarded.
EVS Port1 66/1/CFE_TIME 21: Stop FLYWHEEL
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1808,pipe ES_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1808,pipe ES_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1808,pipe ES_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1885,pipe CI_LAB_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1883,pipe SAMPLE_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1808,pipe ES_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1885,pipe CI_LAB_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1883,pipe SAMPLE_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1808,pipe ES_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1885,pipe CI_LAB_CMD_PIPE,sender SCH_LAB_APP
EVS Port1 66/1/CFE_SB 17: Msg Limit Err,MsgId 0x1883,pipe SAMPLE_CMD_PIPE,sender SCH_LAB_APP
}}}

Each of these apps/services invokes CFE_SB_RcvMsg() with a timeout value:

  • cFE ES: 1000ms
  • CI_LAB & SAMPLE_APP: 500ms

They delegate to CFE_SB_ReadQueue() and to OS_QueueGet() with a timeout value.
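
For readers unfamiliar with the pattern, here is a minimal sketch of what a timed "queue get" on top of a datagram socket can look like. It is not the OSAL code, which this thread does not show. The `demo_*` names and the status values are made up for illustration, and it uses an `AF_UNIX` socket pair only so that it is self-contained.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>

/* Placeholder status codes for this sketch only; not the OSAL values. */
enum { DEMO_SUCCESS = 0, DEMO_ERROR = -1, DEMO_QUEUE_TIMEOUT = -2 };

/* Wait up to timeout_ms for one datagram on fd and copy it into buf. */
static int demo_timed_get(int fd, void *buf, size_t len,
                          int timeout_ms, ssize_t *copied)
{
    fd_set readable;
    struct timeval tv;
    int ready;

    assert(buf != NULL && copied != NULL && timeout_ms >= 0);

    FD_ZERO(&readable);
    FD_SET(fd, &readable);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    /* select() returns 0 only if nothing became readable before the deadline. */
    ready = select(fd + 1, &readable, NULL, NULL, &tv);
    if (ready < 0) {
        return DEMO_ERROR;
    } else if (ready == 0) {
        return DEMO_QUEUE_TIMEOUT;
    }

    /* A message is present: read it and report success, not a timeout. */
    *copied = recv(fd, buf, len, 0);
    return (*copied >= 0) ? DEMO_SUCCESS : DEMO_ERROR;
}

/* Helper for trying the sketch: a connected pair of local datagram sockets. */
static int demo_make_pair(int fds[2])
{
    return socketpair(AF_UNIX, SOCK_DGRAM, 0, fds);
}
```

With a pair from `demo_make_pair()`, calling `demo_timed_get()` on an idle socket should give `DEMO_QUEUE_TIMEOUT`, and calling it after a `send()` on the peer should give `DEMO_SUCCESS`. The symptom reported here is that the second case was also reported as a timeout.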

@skliper skliper self-assigned this Sep 30, 2019

skliper commented Sep 30, 2019

Imported from trac issue 166. Created by abrown4 on 2016-04-26T12:56:33, last modified: 2019-05-29T14:19:13


skliper commented Sep 30, 2019

Trac comment by abrown4 on 2016-05-25 12:28:44:

Running the oscore-test unit test with mqueue vs. sockets (changed in osconfig.h) shows a failure:

'''diff ut_oscore_log_mqueue.txt ut_oscore_log_socket.txt'''
{{{
111,112c111,112
< ut_oscore PASSED 133 tests.
< ut_oscore FAILED 0 tests.
---
> ut_oscore PASSED 132 tests.
> ut_oscore FAILED 1 tests.
281c281
< #27 Queue-full [PASSED]
---
> #27 Queue-full [FAILED]
382c382
< TOTAL TEST CASES PASSED -> 133
---
> TOTAL TEST CASES PASSED -> 132
465d464
< PASSED [ ] OS_QueuePut - #27 Queue-full
519c518
< TOTAL TEST CASES FAILED -> 0
---
> TOTAL TEST CASES FAILED -> 1
520a520
> FAILED [ ] OS_QueuePut - #27 Queue-full
}}}

Apparently the socket implementation doesn't implement queue depth checks with the same semantics. The OS_QueueCreate() queue_depth arg is ignored for sockets:
{{{
int32 OS_QueueCreate (uint32 *queue_id, const char *queue_name,
                      uint32 queue_depth,
                      uint32 data_size, uint32 flags)
}}}
The socket version of OS_QueuePut() compares the byte count returned by sendto() with the message data size. That checks that the whole message was sent, which is not the same thing as checking the queue "depth".
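
To make the difference concrete, here is a rough sketch contrasting the two kinds of check. None of this is OSAL code: the `demo_*` names, the struct, and the status values are hypothetical, and the depth-aware variant is only one possible way to emulate a depth limit (a real one would also have to decrement the counter on every get and protect it against concurrent access).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Placeholder status codes for this sketch only; not the OSAL values. */
enum { DEMO_SUCCESS = 0, DEMO_ERROR = -1, DEMO_QUEUE_FULL = -3 };

/* Byte-count check: succeeds whenever the kernel accepts the whole message. */
static int demo_put_bytes_check(int fd, const void *msg, size_t size)
{
    ssize_t sent = send(fd, msg, size, 0);
    return (sent == (ssize_t)size) ? DEMO_SUCCESS : DEMO_ERROR;
}

/* Hypothetical depth-aware queue: the caller counts pending messages itself. */
struct demo_queue {
    int      fd;       /* sending end of the socket */
    unsigned depth;    /* maximum number of pending messages */
    unsigned pending;  /* messages put but not yet read */
};

/* Depth check: refuses the message once 'depth' messages are pending. */
static int demo_put_depth_check(struct demo_queue *q, const void *msg, size_t size)
{
    assert(q != NULL && q->depth > 0);

    if (q->pending >= q->depth) {
        return DEMO_QUEUE_FULL;
    }
    if (demo_put_bytes_check(q->fd, msg, size) != DEMO_SUCCESS) {
        return DEMO_ERROR;
    }
    q->pending++;
    return DEMO_SUCCESS;
}
```

With a nominal depth of 2, a third `demo_put_bytes_check()` call on a small message still succeeds because the socket buffer has room, whereas a third `demo_put_depth_check()` call returns `DEMO_QUEUE_FULL`. That is the kind of behavior the Queue-full test expects.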


skliper commented Sep 30, 2019

Trac comment by abrown4 on 2016-05-25 17:10:49:

The fix: the else branches needed for correct control flow had been omitted. Without an "IF-ELSEIF" chain, the case where an item was actually read from the socket (with a timeout) would still result in OS_QUEUE_TIMEOUT when it should have been OS_SUCCESS. The original unit tests didn't cover this case, so I added some tests.
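
As an illustration of this class of bug (not the actual osapi.c code, which is not quoted in this ticket), a missing `else` can let a later check overwrite a success status. The `demo_*` names and status values below are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder status codes for this sketch only; not the OSAL values. */
enum { DEMO_SUCCESS = 0, DEMO_QUEUE_EMPTY = -1, DEMO_QUEUE_TIMEOUT = -2 };

/* Buggy shape: two independent ifs, so the second can clobber the first. */
static int demo_status_buggy(bool got_message, int timeout_ms)
{
    int status = DEMO_QUEUE_EMPTY;

    assert(timeout_ms >= 0);

    if (got_message) {
        status = DEMO_SUCCESS;
    }
    if (timeout_ms > 0) {   /* missing "else": runs even after a successful read */
        status = DEMO_QUEUE_TIMEOUT;
    }
    return status;
}

/* Fixed shape: an if / else-if chain, so exactly one branch decides the status. */
static int demo_status_fixed(bool got_message, int timeout_ms)
{
    assert(timeout_ms >= 0);

    if (got_message) {
        return DEMO_SUCCESS;
    } else if (timeout_ms > 0) {
        return DEMO_QUEUE_TIMEOUT;
    } else {
        return DEMO_QUEUE_EMPTY;
    }
}
```

In this sketch the buggy variant only misbehaves when a message was read and the timeout is non-zero, which is consistent with the symptom described in this ticket: polling works, but a get with a real timeout reports a timeout even though data arrived.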

Branch: trac-166-posix-socket-queueget
commit:[changeset:91dba76] Fixed the posix osapi.c logic & added two covering unit tests.
commit:[changeset:b8adb5d] Fixed a typo in the new unit tests.


skliper commented Sep 30, 2019

Trac comment by abrown4 on 2016-05-25 18:00:27:

Replying to [comment:1 abrown4]:
Opened #191 for this discovered difference on the queue depth semantics.


skliper commented Sep 30, 2019

Trac comment by jphickey on 2018-05-22 13:29:43:

CCB 2018-05-22:

  • Socket queues were intended for MacOS, which does not support mqueue, but were never fully supported
  • Plan is to retire/deprecate socket queues altogether and replace with some type of local implementation for systems which do not support mqueue


skliper commented Sep 30, 2019

Trac comment by jhageman on 2019-03-04 17:15:37:

CCB - changesets from 5/25/2016 already merged into master, close? Possibly open another ticket to deprecate socket queue use?


skliper commented Sep 30, 2019

Trac comment by jhageman on 2019-05-29 12:39:51:

CCB 5/29/2019 - Discussed, deprecate socket queue use. Next gen doesn't implement socket queues.

Mark as fixed by NG and "wontfix"

@skliper skliper closed this as completed Sep 30, 2019
@skliper skliper removed their assignment Sep 30, 2019
jphickey pushed a commit to jphickey/osal that referenced this issue Aug 10, 2022
jphickey pushed a commit to jphickey/osal that referenced this issue Aug 10, 2022