PRACK refactoring #167

maximilianfridrich · 2023-01-04T10:06:56Z

Refactoring of PRACK related tests to only call re_cancel() once per test and remove code duplication.

maximilianfridrich · 2023-01-16T08:41:28Z

I am quite confused as to why some checks are failing. I rebased and added an additional test. Maybe an issue with the build agent? Could someone trigger a re-check?

sreimers · 2023-01-16T09:03:13Z

I triggered a re-check, but the problems/segfaults are remaining. Looks code/PR related.

maximilianfridrich · 2023-01-16T09:05:10Z

Thank you @sreimers. As a sanity check I ran retest off of main branch (and re main branch) and I am seeing the same issues. Are you able to reproduce that?

sreimers · 2023-01-16T09:27:26Z

To be sure, I restarted a old check from main and no problems. Locally with your PR branches I can reproduce the segfault with:

build/retest -a

==20351==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdba1e6308 at pc 0x562ea7d30876 bp 0x7ffdba1e5a70 sp 0x7ffdba1e5a60
WRITE of size 4 at 0x7ffdba1e6308 thread T0
    #0 0x562ea7d30875 in conn_handler /home/sreimers/projekte/baresip/baresip-linux/retest/src/sipsess.c:389
    #1 0x7f75020d6bb6 in invite_handler /home/sreimers/projekte/baresip/baresip-linux/re/src/sipsess/listen.c:281
    #2 0x7f75020d6cd9 in request_handler /home/sreimers/projekte/baresip/baresip-linux/re/src/sipsess/listen.c:294
    #3 0x7f75020a4278 in sip_recv /home/sreimers/projekte/baresip/baresip-linux/re/src/sip/transp.c:357
    #4 0x7f75020a5446 in udp_recv_handler /home/sreimers/projekte/baresip/baresip-linux/re/src/sip/transp.c:431
    #5 0x7f7502032adf in udp_read /home/sreimers/projekte/baresip/baresip-linux/re/src/udp/udp.c:202
    #6 0x7f7502032c1a in udp_read_handler /home/sreimers/projekte/baresip/baresip-linux/re/src/udp/udp.c:215
    #7 0x7f7501f8a876 in fd_poll /home/sreimers/projekte/baresip/baresip-linux/re/src/main/main.c:920
    #8 0x7f7501f8be7b in re_main /home/sreimers/projekte/baresip/baresip-linux/re/src/main/main.c:1100
    #9 0x562ea7d527a9 in re_main_timeout /home/sreimers/projekte/baresip/baresip-linux/retest/src/test.c:969
    #10 0x562ea7d2d9fd in reg_test /home/sreimers/projekte/baresip/baresip-linux/retest/src/sipreg.c:160
    #11 0x562ea7d2e070 in test_sipreg_tls /home/sreimers/projekte/baresip/baresip-linux/retest/src/sipreg.c:209
    #12 0x562ea7d53553 in test_integration /home/sreimers/projekte/baresip/baresip-linux/retest/src/test.c:1091
    #13 0x562ea7cdc2d0 in main /home/sreimers/projekte/baresip/baresip-linux/retest/src/main.c:243
    #14 0x7f7500e4d28f  (/usr/lib/libc.so.6+0x2328f)
    #15 0x7f7500e4d349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
    #16 0x562ea7c7f584 in _start ../sysdeps/x86_64/start.S:115

Address 0x7ffdba1e6308 is located in stack of thread T0 at offset 120 in frame 
    #0 0x562ea7d5266b in re_main_timeout /home/sreimers/projekte/baresip/baresip-linux/retest/src/test.c:955

  This frame has 2 object(s):
    [32, 36) 'err' (line 957)
    [48, 120) 'tmr' (line 956) <== Memory access at offset 120 overflows this variable
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/sreimers/projekte/baresip/baresip-linux/retest/src/sipsess.c:389 in conn_handler

cspiel1 · 2023-01-16T09:35:39Z

I can not reproduce this so far. But thanks for the call stack. Maybe it has something to do with the last TLS changes. I try to analyze this.

Edit: with branch main

cspiel1 · 2023-01-16T09:40:10Z

The invite_handler() is called while test_sipreg_tls test is running. What is going on here?

cspiel1 · 2023-01-16T09:51:04Z

In this PR branch got this memory leak warnings in test_sipsess_100rel_abort:

...
Timers (2):
  0x56234a631f30: th=0x7f239b51eecb expire=1820ms file=/home/cspiel/src/re/src/sip/ctrans.c:207
  0x56234a631ee8: th=0x7f239b51ee93 expire=26227ms file=/home/cspiel/src/re/src/sip/ctrans.c:346
fd 0 in use: flags=1 fh=0x7f239b50c74b arg=0x56234a6bb130
mem: Memory leaks (49):
  0x56234a67af10: nrefs=1  size=160     [a0 b4 66 4a 23 56 00 00 a0 b4 66 4a 23 56 00 00 ] [..fJ#V....fJ#V..]
/usr/local/lib/libre.so.13(+0x44ccf) [0x7f239b4e5ccf] 
/usr/local/lib/libre.so.13(mem_alloc+0xa8) [0x7f239b4e5e15] 
/usr/local/lib/libre.so.13(mem_zalloc+0x27) [0x7f239b4e5f4a] 
/usr/local/lib/libre.so.13(sip_alloc+0x47) [0x7f239b526ce7] 
./build/retest(test_sipsess_100rel_abort+0x98) [0x562349b9f614] 
./build/retest(+0x5de4a) [0x562349ba8e4a] 
./build/retest(test_oom+0xf1) [0x562349ba8ff7] 
./build/retest(main+0x3d6) [0x562349b82e04] 
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f239b28cd90] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f239b28ce40]

maximilianfridrich · 2023-01-16T09:54:29Z

Thanks for the feedback, there must have been an issue on my side (and I ran valgrind ./retest -a which apparently always fails but that is not what the automated check runs). I will try to fix the issue with the new test test_sipsess_100rel_abort.

cspiel1 · 2023-01-16T09:57:40Z

The invite_handler() is called while test_sipreg_tls test is running. What is going on here?

Maybe the test_sipsess_100rel_abort does not close all requests/sipsess cleanly. Is it possible that the INVITE of this 100rel test is somewhere queued and then processed (unwanted) in the main loop of test_sipreg_tls that is invoked later?

alfredh · 2023-01-18T14:18:57Z

try adding -v to retest in the CI to see which test fails

I tested locally and the tests -r -i -o are working.

-t with valgrind fails:

alfredh@debian:~/tmp/retest$ valgrind ./retest -v -t
==140811== Memcheck, a memory error detector
==140811== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==140811== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==140811== Command: ./retest -v -t
==140811== 
retest: libre version 2.11.0 (x86_64/Linux)
using async polling method 'epoll'
using datapath './data'
multithread: 176 tests with 2 repeats (total 352 threads): test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:735:
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:904:
test: test_sipsess_100rel_abort: test failed (Invalid argument [22])
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:821:
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:904:
test: test_sipsess_100rel_abort: test failed (Invalid argument [22])
test: test_sipsess_100rel_caller_require: test failed (Invalid argument [22])
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:821:
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:735:
test: test_sipsess_100rel_supported: test failed (Invalid argument [22])
test: test_sipsess_100rel_supported: test failed (Invalid argument [22])
test: test_sipsess_100rel_caller_require: test failed (Invalid argument [22])
122 failed: test_sipsess_100rel_caller_require  [22] [Invalid argument [22]]
123 failed: test_sipsess_100rel_supported   [22] [Invalid argument [22]]
124 failed: test_sipsess_100rel_abort       [22] [Invalid argument [22]]
298 failed: test_sipsess_100rel_caller_require  [22] [Invalid argument [22]]
299 failed: test_sipsess_100rel_supported   [22] [Invalid argument [22]]
300 failed: test_sipsess_100rel_abort       [22] [Invalid argument [22]]

retest: TEST_ERR: /home/alfredh/tmp/retest/src/main.c:254: (Invalid argument [22])
mem: Memory leaks (900):
  0x53257e0: nrefs=1  size=160     [70 74 32 05 00 00 00 00 e0 13 33 05 00 00 00 00 ] [pt2.......3.....]
/home/alfredh/tmp/re/libre.so.13(+0x3f8a6) [0x489c8a6] 
/home/alfredh/tmp/re/libre.so.13(mem_alloc+0x95) [0x489c9c6] 
/home/alfredh/tmp/re/libre.so.13(mem_zalloc+0x23) [0x489cae0] 
/home/alfredh/tmp/re/libre.so.13(sip_alloc+0x40) [0x48d91c6] 
./retest(test_sipsess_blind_transfer+0x6e) [0x1526d9]

when running under valgrind, the execution is slower. This can be similar
behaviour to a slow VM.

hence there might a timing problem or race condition.

You can also try helgrind to get hints about threads/mutex errors.

alfredh · 2023-01-18T14:22:12Z

src/sipsess.c

 struct test {
 	struct sip *sip;
 	struct sipsess_sock *sock;
 	struct sipsess *a;
 	struct sipsess *b;
+	struct tmr ans_tmr;


is this timer started and stopped properly ?

all timers must call tmr_cancel when the test is finished.

Thank you! I added tmr_cancel at the end of tests that use the timer.

maximilianfridrich · 2023-01-19T07:38:53Z

Thank you for the feedback! @alfredh Unfortunately, helgrind didn't show anything related to the 100rel tests.

I don't know for certain what exactly is happening here, but I am quite confident that it is an issue with retest and not re.

For the PRACK/100rel tests, the issue is most likely due to the fact that I don't have the callbacks necessary to confirm all SIP messages that have been sent. The last callback that is called is usually the prack_handler, after which an answer/update/cancel is sent. So there is no callback for the response to the PRACK, the UA might still be expecting it.

alfredh · 2023-01-19T13:20:27Z

can you please rebase your patch on top of git main HEAD ?

both re and retest has to be rebased

maximilianfridrich mentioned this pull request Jan 4, 2023

PRACK refactoring baresip/re#630

Merged

alfredh reviewed Jan 18, 2023

View reviewed changes

maximilianfridrich added 4 commits January 19, 2023 14:29

sipsess: improve 100rel tests

bcf2a77

sipsess: refactor 100rel tests

5047af9

sipsess: add 100rel_abort test

c3a18c0

sipsess: replace 100rel_abort with reject test

b667286

cspiel1 merged commit af0920c into baresip:main Jan 23, 2023

maximilianfridrich deleted the prack_refactoring branch January 24, 2023 07:08

This was referenced Jan 25, 2023

types: add RE_ prefix to ARRAY_SIZE() baresip/re#658

Merged

PRACK/100rel tests unstable #176

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PRACK refactoring #167

PRACK refactoring #167

maximilianfridrich commented Jan 4, 2023

maximilianfridrich commented Jan 16, 2023

sreimers commented Jan 16, 2023

maximilianfridrich commented Jan 16, 2023

sreimers commented Jan 16, 2023 •

edited

Loading

cspiel1 commented Jan 16, 2023 •

edited

Loading

cspiel1 commented Jan 16, 2023

cspiel1 commented Jan 16, 2023

maximilianfridrich commented Jan 16, 2023

cspiel1 commented Jan 16, 2023

alfredh commented Jan 18, 2023

alfredh Jan 18, 2023

maximilianfridrich Jan 19, 2023

maximilianfridrich commented Jan 19, 2023

alfredh commented Jan 19, 2023

PRACK refactoring #167

PRACK refactoring #167

Conversation

maximilianfridrich commented Jan 4, 2023

maximilianfridrich commented Jan 16, 2023

sreimers commented Jan 16, 2023

maximilianfridrich commented Jan 16, 2023

sreimers commented Jan 16, 2023 • edited Loading

cspiel1 commented Jan 16, 2023 • edited Loading

cspiel1 commented Jan 16, 2023

cspiel1 commented Jan 16, 2023

maximilianfridrich commented Jan 16, 2023

cspiel1 commented Jan 16, 2023

alfredh commented Jan 18, 2023

alfredh Jan 18, 2023

Choose a reason for hiding this comment

maximilianfridrich Jan 19, 2023

Choose a reason for hiding this comment

maximilianfridrich commented Jan 19, 2023

alfredh commented Jan 19, 2023

sreimers commented Jan 16, 2023 •

edited

Loading

cspiel1 commented Jan 16, 2023 •

edited

Loading