Skip to content
This repository has been archived by the owner on Feb 18, 2023. It is now read-only.

PRACK refactoring #167

Merged
merged 4 commits into from
Jan 23, 2023
Merged

PRACK refactoring #167

merged 4 commits into from
Jan 23, 2023

Conversation

maximilianfridrich
Copy link
Contributor

Refactoring of PRACK related tests to only call re_cancel() once per test and remove code duplication.

@maximilianfridrich
Copy link
Contributor Author

I am quite confused as to why some checks are failing. I rebased and added an additional test. Maybe an issue with the build agent? Could someone trigger a re-check?

@sreimers
Copy link
Member

I triggered a re-check, but the problems/segfaults are remaining. Looks code/PR related.

@maximilianfridrich
Copy link
Contributor Author

Thank you @sreimers. As a sanity check I ran retest off of main branch (and re main branch) and I am seeing the same issues. Are you able to reproduce that?

@sreimers
Copy link
Member

sreimers commented Jan 16, 2023

To be sure, I restarted a old check from main and no problems. Locally with your PR branches I can reproduce the segfault with:

build/retest -a
==20351==ERROR: AddressSanitizer: stack-buffer-overflow on address 0x7ffdba1e6308 at pc 0x562ea7d30876 bp 0x7ffdba1e5a70 sp 0x7ffdba1e5a60
WRITE of size 4 at 0x7ffdba1e6308 thread T0
    #0 0x562ea7d30875 in conn_handler /home/sreimers/projekte/baresip/baresip-linux/retest/src/sipsess.c:389
    #1 0x7f75020d6bb6 in invite_handler /home/sreimers/projekte/baresip/baresip-linux/re/src/sipsess/listen.c:281
    #2 0x7f75020d6cd9 in request_handler /home/sreimers/projekte/baresip/baresip-linux/re/src/sipsess/listen.c:294
    #3 0x7f75020a4278 in sip_recv /home/sreimers/projekte/baresip/baresip-linux/re/src/sip/transp.c:357
    #4 0x7f75020a5446 in udp_recv_handler /home/sreimers/projekte/baresip/baresip-linux/re/src/sip/transp.c:431
    #5 0x7f7502032adf in udp_read /home/sreimers/projekte/baresip/baresip-linux/re/src/udp/udp.c:202
    #6 0x7f7502032c1a in udp_read_handler /home/sreimers/projekte/baresip/baresip-linux/re/src/udp/udp.c:215
    #7 0x7f7501f8a876 in fd_poll /home/sreimers/projekte/baresip/baresip-linux/re/src/main/main.c:920
    #8 0x7f7501f8be7b in re_main /home/sreimers/projekte/baresip/baresip-linux/re/src/main/main.c:1100
    #9 0x562ea7d527a9 in re_main_timeout /home/sreimers/projekte/baresip/baresip-linux/retest/src/test.c:969
    #10 0x562ea7d2d9fd in reg_test /home/sreimers/projekte/baresip/baresip-linux/retest/src/sipreg.c:160
    #11 0x562ea7d2e070 in test_sipreg_tls /home/sreimers/projekte/baresip/baresip-linux/retest/src/sipreg.c:209
    #12 0x562ea7d53553 in test_integration /home/sreimers/projekte/baresip/baresip-linux/retest/src/test.c:1091
    #13 0x562ea7cdc2d0 in main /home/sreimers/projekte/baresip/baresip-linux/retest/src/main.c:243
    #14 0x7f7500e4d28f  (/usr/lib/libc.so.6+0x2328f)
    #15 0x7f7500e4d349 in __libc_start_main (/usr/lib/libc.so.6+0x23349)
    #16 0x562ea7c7f584 in _start ../sysdeps/x86_64/start.S:115

Address 0x7ffdba1e6308 is located in stack of thread T0 at offset 120 in frame 
    #0 0x562ea7d5266b in re_main_timeout /home/sreimers/projekte/baresip/baresip-linux/retest/src/test.c:955

  This frame has 2 object(s):
    [32, 36) 'err' (line 957)
    [48, 120) 'tmr' (line 956) <== Memory access at offset 120 overflows this variable
SUMMARY: AddressSanitizer: stack-buffer-overflow /home/sreimers/projekte/baresip/baresip-linux/retest/src/sipsess.c:389 in conn_handler

@cspiel1
Copy link
Collaborator

cspiel1 commented Jan 16, 2023

I can not reproduce this so far. But thanks for the call stack. Maybe it has something to do with the last TLS changes. I try to analyze this.

Edit: with branch main

@cspiel1
Copy link
Collaborator

cspiel1 commented Jan 16, 2023

The invite_handler() is called while test_sipreg_tls test is running. What is going on here?

@cspiel1
Copy link
Collaborator

cspiel1 commented Jan 16, 2023

In this PR branch got this memory leak warnings in test_sipsess_100rel_abort:

...
Timers (2):
  0x56234a631f30: th=0x7f239b51eecb expire=1820ms file=/home/cspiel/src/re/src/sip/ctrans.c:207
  0x56234a631ee8: th=0x7f239b51ee93 expire=26227ms file=/home/cspiel/src/re/src/sip/ctrans.c:346
fd 0 in use: flags=1 fh=0x7f239b50c74b arg=0x56234a6bb130
mem: Memory leaks (49):
  0x56234a67af10: nrefs=1  size=160     [a0 b4 66 4a 23 56 00 00 a0 b4 66 4a 23 56 00 00 ] [..fJ#V....fJ#V..]
/usr/local/lib/libre.so.13(+0x44ccf) [0x7f239b4e5ccf] 
/usr/local/lib/libre.so.13(mem_alloc+0xa8) [0x7f239b4e5e15] 
/usr/local/lib/libre.so.13(mem_zalloc+0x27) [0x7f239b4e5f4a] 
/usr/local/lib/libre.so.13(sip_alloc+0x47) [0x7f239b526ce7] 
./build/retest(test_sipsess_100rel_abort+0x98) [0x562349b9f614] 
./build/retest(+0x5de4a) [0x562349ba8e4a] 
./build/retest(test_oom+0xf1) [0x562349ba8ff7] 
./build/retest(main+0x3d6) [0x562349b82e04] 
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f239b28cd90] 
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f239b28ce40] 

@maximilianfridrich
Copy link
Contributor Author

Thanks for the feedback, there must have been an issue on my side (and I ran valgrind ./retest -a which apparently always fails but that is not what the automated check runs). I will try to fix the issue with the new test test_sipsess_100rel_abort.

@cspiel1
Copy link
Collaborator

cspiel1 commented Jan 16, 2023

The invite_handler() is called while test_sipreg_tls test is running. What is going on here?

Maybe the test_sipsess_100rel_abort does not close all requests/sipsess cleanly. Is it possible that the INVITE of this 100rel test is somewhere queued and then processed (unwanted) in the main loop of test_sipreg_tls that is invoked later?

@alfredh
Copy link
Contributor

alfredh commented Jan 18, 2023

try adding -v to retest in the CI to see which test fails

I tested locally and the tests -r -i -o are working.

-t with valgrind fails:

alfredh@debian:~/tmp/retest$ valgrind ./retest -v -t
==140811== Memcheck, a memory error detector
==140811== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==140811== Using Valgrind-3.16.1 and LibVEX; rerun with -h for copyright info
==140811== Command: ./retest -v -t
==140811== 
retest: libre version 2.11.0 (x86_64/Linux)
using async polling method 'epoll'
using datapath './data'
multithread: 176 tests with 2 repeats (total 352 threads): test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:735:
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:904:
test: test_sipsess_100rel_abort: test failed (Invalid argument [22])
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:821:
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:904:
test: test_sipsess_100rel_abort: test failed (Invalid argument [22])
test: test_sipsess_100rel_caller_require: test failed (Invalid argument [22])
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:821:
test_sipsess: ASSERT_TRUE: /home/alfredh/tmp/retest/src/sipsess.c:735:
test: test_sipsess_100rel_supported: test failed (Invalid argument [22])
test: test_sipsess_100rel_supported: test failed (Invalid argument [22])
test: test_sipsess_100rel_caller_require: test failed (Invalid argument [22])
122 failed: test_sipsess_100rel_caller_require  [22] [Invalid argument [22]]
123 failed: test_sipsess_100rel_supported   [22] [Invalid argument [22]]
124 failed: test_sipsess_100rel_abort       [22] [Invalid argument [22]]
298 failed: test_sipsess_100rel_caller_require  [22] [Invalid argument [22]]
299 failed: test_sipsess_100rel_supported   [22] [Invalid argument [22]]
300 failed: test_sipsess_100rel_abort       [22] [Invalid argument [22]]

retest: TEST_ERR: /home/alfredh/tmp/retest/src/main.c:254: (Invalid argument [22])
mem: Memory leaks (900):
  0x53257e0: nrefs=1  size=160     [70 74 32 05 00 00 00 00 e0 13 33 05 00 00 00 00 ] [pt2.......3.....]
/home/alfredh/tmp/re/libre.so.13(+0x3f8a6) [0x489c8a6] 
/home/alfredh/tmp/re/libre.so.13(mem_alloc+0x95) [0x489c9c6] 
/home/alfredh/tmp/re/libre.so.13(mem_zalloc+0x23) [0x489cae0] 
/home/alfredh/tmp/re/libre.so.13(sip_alloc+0x40) [0x48d91c6] 
./retest(test_sipsess_blind_transfer+0x6e) [0x1526d9] 

when running under valgrind, the execution is slower. This can be similar
behaviour to a slow VM.

hence there might a timing problem or race condition.

You can also try helgrind to get hints about threads/mutex errors.

struct test {
struct sip *sip;
struct sipsess_sock *sock;
struct sipsess *a;
struct sipsess *b;
struct tmr ans_tmr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this timer started and stopped properly ?

all timers must call tmr_cancel when the test is finished.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I added tmr_cancel at the end of tests that use the timer.

@maximilianfridrich
Copy link
Contributor Author

Thank you for the feedback! @alfredh Unfortunately, helgrind didn't show anything related to the 100rel tests.

I don't know for certain what exactly is happening here, but I am quite confident that it is an issue with retest and not re.

For the PRACK/100rel tests, the issue is most likely due to the fact that I don't have the callbacks necessary to confirm all SIP messages that have been sent. The last callback that is called is usually the prack_handler, after which an answer/update/cancel is sent. So there is no callback for the response to the PRACK, the UA might still be expecting it.

@alfredh
Copy link
Contributor

alfredh commented Jan 19, 2023

can you please rebase your patch on top of git main HEAD ?

both re and retest has to be rebased

@cspiel1 cspiel1 merged commit af0920c into baresip:main Jan 23, 2023
@maximilianfridrich maximilianfridrich deleted the prack_refactoring branch January 24, 2023 07:08
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants