CI: system tests: make random_free_port() parallel-safe #23488

Closed
edsantiago opened this issue Aug 2, 2024 · 9 comments · Fixed by #23595

@edsantiago
Member

I thought I had, but guess not. This is a placeholder issue until I get it right.

@edsantiago edsantiago self-assigned this Aug 2, 2024
@edsantiago
Member Author

It's a Bats bug: parallelization doesn't work the way I expected it to (not a bug) and there's no documentation about it nor any way to get a job slot number (yes a bug). Filed bats-core/bats-core#968

@edsantiago
Member Author

New mechanism submitted: 7c3d294

Tested on my laptop, because CI is down. No failures seen yet.
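The contents of that commit are not shown in this thread. As a rough illustration of the general idea only, and not the code that was actually submitted, a parallel-safe port picker can pair each candidate port with an atomically created lock on a filesystem shared by all jobs. Every name below (`PORT_LOCK_DIR`, `reserve_port`, `unreserve_port`, `port_is_free`) is hypothetical, and the free-port probe only detects TCP listeners on loopback:

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT the mechanism from the commit above.
# PORT_LOCK_DIR must point at one directory shared by all parallel jobs;
# the mktemp fallback is only here so the sketch runs standalone.
PORT_LOCK_DIR=${PORT_LOCK_DIR:-$(mktemp -d)}

# mkdir either creates the directory or fails, atomically, so at most
# one job can claim a given port number.
reserve_port() {
    mkdir "$PORT_LOCK_DIR/$1" 2>/dev/null
}

unreserve_port() {
    rmdir "$PORT_LOCK_DIR/$1" 2>/dev/null
}

# Assumption: "free" means nothing accepts a TCP connection on loopback.
# Uses bash's /dev/tcp redirection; the subshell closes fd 3 on exit.
port_is_free() {
    ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

random_free_port() {
    local range=${1:-5000-5999} port
    for port in $(shuf -i "$range"); do
        if reserve_port "$port"; then
            if port_is_free "$port"; then
                echo "$port"
                return 0
            fi
            # Reserved by us but in use by someone else: give it back.
            unreserve_port "$port"
        fi
    done
    return 1
}
```

Because the reservation lives on disk, it survives the `$(random_free_port)` subshell, and two jobs that shuffle the same range cannot be handed the same port. A test's teardown would need to call `unreserve_port` for whatever it claimed.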

@Luap99
Member

Luap99 commented Aug 5, 2024

I am somewhat sure this fixes #23471 as well.
But I will test locally as well, since it was simple to reproduce.

@edsantiago
Member Author

It absolutely does not. I see pasta timeouts on my laptop even with this port reservation approach.

@Luap99
Member

Luap99 commented Aug 5, 2024

Before this it failed basically every time for me; now it no longer fails after 10+ runs locally.

All I see from the current PR is

cp: cannot create regular file '/tmp/podman-bats-logs.WUKB1OTFB/7-podman puts pasta IP in %2Fetc/hosts.log': No such file or directory

which seems to cause exit code 1 even though bats prints 86 tests, 0 failures, 40 skipped in 15 seconds

@edsantiago
Member Author

edsantiago commented Aug 5, 2024

Are you on f40 and have you dnf-upgraded? This [cp,slashes] is fixed in bats-11

@edsantiago
Member Author

Took a little longer than I expected (i.e. more than two runs), but:

$ while :;do ./bats -T --rootless --tag='ci:parallel' 505 || break;done  
...
✗ |505| TCP port range forwarding, IPv4, tap [131778]                                                                                                          
   tags: ci:parallel                                                                                                                                            
   (from function `bail-now' in file test/system/helpers.bash, line 189,                                                                                        
    from function `die' in file test/system/helpers.bash, line 937,                                                                                             
    from function `run_podman' in file test/system/helpers.bash, line 539,                                                                                      
    from function `pasta_test_do' in file test/system/505-networking-pasta.bats, line 235,                                                                      
    in test file test/system/505-networking-pasta.bats, line 483)                                                                                               
     `pasta_test_do' failed                                                                                                                                     
                                                                                                                                                                
   [06:08:58.901317437] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman info --format {{.Host.Pasta.Executable}}                                        
   [06:09:00.064937823] /usr/bin/pasta                                                                                                                          
                                                                                                                                                                
   [06:09:00.318915429] $ /home/esm/src/atomic/2018-02.podman/libpod/bin/podman run --rm --name=c-socat-t21-6gost7x6 --net=pasta -p [192.168.101.31]:5730-5732:5730-5732/tcp quay.io/libpod/testimage:20240123 sh -c for port in $(seq 5730 5732); do                              socat -u TCP4-LISTEN:${port},bind=[192.168.101.31] STDOUT &                          done; wait
   [06:11:10.344876219] timeout: sending signal TERM to command ‘/home/esm/src/atomic/2018-02.podman/libpod/bin/podman’                                         
   timeout: sending signal KILL to command ‘/home/esm/src/atomic/2018-02.podman/libpod/bin/podman’                                                              
   [06:11:10.348823337] [ rc=137 (** EXPECTED 0 **) ]                                                                                                           
   #/vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv                                                                                                         
   #| FAIL: exit code is 137; expected 0                                                                                                                        
   #\^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^                                                                                                         
   # [teardown]

@Luap99
Member

Luap99 commented Aug 5, 2024

I am on f39 and there is no bats update there. Anyhow, I just renamed the test for now and have been running this in a loop for 15 minutes without failure. Before that it failed on almost every run, so I am very certain that the port conflict caused the hangs in the udp case due to the incorrect behavior of REUSEADDR. #23471 (comment)

I now also got the TCP hang after 15 minutes, but I think this is different from the udp hangs listed in #23471.

@edsantiago
Member Author

Oh, sorry, I've just been treating all timeouts as the same bug, not sorting by tcp/udp.

The slash bug is harmless; you can disable the gather option, or add this to basic_setup in helpers.bash:

    # FIXME FIXME FIXME workaround for bats bug 789, fixed in 11.0
    BATS_TEST_DESCRIPTION=${BATS_TEST_DESCRIPTION//\//%2F}
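For readers unfamiliar with the expansion used in that workaround, here is a small standalone sketch of what it does. The variable names `desc`, `first_only`, and `all` are made up for illustration. The single-slash form is shown only for contrast: the `cp` error quoted earlier, with `%2Fetc/hosts` in the file name, looks consistent with only the first slash having been escaped, but that is an inference from the message, not something confirmed in this thread.

```shell
#!/usr/bin/env bash
# Illustrative only: a test description containing slashes.
desc='podman puts pasta IP in /etc/hosts'

# ${var/pattern/repl} replaces only the FIRST match of the pattern.
first_only=${desc/\//%2F}

# ${var//pattern/repl} replaces EVERY match; this is the form the
# workaround above applies to BATS_TEST_DESCRIPTION.
all=${desc//\//%2F}

echo "$first_only"   # podman puts pasta IP in %2Fetc/hosts
echo "$all"          # podman puts pasta IP in %2Fetc%2Fhosts
```

With every slash escaped, the description no longer contains a path separator, so a log file named after it cannot point into a directory that does not exist.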
