Flaky tests test_verb_load test_verb_list can't create a daemon OSError Errno 99 #630

Closed
sloretz opened this issue Apr 20, 2021 · 14 comments
Labels: bug, tests

Comments

@sloretz
Contributor

sloretz commented Apr 20, 2021

Bug report

Required Info:

  • Operating System:
    • Ubuntu Focal, x86 and arm64
  • Installation type:
  • Version or commit hash:
    • https://github.com/ros2/ros2cli/commit/a4daa7672f287997d1345a44ebb9e0c3d0c490b6 for sure
  • DDS implementation:
    • default
  • Client library (if applicable):
    • n/a

Steps to reproduce issue

???

Expected behavior

The tests would all pass

Actual behavior

They fail

Additional information

Jobs that have failed:

They all fail with an OSError:

FAIL: test_verb_list.TestVerbList.test_verb_list[rmw_fastrtps_cpp]
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/jenkins-agent/workspace/nightly_linux-aarch64_repeated/ws/src/ros2/ros2cli/ros2param/test/test_verb_list.py", line 123, in setUp
    with NodeStrategy(None) as node:
  File "/home/jenkins-agent/workspace/nightly_linux-aarch64_repeated/ws/install/ros2cli/lib/python3.8/site-packages/ros2cli/node/strategy.py", line 52, in __enter__
    self._daemon_node.__enter__()
  File "/home/jenkins-agent/workspace/nightly_linux-aarch64_repeated/ws/install/ros2cli/lib/python3.8/site-packages/ros2cli/node/daemon.py", line 116, in __enter__
    methods = self._proxy.system.listMethods()
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1109, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1450, in __request
    response = self.__transport.request(
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1153, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1165, in single_request
    http_conn = self.send_request(host, handler, request_body, verbose)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1278, in send_request
    self.send_content(connection, request_body)
  File "/usr/lib/python3.8/xmlrpc/client.py", line 1308, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python3.8/http/client.py", line 1250, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib/python3.8/http/client.py", line 1010, in _send_output
    self.send(msg)
  File "/usr/lib/python3.8/http/client.py", line 950, in send
    self.connect()
  File "/usr/lib/python3.8/http/client.py", line 921, in connect
    self.sock = self._create_connection(
  File "/usr/lib/python3.8/socket.py", line 808, in create_connection
    raise err
  File "/usr/lib/python3.8/socket.py", line 796, in create_connection
    sock.connect(sa)
OSError: [Errno 99] Cannot assign requested address

I suspect this is caused by #622, but I don't know for sure.
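For readers unfamiliar with the call path in the traceback: the failing line is an XML-RPC introspection call (`system.listMethods()`) made through Python's `xmlrpc.client`, and the error is raised when the client opens its TCP connection. The sketch below reproduces that call pattern against a throwaway local server. It is not the ros2cli daemon; the `ping` function and the choice of `127.0.0.1` with an OS-assigned port are assumptions made purely for illustration.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def list_methods_roundtrip():
    """Start a throwaway XML-RPC server and query it like the traceback does."""
    # Port 0 asks the OS for any free port; logRequests=False keeps output quiet.
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_introspection_functions()  # provides system.listMethods
    server.register_function(lambda: "pong", "ping")  # hypothetical method
    host, port = server.server_address

    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        # Each request opens a client socket; connect() is where
        # "[Errno 99] Cannot assign requested address" surfaced in the report.
        with ServerProxy(f"http://{host}:{port}") as proxy:
            return proxy.system.listMethods()
    finally:
        server.shutdown()
        server.server_close()
        thread.join()
```

Calling `list_methods_roundtrip()` returns the list of registered method names, including the introspection methods themselves.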

@sloretz added the bug label Apr 20, 2021
@ivanpauno
Member

Errno 99 is EADDRNOTAVAIL, which for connect() means:

     (Internet domain sockets) The socket referred to by sockfd
     had not previously been bound to an address and, upon
     attempting to bind it to an ephemeral port, it was
     determined that all port numbers in the ephemeral port
     range are currently in use.  See the discussion of
     /proc/sys/net/ipv4/ip_local_port_range in ip(7).

i.e. we're running out of ephemeral ports.

We're creating a lot of daemon clients in the ros2cli tests; they might be lingering for some reason, which would exhaust the ephemeral ports.
I will try to reproduce this locally.
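One way to check the port-exhaustion hypothesis on a Linux machine is to compare the size of the ephemeral port range with the number of sockets in use. The sketch below covers only the first half: it parses the file named in the man page excerpt above and recognizes the error code from the traceback. The helper names are made up for this example and are not part of ros2cli.

```python
import errno
from typing import Tuple

# Linux-specific path, as quoted in the connect(2) excerpt.
PORT_RANGE_PATH = "/proc/sys/net/ipv4/ip_local_port_range"


def parse_port_range(text: str) -> Tuple[int, int]:
    """Parse the two whitespace-separated integers found in the file."""
    low, high = (int(field) for field in text.split())
    return low, high


def ephemeral_port_count(text: str) -> int:
    """Number of ports in the inclusive range [low, high]."""
    low, high = parse_port_range(text)
    return high - low + 1


def read_ephemeral_port_count(path: str = PORT_RANGE_PATH) -> int:
    """Read the range from /proc (only works on Linux)."""
    with open(path) as f:
        return ephemeral_port_count(f.read())


def is_addr_not_available(exc: OSError) -> bool:
    """True if the exception is EADDRNOTAVAIL (Errno 99 on Linux)."""
    return exc.errno == errno.EADDRNOTAVAIL
```

A test helper could catch `OSError`, call `is_addr_not_available(exc)`, and log `read_ephemeral_port_count()` next to the failure to help confirm or rule out exhaustion.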

@ivanpauno self-assigned this Apr 21, 2021
@ivanpauno
Member

> i.e. we're running out of ephemeral ports.

I can reproduce the issue easily, and I'm definitely not running out of ephemeral ports.

@nuclearsandwich
Member

These test failures showed up last night and again tonight: https://ci.ros2.org/view/nightly/job/nightly_linux_release/1899 and https://ci.ros2.org/view/nightly/job/nightly_linux_release/1900

@ivanpauno did you do any further investigation beyond reproducing? Are you still looking into the issue?

@ivanpauno
Member

@hidmic investigated the issue.
There was some discussion about it here.

@hidmic
Contributor

hidmic commented Apr 29, 2021

Yeah, bottom line we replaced one bug with another (see #622). I haven't had time to prototype the third potential solution discussed in #632.

@chapulina

chapulina commented Jun 28, 2021

This failed the last 2 builds, see the job history:

https://ci.ros2.org/view/nightly/job/nightly_linux_debug/lastSuccessfulBuild/testReport/ros2param.ros2param.test/test_verb_list/test_verb_list/history/

There hasn't been any action on #632 in 2 months, I'm not sure if that's still being considered.

@hidmic
Contributor

hidmic commented Jun 28, 2021

It is. I haven't had the time to circle back :/. If it's becoming increasingly painful, I'll try to fit it as soon as I can, but anyone should feel free to pick it up as well.

@chapulina

This came up again today: https://ci.ros2.org/view/nightly/job/nightly_linux-aarch64_release/1614/testReport/

> I haven't had the time to circle back

How about marking this test xfail for now, so it doesn't turn the builds yellow?


https://github.com/osrf/buildfarmer/issues/207
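As a rough illustration of the xfail suggestion, here is a minimal sketch that assumes the tests are collected by pytest. `connect_to_daemon` and `test_verb_list_sketch` are hypothetical stand-ins, not the actual ros2param test code, and the real tests may need the marker applied differently.

```python
import errno

import pytest


def connect_to_daemon():
    """Hypothetical stand-in for the daemon connection that fails."""
    raise OSError(errno.EADDRNOTAVAIL, "Cannot assign requested address")


@pytest.mark.xfail(
    reason="Flaky: daemon connection sometimes fails with Errno 99",
    raises=OSError,   # any other exception type still counts as a failure
    strict=False,     # an unexpected pass (XPASS) does not fail the run
)
def test_verb_list_sketch():
    connect_to_daemon()
```

With `strict=False`, runs where the test happens to pass are reported as XPASS rather than as failures, which suits a flaky test. Restricting `raises` to `OSError` keeps unrelated regressions visible.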

@nuclearsandwich
Member

> How about marking this test xfail for now, so it doesn't turn the builds yellow?

I'm in favor.

@hidmic
Contributor

hidmic commented Jul 12, 2021

With a bit of luck, #652 will solve this issue.

@hidmic
Contributor

hidmic commented Jul 15, 2021

Alright, #652 is in. Let's hold on for a few days and see if this issue goes away.

@jacobperron
Member

test_verb_load and test_verb_list (and also test_verb_dump) are still failing on Linux platforms, although I think the failures are different now, for instance: https://ci.ros2.org/view/nightly/job/nightly_linux_repeated/2353/testReport/junit/ros2param.ros2param.test

Maybe we could close this and open one or more new tickets for the different failures. @hidmic Please advise.

@hidmic
Contributor

hidmic commented Jul 22, 2021

Indeed those are different. Likely connected to ros2/rmw_fastrtps#531.

Closing this ticket!

@hidmic closed this as completed Jul 22, 2021
@sloretz
Contributor Author

sloretz commented Sep 22, 2021

FYI future buildfarmers: This was fixed by #652, but it doesn't seem to be backportable. These tests are still flaky in Galactic.
