AssertionError: Docker container '...' contained extra running processes after test completed #694

Closed
s1113950 opened this issue Feb 20, 2020 · 5 comments · Fixed by #1119
Labels: ci (Issues related to CI in either Travis or Azure)

Comments

@s1113950
Collaborator

https://travis-ci.org/dw/mitogen/jobs/653211748?utm_medium=notification&utm_source=github_status

Sometimes tests fail with:

======================================================================
ERROR: tearDownClass (doas_test.DoasTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/dw/mitogen/tests/testlib.py", line 534, in tearDownClass
    cls.dockerized_ssh.check_processes()
  File "/home/travis/build/dw/mitogen/tests/testlib.py", line 477, in check_processes
    counts
AssertionError: Docker container 'mitogen-test-a84b6c7898599fd6' contained extra running processes after test completed: {u'doas <defunct>': 1, u'ps': 1, u'sshd': 1}
======================================================================
ERROR: tearDownClass (sudo_test.NonEnglishPromptTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/dw/mitogen/tests/testlib.py", line 534, in tearDownClass
    cls.dockerized_ssh.check_processes()
  File "/home/travis/build/dw/mitogen/tests/testlib.py", line 477, in check_processes
    counts
AssertionError: Docker container 'mitogen-test-b3754356ae2ff3e' contained extra running processes after test completed: {u'ps': 1, u'sshd': 1, u'sudo <defunct>': 1}
----------------------------------------------------------------------
Ran 670 tests in 103.324s
FAILED (errors=2, skipped=45)
@s1113950 added the ci label (Issues related to CI in either Travis or Azure) Feb 20, 2020
@s1113950
Collaborator Author

discovered when testing #658 and #693

@s1113950
Collaborator Author

This test also sometimes has issues, maybe 4/10 test runs:

ERROR: tearDownClass (sudo_test.NonEnglishPromptTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/travis/build/dw/mitogen/tests/testlib.py", line 534, in tearDownClass
    cls.dockerized_ssh.check_processes()
  File "/home/travis/build/dw/mitogen/tests/testlib.py", line 477, in check_processes
    counts
AssertionError: Docker container 'mitogen-test-e82c96aa0f304f6a' contained extra running processes after test completed: {u'ps': 1, u'sshd': 1, u'sudo <defunct>': 1}

Example failing run: https://travis-ci.org/dw/mitogen/jobs/653673605?utm_medium=notification&utm_source=github_status

@moreati changed the title from "Test fail sometimes with doas_test.py on Python: 2.6 MODE=mitogen DISTRO=centos7" to "AssertionError: Docker container '...' contained extra running processes after test completed" Aug 23, 2024
@moreati
Member

moreati commented Aug 23, 2024

This still occurs intermittently on Azure DevOps. While experimenting with GitHub Actions I'm seeing it fail reliably, as follows, in py36-mode_mitogen-distro_centos7:

test_unknown (utils_test.CastTest) ... ok
test_run_with_broker (utils_test.RunWithRouterTest) ... ok
test_with_broker (utils_test.WithRouterTest) ... ok
test_with_broker_preserves_attributes (utils_test.WithRouterTest) ... ok

======================================================================
ERROR: tearDownClass (ssh_test.BannerTest)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/runner/work/mitogen/mitogen/tests/testlib.py", line 654, in tearDownClass
    cls.dockerized_ssh.check_processes()
  File "/home/runner/work/mitogen/mitogen/tests/testlib.py", line 592, in check_processes
    processes,
AssertionError: Docker container 'mitogen-test-4fa107141314c30a' contained extra running processes after test completed: [('sshd', '/usr/sbin/sshd -D'), ('sshd', 'sshd: mitogen__has_sudo [priv]'), ('ps', 'ps -w -w -o ucomm= -o args=')]

----------------------------------------------------------------------
Ran 680 tests in 64.319s

FAILED (errors=1, skipped=66)

The following changes to testlib.py and ssh_test.py each made it go away in isolation; I haven't tested them in combination:

@@ -609,6 +625,8 @@ def setUp(self):
        self.router = self.router_class(self.broker)

    def tearDown(self):
+       self.router.disconnect_all()
+       self.router.__exit__(None, None, None)
        del self.router
        super(RouterMixin, self).tearDown()
@@ -595,6 +614,7 @@ def tearDown(self):
            self.broker.shutdown()
        self.broker.join()
        del self.broker
+       time.sleep(1.0)
        super(BrokerMixin, self).tearDown()

    def sync_with_broker(self):
@@ -190,6 +190,7 @@ def test_verbose_enabled(self):
            self.dockerized_ssh.port,
        )
        self.assertEqual(name, context.name)
+       context.shutdown(wait=True)


class StubPermissionDeniedTest(StubSshMixin, testlib.TestCase):

My hypothesis is that Mitogen is leaving the SSH connection open, or relying on the GC to clean it up, or we are closing the connection asynchronously and not waiting for the close to complete.

@moreati moreati self-assigned this Aug 23, 2024
@moreati
Member

moreati commented Aug 26, 2024

> The following changes to testlib.py and ssh_test.py each made it go away in isolation; I haven't tested them in combination

Further testing revealed this is false. They might reduce occurrence, but they haven't eliminated it.

@moreati
Member

moreati commented Sep 9, 2024

I'm starting to think this is a combination of:

  1. Tests not closing their context with context.shutdown()
  2. cls.dockerized_ssh.check_processes() in tearDownClass() is in a race condition with sshd cleaning up after the client disconnects.

My guesstimate: After the TCP socket is closed (by the client or the server), sshd will take some non-zero amount of time to exit or terminate the session processes. Since there is no communication channel between the client and server at that point, there is no way for the test suite to wait for the server processes to be gone.

If so, then the least worst option to address item 2 is waiting a short period before calling check_processes(), and/or retrying a small number of times.
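
A minimal sketch of the wait-and-retry idea, under stated assumptions: `wait_for_no_extra_processes()` and its parameters are hypothetical, not part of testlib.py, and the process lister is a stand-in that replays canned snapshots instead of querying a container. The tuples follow the shape shown in the ssh_test.BannerTest failure above.

```python
import time


def wait_for_no_extra_processes(list_processes, expected, attempts=5, delay=0.2):
    """Poll list_processes() until only expected entries remain.

    Returns the unexpected entries seen on the last poll, which is an
    empty list on success.
    """
    extra = []
    for attempt in range(attempts):
        extra = [p for p in list_processes() if p not in expected]
        if not extra:
            return extra
        if attempt < attempts - 1:
            # Give sshd time to reap the session before looking again.
            time.sleep(delay)
    return extra


# Hypothetical stand-in for the container: the per-session sshd process
# is still present on the first two polls and gone on the third.
snapshots = iter([
    [('sshd', '/usr/sbin/sshd -D'), ('sshd', 'sshd: mitogen__has_sudo [priv]')],
    [('sshd', '/usr/sbin/sshd -D'), ('sshd', 'sshd: mitogen__has_sudo [priv]')],
    [('sshd', '/usr/sbin/sshd -D')],
])
expected = [('sshd', '/usr/sbin/sshd -D')]

leftover = wait_for_no_extra_processes(
    lambda: next(snapshots), expected, delay=0.01,
)
```

A bounded number of attempts keeps a genuine leak visible: if the extra processes never go away, the helper returns them and the caller can still fail the test with the same assertion message.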

moreati added a commit to moreati/mitogen that referenced this issue Sep 10, 2024
I'm about 75% sure the check is an unavoidable race condition, see
mitogen-hq#694 (comment). If
it occurs again, then reopen the issue.

Fixes mitogen-hq#694