Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mitogen randomly complains about broken connections when using synchronize module within a loop #925

Open
nradchenko opened this issue Apr 29, 2022 · 6 comments
Labels
affects-0.3 Issues related to 0.3.X Mitogen releases bug Code feature that hinders desired execution outcome

Comments

@nradchenko
Copy link

I see mysterious tracebacks from mitogen appearing in the play output when using synchronize module. Sometimes these tracebacks don't affect the play and things seem to work okay, and sometimes they are followed by "An exception occurred during task execution" error from ansible and thus fail the task.

This is the shortest playbook to reproduce the problem I could come up with:

- hosts: all
  gather_facts: no
  tasks:
  - name: copy dummy files
    synchronize:
      src: zero
      dest: /root/{{ item }}
    loop:
      - 1
      - 2
      - 3

So here we have gather_facts: no, 3 elements in a list, a regular file zero (obviously, it has zero length), and the error is usually triggered after the second (!) iteration.

root@59cfe90e2671:~/test# ansible-playbook -i hosts.yml test.yml 

PLAY [all] ************************************************************************************************************************************************************************************

TASK [copy dummy files] ***********************************************************************************************************************************************************************
ok: [vps] => (item=1)
ok: [vps] => (item=2)
Exception ignored in: <function Connection.__del__ at 0x7f6f52fa89d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 531, in __del__
    self.close()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 884, in close
    self._put_connection()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 865, in _put_connection
    self.chain.reset()
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1852, in reset
    self.call_no_reply(mitogen.core.Dispatcher.forget_chain, saved)
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1907, in call_no_reply
    self.context.send(self.make_msg(fn, *args, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2310, in send
    self.router.route(msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 3383, in route
    self.broker.defer(self._async_route, msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2864, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
fatal: [vps]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

PLAY RECAP ************************************************************************************************************************************************************************************
vps                        : ok=0    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

root@59cfe90e2671:~/test# ansible-playbook -i hosts.yml test.yml 

PLAY [all] ************************************************************************************************************************************************************************************

TASK [copy dummy files] ***********************************************************************************************************************************************************************
ok: [vps] => (item=1)
ok: [vps] => (item=2)
Exception ignored in: <function Connection.__del__ at 0x7fbef18c89d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 531, in __del__
    self.close()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 884, in close
    self._put_connection()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 865, in _put_connection
    self.chain.reset()
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1852, in reset
    self.call_no_reply(mitogen.core.Dispatcher.forget_chain, saved)
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1907, in call_no_reply
    self.context.send(self.make_msg(fn, *args, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2310, in send
    self.router.route(msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 3383, in route
    self.broker.defer(self._async_route, msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2864, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
ok: [vps] => (item=3)

PLAY RECAP ************************************************************************************************************************************************************************************
vps                        : ok=1    changed=0    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0   

root@59cfe90e2671:~/test# 

It becomes a bit tricky to get stable errors when using gather_facts: yes, but the general rule is: more items in a loop - more chances to make something go wrong. According to my tests, the following playbook has high chances to fail:

- hosts: all
  gather_facts: yes
  tasks:
  - name: copy dummy files
    synchronize:
      src: zero
      dest: /root/{{ item }}
    loop:
      - 1
      - 2
      - 3
      - 4
      - 5
      - 6
      - 7
      - 8
      - 9
root@59cfe90e2671:~/test# ansible-playbook -i hosts.yml test.yml 

PLAY [all] ************************************************************************************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************************************
ok: [vps]

TASK [copy dummy files] ***********************************************************************************************************************************************************************
ok: [vps] => (item=1)
ok: [vps] => (item=2)
ok: [vps] => (item=3)
changed: [vps] => (item=4)
changed: [vps] => (item=5)
changed: [vps] => (item=6)
changed: [vps] => (item=7)
Exception ignored in: <function Connection.__del__ at 0x7f34684089d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 531, in __del__
    self.close()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 884, in close
    self._put_connection()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 865, in _put_connection
    self.chain.reset()
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1852, in reset
    self.call_no_reply(mitogen.core.Dispatcher.forget_chain, saved)
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1907, in call_no_reply
    self.context.send(self.make_msg(fn, *args, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2310, in send
    self.router.route(msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 3383, in route
    self.broker.defer(self._async_route, msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2864, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
Exception ignored in: <function Connection.__del__ at 0x7f34684089d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 531, in __del__
    self.close()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 884, in close
    self._put_connection()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 865, in _put_connection
    self.chain.reset()
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1852, in reset
    self.call_no_reply(mitogen.core.Dispatcher.forget_chain, saved)
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1907, in call_no_reply
    self.context.send(self.make_msg(fn, *args, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2310, in send
    self.router.route(msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 3383, in route
    self.broker.defer(self._async_route, msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2864, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
Exception ignored in: <function Connection.__del__ at 0x7f34684089d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 531, in __del__
    self.close()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 884, in close
    self._put_connection()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 865, in _put_connection
    self.chain.reset()
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1852, in reset
    self.call_no_reply(mitogen.core.Dispatcher.forget_chain, saved)
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1907, in call_no_reply
    self.context.send(self.make_msg(fn, *args, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2310, in send
    self.router.route(msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 3383, in route
    self.broker.defer(self._async_route, msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2864, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
Exception ignored in: <function Connection.__del__ at 0x7f34684089d0>
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 531, in __del__
    self.close()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 884, in close
    self._put_connection()
  File "/usr/local/lib/python3.10/site-packages/ansible_mitogen/connection.py", line 865, in _put_connection
    self.chain.reset()
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1852, in reset
    self.call_no_reply(mitogen.core.Dispatcher.forget_chain, saved)
  File "/usr/local/lib/python3.10/site-packages/mitogen/parent.py", line 1907, in call_no_reply
    self.context.send(self.make_msg(fn, *args, **kwargs))
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2310, in send
    self.router.route(msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 3383, in route
    self.broker.defer(self._async_route, msg)
  File "/usr/local/lib/python3.10/site-packages/mitogen/core.py", line 2864, in defer
    raise Error(self.broker_shutdown_msg)
mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: mitogen.core.Error: An attempt was made to enqueue a message with a Broker that has already exitted. It is likely your program called Broker.shutdown() too early.
fatal: [vps]: FAILED! => {"msg": "Unexpected failure during module execution.", "stdout": ""}

PLAY RECAP ************************************************************************************************************************************************************************************
vps                        : ok=1    changed=0    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0   

root@59cfe90e2671:~/test# 

It doesn't seem to depend on synchronize params much, as it fails when pointing to nonexistent source file too, or even when rsync is not installed on host machine. I can also reproduce the issue with different ansible (2.8.15, 3.2.0, 4.5.0, 5.7.0) and mitogen (0.2.10) versions, and with python 2.7 on the target, and I couldn't find a combination of recent ansible and mitogen versions which works properly.

Host system:

  • Debian 11 based docker container (python:3.10)
  • Python 3.10.4
  • Ansible 2.10.17
  • Mitogen from master branch

The only lines in my ansible.cfg are:

[defaults]
strategy_plugins = /usr/local/lib/python3.10/site-packages/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

Target system:

  • Ubuntu 22.04 KVM VPS
  • Python 3.10.4
@nradchenko nradchenko added affects-0.3 Issues related to 0.3.X Mitogen releases bug Code feature that hinders desired execution outcome labels Apr 29, 2022
@nradchenko
Copy link
Author

Forgot to mention that mitogen_task_isolation: fork doesn't fix anything in this case

@rlinq
Copy link

rlinq commented Jun 22, 2022

I can second the issue, although my tasks hasn't failed, just seen the Exception ignored.

Ansible 2.12.5
Mitogen-ansible v0.3.1.dev0
Python 3.9.6
CentOS 8 Stream

I didn't see this problem in Mitogen 2.10 and Ansible 2.9

We rely heavily on this fantastic framework that has greatly improved run time of our playbooks. Been using it daily since beginning of 2019!

@nradchenko
Copy link
Author

I didn't see this problem in Mitogen 2.10 and Ansible 2.9

That's weird, because initially I've discovered this bug with ansible 2.9.27 and mitogen 0.2.10, and it still fails for me with other 2.9.x. @rlinq

@nradchenko
Copy link
Author

So I've tracked it down to 3584084. Commenting close() call out in Connection destructor magically fixes broken connections issue, so it definitely triggered by this change just like promised.

Seems like connections don't leak for now as I couldn't reproduce the problem described in #140. Developer's answer in ansible/ansible#37411 also states that forcing connections to close by calling close() might be a no-op (at least for 2.8.0), so I'm not sure whether they should be made so. @dw

@innovationfleet
Copy link

I didn't see this problem in Mitogen 2.10 and Ansible 2.9

That's weird, because initially I've discovered this bug with ansible 2.9.27 and mitogen 0.2.10, and it still fails for me with other 2.9.x. @rlinq

I recalled versions from my memory, not having access to that versions right now. The ansible version was pre 2.9.24 if I recall correctly. Looks like you have isolated the issue, great work!

moreati added a commit to moreati/mitogen that referenced this issue Nov 4, 2022
Refs mitogen-hq#925 mitogen-hq#969

I'm not 100% confident that merely removing this is the full fix,
without substituting something else. I am sure keeping it would be
the greater of two evils. __del__() should be avoided on general
principal, and it's associated with multiple intermittant CI
failures, plus multiple user reported issues.
MemberIT pushed a commit to tribunadigital/mitogen that referenced this issue Jan 23, 2023
Refs mitogen-hq#925 mitogen-hq#969

I'm not 100% confident that merely removing this is the full fix,
without substituting something else. I am sure keeping it would be
the greater of two evils. __del__() should be avoided on general
principal, and it's associated with multiple intermittant CI
failures, plus multiple user reported issues.
@phemmer
Copy link

phemmer commented Feb 7, 2023

Seems like this can be closed. I encountered a conflict when updating my mitogen branch as the problematic code has been removed (0af2ce8).

thmour pushed a commit to thmour/mitogen that referenced this issue Jan 26, 2024
Refs mitogen-hq#925 mitogen-hq#969

I'm not 100% confident that merely removing this is the full fix,
without substituting something else. I am sure keeping it would be
the greater of two evils. __del__() should be avoided on general
principal, and it's associated with multiple intermittant CI
failures, plus multiple user reported issues.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
affects-0.3 Issues related to 0.3.X Mitogen releases bug Code feature that hinders desired execution outcome
Projects
None yet
Development

No branches or pull requests

4 participants