Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dependencies: Update requirements for kiwipy and plumpy #5732

Merged
merged 2 commits into from
Jul 2, 2024

Conversation

sphuber
Copy link
Contributor

@sphuber sphuber commented Oct 31, 2022

Fixes #4595

The new version kiwipy==0.8 and plumpy==0.22 provide compatiblity with newer versions of aio-pika==9.0 which comes with various connection stability improvements.

@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch 5 times, most recently from 2088647 to d8e5693 Compare October 31, 2022 11:47
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from d8e5693 to 31a152d Compare November 9, 2022 15:45
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from 31a152d to ef092bd Compare November 25, 2022 10:06
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch 2 times, most recently from 82d38fc to 1f22365 Compare December 7, 2022 11:31
@sphuber
Copy link
Contributor Author

sphuber commented Dec 7, 2022

There is a single bug that prevents this from being merged. Once the RmqCommunicator has opened a channel (which is done as soon as a process is run, for example a calcfunction) the close() call will hang. We have traced the source to the finished attribute of aio_pika.OneShotCallBack (which is an asyncio.Event) never being set. It is set when the OneShotCallBack is called (see the __call__ method which calls __task_inner whichs sets the finished event), but this is never called. We noticed that the channel, which is closed before the connection, had an exception during the close callbacks. We couldn't figure out which exception (seemed empty) or where it had been raised, but it may have a knock-on effect on the closing of the connection. Finally, if we disable nest_asyncio, the problem disappears, so it may be related to nest_asyncio as well. Unfortunately we currently cannot get rid of nest_asyncio as we need it to be able to call processes from within processes, i.e., to have a workfunction call a calcfunction.

@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from 1f22365 to 578044e Compare March 5, 2023 17:11
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch 3 times, most recently from 7af5355 to 6a31919 Compare March 13, 2023 23:05
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from 6a31919 to a4ead79 Compare June 2, 2023 10:31
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch 7 times, most recently from 800a73c to fb1d2f6 Compare June 23, 2023 20:21
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from fb1d2f6 to 4ba3e6a Compare July 13, 2023 13:30
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch 3 times, most recently from f55080f to 8229a7c Compare November 9, 2023 10:34
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch 6 times, most recently from f0d0b48 to e428cf3 Compare November 17, 2023 14:06
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch 3 times, most recently from 12d1fd3 to adfac17 Compare February 2, 2024 14:57
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from c4186ee to 564f908 Compare March 25, 2024 08:41
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from 564f908 to da24249 Compare July 1, 2024 10:25
Copy link

codecov bot commented Jul 1, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 77.79%. Comparing base (ef60b66) to head (323acee).
Report is 115 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5732      +/-   ##
==========================================
+ Coverage   77.51%   77.79%   +0.29%     
==========================================
  Files         560      561       +1     
  Lines       41444    41819     +375     
==========================================
+ Hits        32120    32530     +410     
+ Misses       9324     9289      -35     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from da24249 to aead553 Compare July 1, 2024 11:14
sphuber added 2 commits July 2, 2024 11:18
The new version `kiwipy==0.8` and `plumpy==0.22` provide compatiblity
with newer versions of `aio-pika==8.0` which comes with various
connection stability improvements.

The only known problem is that `Communicator.close()` times out if at
least one process has been run. A test is added to capture this behavior
in `tests/manage/test_manager.py` which currently fails as a
`TimeoutError` is thrown. A lot of debugging has not yet led to finding
the cause nor a solution.

Since this behavior only seems to be appearing in the tests and does not
seem to affect regular usage, the upgrade is accepted regardless.
The exception is caught and logged as a warning.
@sphuber sphuber force-pushed the fix/bump-engine-dependencies branch from aead553 to 323acee Compare July 2, 2024 09:19
@sphuber sphuber merged commit e913715 into aiidateam:main Jul 2, 2024
29 checks passed
@sphuber sphuber deleted the fix/bump-engine-dependencies branch July 2, 2024 10:05
@unkcpz
Copy link
Member

unkcpz commented Dec 21, 2024

Hi @sphuber, it is a pity I didn't see your comment earlier, otherwise it will relive a lot burden I did the debug for the same problem. I vaguelly remember you mentioned a bug that prevent a release, and didn't realize this is the one!

Could you have a look at mosquito/aiormq#208 for the solution I proposed. I somehow don't think it related to nest_asyncio was used because other people in the aio-pika issue report encounter the similar problem (maybe they use nest_asyncio as well but who knows.)

We noticed that the channel, which is closed before the connection, had an exception during the close callbacks.

This is I missed in my debugging, and willing to see if I am able to track it down with the help of more explicit debug info hopefully. If you have time, maybe we can have a pair debugging session on the issue?

EDIT: I think this exception is the channal cancelling exception from aiormq channel close call, it was raised anyway since from my understanding the aiormq treat closing channel as an exception so it has some way for downstream libs to handle it explicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Issues with new asyncio daemon (stress-test)
2 participants