Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Two tests infinitely hang #4564

Closed
bnavigator opened this issue Mar 8, 2021 · 8 comments
Closed

Two tests infinitely hang #4564

bnavigator opened this issue Mar 8, 2021 · 8 comments

Comments

@bnavigator
Copy link
Contributor

bnavigator commented Mar 8, 2021

The following two newly introduced tests (#4490) infinitely hang while trying to build distributed 2021.3.0 for openSUSE:

[    6s] ============================= test session starts ==============================
[    6s] platform linux -- Python 3.6.13, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3.6
[    6s] cachedir: .pytest_cache
[    6s] rootdir: /home/abuild/rpmbuild/BUILD/distributed-2021.3.0, configfile: setup.cfg
[    6s] plugins: forked-1.3.0, xdist-2.2.0, asyncio-0.14.0, rerunfailures-9.1.1, timeout-1.4.2
[    6s] timeout: 120.0s
[    6s] timeout method: thread
[    6s] timeout func_only: False
...
[  192s] +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
[  192s] 
[  192s] ~~~~~~~~~~~~~~~~~~~~~~ Stack of IO loop (140130848056896) ~~~~~~~~~~~~~~~~~~~~~~
[  192s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  192s]     self._bootstrap_inner()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  192s]     self.run()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  192s]     self._target(*self._args, **self._kwargs)
[  192s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/utils.py", line 428, in run_loop
[  192s]     loop.start()
[  192s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/utils_test.py", line 128, in start
[  192s]     orig_start()
[  192s]   File "/usr/lib64/python3.6/site-packages/tornado/platform/asyncio.py", line 199, in start
[  192s]     self.asyncio_loop.run_forever()
[  192s]   File "/usr/lib64/python3.6/asyncio/base_events.py", line 442, in run_forever
[  192s]     self._run_once()
[  192s]   File "/usr/lib64/python3.6/asyncio/base_events.py", line 1426, in _run_once
[  192s]     event_list = self._selector.select(timeout)
[  192s]   File "/usr/lib64/python3.6/selectors.py", line 445, in select
[  192s]     fd_event_list = self._epoll.poll(timeout, max_ev)
[  192s] 
[  192s] ~~~~~~~~~~~~~~ Stack of Dask-Callback-Thread_0 (140131105625664) ~~~~~~~~~~~~~~~
[  192s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  192s]     self._bootstrap_inner()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  192s]     self.run()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  192s]     self._target(*self._args, **self._kwargs)
[  192s]   File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 67, in _worker
[  192s]     work_item = work_queue.get(block=True)
[  192s]   File "/usr/lib64/python3.6/queue.py", line 164, in get
[  192s]     self.not_empty.wait()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 295, in wait
[  192s]     waiter.acquire()
[  192s] 
[  192s] ~~~~~~~~~~~~~~~ Stack of TCP-Executor-22210-1 (140130882938432) ~~~~~~~~~~~~~~~~
[  192s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  192s]     self._bootstrap_inner()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  192s]     self.run()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  192s]     self._target(*self._args, **self._kwargs)
[  192s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/threadpoolexecutor.py", line 51, in _worker
[  192s]     task = work_queue.get(timeout=1)
[  192s]   File "/usr/lib64/python3.6/queue.py", line 173, in get
[  192s]     self.not_empty.wait(remaining)
[  192s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  192s]     gotit = waiter.acquire(True, timeout)
[  192s] 
[  192s] ~~~~~~~~~~~~~~~ Stack of TCP-Executor-22210-0 (140130891331136) ~~~~~~~~~~~~~~~~
[  192s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  192s]     self._bootstrap_inner()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  192s]     self.run()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  192s]     self._target(*self._args, **self._kwargs)
[  192s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/threadpoolexecutor.py", line 51, in _worker
[  192s]     task = work_queue.get(timeout=1)
[  192s]   File "/usr/lib64/python3.6/queue.py", line 173, in get
[  192s]     self.not_empty.wait(remaining)
[  192s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  192s]     gotit = waiter.acquire(True, timeout)
[  192s] 
[  192s] ~~~~~~~~~~~~~~~~~~ Stack of Dask-Offload_0 (140131153737280) ~~~~~~~~~~~~~~~~~~~
[  192s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  192s]     self._bootstrap_inner()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  192s]     self.run()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  192s]     self._target(*self._args, **self._kwargs)
[  192s]   File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 67, in _worker
[  192s]     work_item = work_queue.get(block=True)
[  192s]   File "/usr/lib64/python3.6/queue.py", line 164, in get
[  192s]     self.not_empty.wait()
[  192s]   File "/usr/lib64/python3.6/threading.py", line 295, in wait
[  192s]     waiter.acquire()
[  192s] 
[  192s] ~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (140131270149696) ~~~~~~~~~~~~~~~~~~~~~
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 285, in _perform_spawn
[  192s]     reply.run()
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 220, in run
[  192s]     self._result = func(*args, **kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
[  192s]     msg = Message.from_io(io)
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 432, in from_io
[  192s]     header = io.read(9)  # type 1, channel 4, payload 4
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 400, in read
[  192s]     data = self._read(numbytes - len(buf))
[  192s] 
[  192s] ~~~~~~~~~~~~~~~~~~~~ Stack of MainThread (140131273639744) ~~~~~~~~~~~~~~~~~~~~~
[  192s]   File "<string>", line 1, in <module>
[  192s]   File "<string>", line 8, in <module>
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1554, in serve
[  192s]     WorkerGateway(io=io, id=id, _startcount=2).serve()
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1060, in serve
[  192s]     self._execpool.integrate_as_primary_thread()
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 267, in integrate_as_primary_thread
[  192s]     self._perform_spawn(reply)
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 285, in _perform_spawn
[  192s]     reply.run()
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 220, in run
[  192s]     self._result = func(*args, **kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1084, in executetask
[  192s]     do_exec(co, loc)  # noqa
[  192s]   File "/usr/lib/python3.6/site-packages/xdist/remote.py", line 261, in <module>
[  192s]     config.hook.pytest_cmdline_main(config=config)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  192s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  192s]     return self._inner_hookexec(hook, methods, kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  192s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  192s]     res = hook_impl.function(*args)
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/main.py", line 316, in pytest_cmdline_main
[  192s]     return wrap_session(config, _main)
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/main.py", line 269, in wrap_session
[  192s]     session.exitstatus = doit(config, session) or 0
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/main.py", line 323, in _main
[  192s]     config.hook.pytest_runtestloop(session=session)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  192s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  192s]     return self._inner_hookexec(hook, methods, kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  192s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  192s]     res = hook_impl.function(*args)
[  192s]   File "/usr/lib/python3.6/site-packages/xdist/remote.py", line 73, in pytest_runtestloop
[  192s]     self.run_one_test(torun)
[  192s]   File "/usr/lib/python3.6/site-packages/xdist/remote.py", line 90, in run_one_test
[  192s]     self.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  192s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  192s]     return self._inner_hookexec(hook, methods, kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  192s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  192s]     res = hook_impl.function(*args)
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 109, in pytest_runtest_protocol
[  192s]     runtestprotocol(item, nextitem=nextitem)
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 126, in runtestprotocol
[  192s]     reports.append(call_and_report(item, "call", log))
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 215, in call_and_report
[  192s]     call = call_runtest_hook(item, when, **kwds)
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 255, in call_runtest_hook
[  192s]     lambda: ihook(item=item, **kwds), when=when, reraise=reraise
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 311, in from_call
[  192s]     result: Optional[TResult] = func()
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 255, in <lambda>
[  192s]     lambda: ihook(item=item, **kwds), when=when, reraise=reraise
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  192s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  192s]     return self._inner_hookexec(hook, methods, kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  192s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  192s]     res = hook_impl.function(*args)
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 162, in pytest_runtest_call
[  192s]     item.runtest()
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/python.py", line 1641, in runtest
[  192s]     self.ihook.pytest_pyfunc_call(pyfuncitem=self)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  192s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  192s]     return self._inner_hookexec(hook, methods, kwargs)
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  192s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  192s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  192s]     res = hook_impl.function(*args)
[  192s]   File "/usr/lib/python3.6/site-packages/_pytest/python.py", line 183, in pytest_pyfunc_call
[  192s]     result = testfunction(**testargs)
[  192s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/tests/test_queues.py", line 296, in test_queue_in_task
[  192s]     result = c.submit(foo).result()
[  192s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/client.py", line 219, in result
[  192s]     result = self.client.sync(self._result, callback_timeout=timeout, raiseit=False)
[  192s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/client.py", line 838, in sync
[  192s]     self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
[  192s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/utils.py", line 348, in sync
[  192s]     e.wait(10)
[  192s]   File "/usr/lib64/python3.6/threading.py", line 551, in wait
[  192s]     signaled = self._cond.wait(timeout)
[  192s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  192s]     gotit = waiter.acquire(True, timeout)
[  192s] 
[  192s] +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
[  192s] 
[  192s] [gw3] node down: Not properly terminated
[  196s] [gw3] [ 97%] FAILED distributed/tests/test_queues.py::test_queue_in_task 
[  196s] 
...
[  222s] +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
[  222s] 
[  222s] ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Captured stderr ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[  222s] distributed.protocol.pickle - INFO - Failed to deserialize <memory at 0x7fc75effd7c8>
[  222s] Traceback (most recent call last):
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/pickle.py", line 75, in loads
[  222s]     return pickle.loads(x)
[  222s] ValueError: unsupported pickle protocol: 5
[  222s] distributed.protocol.core - CRITICAL - Failed to deserialize
[  222s] Traceback (most recent call last):
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/core.py", line 130, in loads
[  222s]     value = merge_and_deserialize(head, fs, deserializers=deserializers)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/serialize.py", line 450, in merge_and_deserialize
[  222s]     return deserialize(header, merged_frames, deserializers=deserializers)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/serialize.py", line 384, in deserialize
[  222s]     return loads(header, frames)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/serialize.py", line 80, in pickle_loads
[  222s]     return pickle.loads(x, buffers=buffers)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/pickle.py", line 75, in loads
[  222s]     return pickle.loads(x)
[  222s] ValueError: unsupported pickle protocol: 5
[  222s] distributed.utils - ERROR - unsupported pickle protocol: 5
[  222s] Traceback (most recent call last):
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/utils.py", line 666, in log_errors
[  222s]     yield
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/client.py", line 1228, in _handle_report
[  222s]     msgs = await self.scheduler_comm.comm.read()
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/comm/tcp.py", line 221, in read
[  222s]     allow_offload=self.allow_offload,
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/comm/utils.py", line 80, in from_frames
[  222s]     res = _from_frames()
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/comm/utils.py", line 64, in _from_frames
[  222s]     frames, deserialize=deserialize, deserializers=deserializers
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/core.py", line 130, in loads
[  222s]     value = merge_and_deserialize(head, fs, deserializers=deserializers)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/serialize.py", line 450, in merge_and_deserialize
[  222s]     return deserialize(header, merged_frames, deserializers=deserializers)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/serialize.py", line 384, in deserialize
[  222s]     return loads(header, frames)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/serialize.py", line 80, in pickle_loads
[  222s]     return pickle.loads(x, buffers=buffers)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/protocol/pickle.py", line 75, in loads
[  222s]     return pickle.loads(x)
[  222s] ValueError: unsupported pickle protocol: 5
[  222s] 
[  222s] ~~~~~~~~~~~~~~~~~~~~~~ Stack of IO loop (140493765711424) ~~~~~~~~~~~~~~~~~~~~~~
[  222s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  222s]     self._bootstrap_inner()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  222s]     self.run()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  222s]     self._target(*self._args, **self._kwargs)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/utils.py", line 428, in run_loop
[  222s]     loop.start()
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/utils_test.py", line 128, in start
[  222s]     orig_start()
[  222s]   File "/usr/lib64/python3.6/site-packages/tornado/platform/asyncio.py", line 199, in start
[  222s]     self.asyncio_loop.run_forever()
[  222s]   File "/usr/lib64/python3.6/asyncio/base_events.py", line 442, in run_forever
[  222s]     self._run_once()
[  222s]   File "/usr/lib64/python3.6/asyncio/base_events.py", line 1426, in _run_once
[  222s]     event_list = self._selector.select(timeout)
[  222s]   File "/usr/lib64/python3.6/selectors.py", line 445, in select
[  222s]     fd_event_list = self._epoll.poll(timeout, max_ev)
[  222s] 
[  222s] ~~~~~~~~~~~~~~~~~~~~~~ Stack of Profile (140493790889536) ~~~~~~~~~~~~~~~~~~~~~~
[  222s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  222s]     self._bootstrap_inner()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  222s]     self.run()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  222s]     self._target(*self._args, **self._kwargs)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/profile.py", line 269, in _watch
[  222s]     sleep(interval)
[  222s] 
[  222s] ~~~~~~~~~~~ Stack of Dask-Worker-Threads'-22219-47 (140492733933120) ~~~~~~~~~~~
[  222s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  222s]     self._bootstrap_inner()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  222s]     self.run()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  222s]     self._target(*self._args, **self._kwargs)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/threadpoolexecutor.py", line 51, in _worker
[  222s]     task = work_queue.get(timeout=1)
[  222s]   File "/usr/lib64/python3.6/queue.py", line 173, in get
[  222s]     self.not_empty.wait(remaining)
[  222s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  222s]     gotit = waiter.acquire(True, timeout)
[  222s] 
[  222s] ~~~~~~~~~~~ Stack of Dask-Worker-Threads'-22219-46 (140493228840512) ~~~~~~~~~~~
[  222s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  222s]     self._bootstrap_inner()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  222s]     self.run()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  222s]     self._target(*self._args, **self._kwargs)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/threadpoolexecutor.py", line 51, in _worker
[  222s]     task = work_queue.get(timeout=1)
[  222s]   File "/usr/lib64/python3.6/queue.py", line 173, in get
[  222s]     self.not_empty.wait(remaining)
[  222s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  222s]     gotit = waiter.acquire(True, timeout)
[  222s] 
[  222s] ~~~~~~~~~~~ Stack of Dask-Worker-Threads'-22219-45 (140493816067648) ~~~~~~~~~~~
[  222s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  222s]     self._bootstrap_inner()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  222s]     self.run()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  222s]     self._target(*self._args, **self._kwargs)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/threadpoolexecutor.py", line 51, in _worker
[  222s]     task = work_queue.get(timeout=1)
[  222s]   File "/usr/lib64/python3.6/queue.py", line 173, in get
[  222s]     self.not_empty.wait(remaining)
[  222s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  222s]     gotit = waiter.acquire(True, timeout)
[  222s] 
[  222s] ~~~~~~~~~~~~~~~ Stack of TCP-Executor-22219-1 (140494241146432) ~~~~~~~~~~~~~~~~
[  222s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  222s]     self._bootstrap_inner()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  222s]     self.run()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  222s]     self._target(*self._args, **self._kwargs)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/threadpoolexecutor.py", line 51, in _worker
[  222s]     task = work_queue.get(timeout=1)
[  222s]   File "/usr/lib64/python3.6/queue.py", line 173, in get
[  222s]     self.not_empty.wait(remaining)
[  222s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  222s]     gotit = waiter.acquire(True, timeout)
[  222s] 
[  222s] ~~~~~~~~~~~~~~~ Stack of TCP-Executor-22219-0 (140494257931840) ~~~~~~~~~~~~~~~~
[  222s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  222s]     self._bootstrap_inner()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  222s]     self.run()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  222s]     self._target(*self._args, **self._kwargs)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/threadpoolexecutor.py", line 51, in _worker
[  222s]     task = work_queue.get(timeout=1)
[  222s]   File "/usr/lib64/python3.6/queue.py", line 173, in get
[  222s]     self.not_empty.wait(remaining)
[  222s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  222s]     gotit = waiter.acquire(True, timeout)
[  222s] 
[  222s] ~~~~~~~~~~~~~~~~~~ Stack of Dask-Offload_0 (140494370801216) ~~~~~~~~~~~~~~~~~~~
[  222s]   File "/usr/lib64/python3.6/threading.py", line 884, in _bootstrap
[  222s]     self._bootstrap_inner()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
[  222s]     self.run()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 864, in run
[  222s]     self._target(*self._args, **self._kwargs)
[  222s]   File "/usr/lib64/python3.6/concurrent/futures/thread.py", line 67, in _worker
[  222s]     work_item = work_queue.get(block=True)
[  222s]   File "/usr/lib64/python3.6/queue.py", line 164, in get
[  222s]     self.not_empty.wait()
[  222s]   File "/usr/lib64/python3.6/threading.py", line 295, in wait
[  222s]     waiter.acquire()
[  222s] 
[  222s] ~~~~~~~~~~~~~~~~~~~~~ Stack of <unknown> (140494419920448) ~~~~~~~~~~~~~~~~~~~~~
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 285, in _perform_spawn
[  222s]     reply.run()
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 220, in run
[  222s]     self._result = func(*args, **kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 967, in _thread_receiver
[  222s]     msg = Message.from_io(io)
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 432, in from_io
[  222s]     header = io.read(9)  # type 1, channel 4, payload 4
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 400, in read
[  222s]     data = self._read(numbytes - len(buf))
[  222s] 
[  222s] ~~~~~~~~~~~~~~~~~~~~ Stack of MainThread (140494423410496) ~~~~~~~~~~~~~~~~~~~~~
[  222s]   File "<string>", line 1, in <module>
[  222s]   File "<string>", line 8, in <module>
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1554, in serve
[  222s]     WorkerGateway(io=io, id=id, _startcount=2).serve()
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1060, in serve
[  222s]     self._execpool.integrate_as_primary_thread()
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 267, in integrate_as_primary_thread
[  222s]     self._perform_spawn(reply)
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 285, in _perform_spawn
[  222s]     reply.run()
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 220, in run
[  222s]     self._result = func(*args, **kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/execnet/gateway_base.py", line 1084, in executetask
[  222s]     do_exec(co, loc)  # noqa
[  222s]   File "/usr/lib/python3.6/site-packages/xdist/remote.py", line 261, in <module>
[  222s]     config.hook.pytest_cmdline_main(config=config)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  222s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  222s]     return self._inner_hookexec(hook, methods, kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  222s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  222s]     res = hook_impl.function(*args)
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/main.py", line 316, in pytest_cmdline_main
[  222s]     return wrap_session(config, _main)
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/main.py", line 269, in wrap_session
[  222s]     session.exitstatus = doit(config, session) or 0
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/main.py", line 323, in _main
[  222s]     config.hook.pytest_runtestloop(session=session)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  222s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  222s]     return self._inner_hookexec(hook, methods, kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  222s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  222s]     res = hook_impl.function(*args)
[  222s]   File "/usr/lib/python3.6/site-packages/xdist/remote.py", line 73, in pytest_runtestloop
[  222s]     self.run_one_test(torun)
[  222s]   File "/usr/lib/python3.6/site-packages/xdist/remote.py", line 90, in run_one_test
[  222s]     self.config.hook.pytest_runtest_protocol(item=item, nextitem=nextitem)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  222s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  222s]     return self._inner_hookexec(hook, methods, kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  222s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  222s]     res = hook_impl.function(*args)
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 109, in pytest_runtest_protocol
[  222s]     runtestprotocol(item, nextitem=nextitem)
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 126, in runtestprotocol
[  222s]     reports.append(call_and_report(item, "call", log))
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 215, in call_and_report
[  222s]     call = call_runtest_hook(item, when, **kwds)
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 255, in call_runtest_hook
[  222s]     lambda: ihook(item=item, **kwds), when=when, reraise=reraise
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 311, in from_call
[  222s]     result: Optional[TResult] = func()
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 255, in <lambda>
[  222s]     lambda: ihook(item=item, **kwds), when=when, reraise=reraise
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  222s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  222s]     return self._inner_hookexec(hook, methods, kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  222s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  222s]     res = hook_impl.function(*args)
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/runner.py", line 162, in pytest_runtest_call
[  222s]     item.runtest()
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/python.py", line 1641, in runtest
[  222s]     self.ihook.pytest_pyfunc_call(pyfuncitem=self)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/hooks.py", line 286, in __call__
[  222s]     return self._hookexec(self, self.get_hookimpls(), kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 93, in _hookexec
[  222s]     return self._inner_hookexec(hook, methods, kwargs)
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/manager.py", line 87, in <lambda>
[  222s]     firstresult=hook.spec.opts.get("firstresult") if hook.spec else False,
[  222s]   File "/usr/lib/python3.6/site-packages/pluggy/callers.py", line 187, in _multicall
[  222s]     res = hook_impl.function(*args)
[  222s]   File "/usr/lib/python3.6/site-packages/_pytest/python.py", line 183, in pytest_pyfunc_call
[  222s]     result = testfunction(**testargs)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/tests/test_variable.py", line 57, in test_variable_in_task
[  222s]     result = c.submit(foo).result()
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/client.py", line 219, in result
[  222s]     result = self.client.sync(self._result, callback_timeout=timeout, raiseit=False)
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/client.py", line 838, in sync
[  222s]     self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
[  222s]   File "/home/abuild/rpmbuild/BUILD/distributed-2021.3.0/distributed/utils.py", line 348, in sync
[  222s]     e.wait(10)
[  222s]   File "/usr/lib64/python3.6/threading.py", line 551, in wait
[  222s]     signaled = self._cond.wait(timeout)
[  222s]   File "/usr/lib64/python3.6/threading.py", line 299, in wait
[  222s]     gotit = waiter.acquire(True, timeout)
[  222s] 
[  222s] +++++++++++++++++++++++++++++++++++ Timeout ++++++++++++++++++++++++++++++++++++
[  222s] 
[  222s] [gw4] node down: Not properly terminated
[  225s] [gw4] [ 99%] FAILED distributed/tests/test_variable.py::test_variable_in_task 
...
[  236s] =================================== FAILURES ===================================
[  236s] _______________________ distributed/tests/test_queues.py _______________________
[  236s] [gw3] linux -- Python 3.6.13 /usr/bin/python3.6
[  236s] worker 'gw3' crashed while running 'distributed/tests/test_queues.py::test_queue_in_task'
[  236s] ______________________ distributed/tests/test_variable.py ______________________
[  236s] [gw4] linux -- Python 3.6.13 /usr/bin/python3.6
[  236s] worker 'gw4' crashed while running 'distributed/tests/test_variable.py::test_variable_in_task'
...

Full build logs (The many deselections are because of pytest-asyncio 0.14 (#4212)):

Environment:

@bnavigator bnavigator changed the title Two tests indefinitely hang Two tests infinitely hang Mar 8, 2021
@quasiben
Copy link
Member

I'm not sure why those test are hanging. Do they hang for other versions of Python ? Note: Dask is moving to dropping 3.6:
dask/dask#7006

@bnavigator
Copy link
Contributor Author

bnavigator commented Mar 11, 2021

Yes, Python 3.8 shows the same behavior. distributed-py38-hang.log

I edited my initial post

@quasiben
Copy link
Member

On your system can you try running those tests isolation and see if they fail ?

py.test -s -v distributed/tests/test_variable.py::test_variable_in_task
py.test -s -v distributed/tests/test_queues.py::test_queue_in_task

Another thought here is maybe some dependency has updated in your CI. For example. dask has pinned pytest-asyncio:

I think in the openSUSE build I see pytest-asyncio-0.14.0-1.3

@bnavigator
Copy link
Contributor Author

New buildlogs with tests in isolation for python 3.6 and 3.8 with continue on error:

Interesting that now the hangs are only on python 3.6. I also checked whether the -s parameter makes any difference. It does not.

@quasiben
Copy link
Member

The -s is an option for capturing stdout and won't impact the hang but hopefully give us more output. Looking at the output in the log file we can see an error:

[   22s] distributed/tests/test_variable.py::test_variable_in_task distributed.protocol.pickle - INFO - Failed to deserialize <memory at 0x7fe1ebba8e88>
[   22s] Traceback (most recent call last):
[   22s]   File "/usr/lib64/python3.6/site-packages/distributed/protocol/pickle.py", line 75, in loads
[   22s]     return pickle.loads(x)
[   22s] ValueError: unsupported pickle protocol: 5

Maybe if we add pickle5 package here that will resolve the hang. Is it possible to add into the CI and re-run ?
xref: #3849

@bnavigator
Copy link
Contributor Author

bnavigator commented Mar 11, 2021

The -s is an option for capturing stdout and won't impact the hang but hopefully give us more output.

The captured stdout stderr was also reported by pytest in the initial buildlog. I was rather referring to the fact whether not letting pytest capture it and report later would change the hang.

@bnavigator
Copy link
Contributor Author

Unfortunately, pickle5 is not avaiable as rpm package for openSUSE Tumbleweed. That's not surprising, as it is a backport of stdlib functionality to Python older than 3.8 and python36 is only supported as a legacy by Tumbleweed.

It is a bit odd though, that #4564 (comment) fails with the same hang for Py3.8. And I can not reproduce it!

@bnavigator
Copy link
Contributor Author

Cannot reproduce with Python 3.8 and distributed 2021.3.1. (Python 3.6 has been dropped by dask)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants