A version of the sync macro that throws earlier #34198

Keno · 2019-12-25T19:16:25Z

I've been looking at what causes deadlocks in our test suite in an effort to cut down on the number of failed tests on CI that result in hangs (since those are hard to diagnose and resolve). I found that by playing with various resource limits, it is easy to create hangs in the test suite. The reason we get a hang rather than a more easily diagnosable error is twofold. We either:

1. Aren't watching for the error (e.g. a socket remote end closing), or
2. Aren't propagating the error to the top level

A very common situation for case 2) is that the test is wrapped in @sync, which doesn't return until all tasks have finished or errored. However, in many cases one of the tasks produces data for the others, so if that task errors, the remaining tasks will wait forever. This PR aims to address that situation by introducing a new @syncany macro that immediately rethrows any error thrown by a contained task rather than waiting for all of them to finish. The implementation isn't super performant (it allocates a new task per object being waited on), but should be sufficient for use in the test suite. A better implementation would create a new scheduler object that can be inserted into multiple wait queues.

Example usage of the new macro:

@sync begin
    @async error("Hello")
    @async sleep(1000)
end # Waits 1000s

@syncany begin
    @async error("Hello")
    @async sleep(1000)
end # Throws immediately

The macro doesn't do any sort of cleanup for the tasks that do not finish, and just lets them run. In the future, we may want to automatically cancel those tasks, but that seemed like a bigger design problem than the simple thing that I wanted (something that propagates error messages more readily, so we see them in the logs).
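To make the "one new task per object being waited on" idea concrete, here is a minimal sketch of a fail-fast wait. It is an illustration under stated assumptions, not the code added by this PR: `wait_failfast`, `outcomes`, `producer`, and `consumer` are hypothetical names, and the sketch only handles `Task` objects.

```julia
# Hypothetical helper for illustration; this is not the code added by the PR.
# Waits for every task in `tasks`, but rethrows the first failure as soon as it occurs.
function wait_failfast(tasks::Vector{Task})
    # Buffered so that watcher tasks never block when reporting an outcome.
    outcomes = Channel{Any}(length(tasks))
    for t in tasks
        # One watcher task per waited-on task (the cost mentioned in the description).
        @async begin
            try
                wait(t)                  # throws if `t` failed
                put!(outcomes, nothing)  # report normal completion
            catch err
                put!(outcomes, err)      # report the failure
            end
        end
    end
    for _ in tasks
        outcome = take!(outcomes)
        outcome === nothing || throw(outcome)  # first failure wins
    end
    return nothing
end

ch = Channel{Int}(0)
producer = @async error("producer failed")  # fails before producing anything
consumer = @async take!(ch)                  # blocks forever without a producer

caught = try
    wait_failfast([producer, consumer])
    nothing
catch err
    err
end

println(typeof(caught))        # the exception raised by `wait` on the failed task
println(istaskdone(consumer))  # the consumer is still blocked: it was not cleaned up
```

As in the description above, the sketch does no cleanup: the blocked consumer and its watcher task are simply left behind once the error has been rethrown.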

jonas-schulze · 2019-12-25T23:56:12Z

Couldn’t this be the new default behavior?

tkf · 2020-01-05T19:54:28Z

base/task.jl

+during error handling.
+
+!!! Note
+    This interface is experimental and subject to change or removal without notice.


👍 for marking this experimental. Otherwise, implementing structured concurrency #33248 becomes much harder.

vtjnash · 2020-01-05T20:02:20Z

Maybe we should call this Base.Future.@async (with the idea that once optimized, it should be the default)? The any name seems odd (because it is waiting for all, it’s just early error).

Keno · 2020-01-06T15:54:52Z

You mean Base.Future.@sync? I'd be fine with that.

tkf · 2020-01-06T22:12:06Z

Isn't it strange to put an experimental API in Future? I think it makes it harder to fix the bigger design problem.

c42f · 2020-01-08T06:07:48Z

Letting tasks drop out the bottom seems like a pretty big change to the semantics of @sync so I'm not sure it makes sense to declare it the Future without trying it out for a while. I'd also be happier not to export it for the moment if it's experimental.

As you say, it would ideally interact with some kind of cancellation. That's hard but it feels a lot more consistent with the way @sync works right now.

It seems handy though and could be a big practical improvement for the test suite. For the moment, could we just name it clearly and not export it? How about @sync_failfast or something equally ugly but explicit?

Keno · 2020-02-27T20:07:53Z

From triage: let's call this Base.Experimental.@sync

tkf · 2020-02-27T20:37:56Z

Base.Experimental.@sync doesn't sound like a stop-gap measure. Shouldn't the macro name (@sync) clarify that:

this macro has an undesirable property (tasks leak),
it is included temporarily, only because a real solution is impossible to implement with the current infrastructure, and
it will be removed, or at least its use will be discouraged, in the future?
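The task-leak concern can be observed directly. The following is a small sketch, assuming a Julia version that ships the macro under the name settled on in this thread (`Base.Experimental.@sync`); `leaked`, `caught`, and `t0` are names introduced here for illustration only.

```julia
# Sketch: the fail-fast macro rethrows early, but the sibling task keeps running.
leaked = Ref{Task}()   # holds on to the long-running task so it can be inspected later
t0 = time()

caught = try
    Base.Experimental.@sync begin
        @async error("Hello")
        leaked[] = @async sleep(1000)
    end
    nothing
catch err
    err
end

println(caught === nothing ? "no error" : "error propagated early")
println("seconds elapsed: ", round(time() - t0; digits = 1))
println("sibling task finished? ", istaskdone(leaked[]))
```

Nothing in the sketch cancels the sleeping task; it stays scheduled until it finishes on its own or the process exits, which is the leak described in the list above.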

Keno · 2020-03-13T02:17:55Z

Let's merge this and see if it helps CI. Being in experimental makes it explicitly unsupported, so we can change or remove it if we feel like it at any point.

@sync

I've been looking at what causes deadlocks in our test suite in an effort to cut down on the number of failed tests on CI that result in hangs (since those are hard to diagnose and resolve). I found that by playing with various resource limits, it is easy to create hangs in the test suite. The reason we get a hang rather than a more easily diagnosable error is twofold. We either:

1. Aren't watching for the error (e.g. a socket remote end closing), or
2. Aren't propagating the error to the top level

A very common situation for case 2) is that the test is wrapped in @sync, which doesn't return until all tasks have finished or errored. However, in many cases one of the tasks produces data for the others, so if that task errors, the remaining tasks will wait forever. This PR aims to address that situation by introducing a new `Experimental.@sync` macro that immediately rethrows any error thrown by a contained task rather than waiting for all of them to finish. The implementation isn't super performant (it allocates a new task per object being waited on), but should be sufficient for use in the test suite. A better implementation would create a new scheduler object that can be inserted into multiple wait queues.

Example usage of the new macro:

```
@sync begin
    @async error("Hello")
    @async sleep(1000)
end # Waits 1000s

Experimental.@sync begin
    @async error("Hello")
    @async sleep(1000)
end # Throws immediately
```

The macro doesn't do any sort of cleanup for the tasks that do not finish, and just lets them run. In the future, we may want to automatically cancel those tasks, but that seemed like a bigger design problem than the simple thing that I wanted (something that propagates error messages more readily, so we see them in the logs).


Keno requested review from vtjnash and JeffBezanson December 25, 2019 19:16

KristofferC mentioned this pull request Jan 5, 2020

Bug or misuse? Exceptions not propagating through multithreaded channels #34262

Closed

tkf reviewed Jan 5, 2020


Keno added the "triage" label (This should be discussed on a triage call) Feb 25, 2020

Keno removed the "triage" label (This should be discussed on a triage call) Feb 27, 2020

Keno added 4 commits March 12, 2020 16:11

Add syncany macro

10b93b4

Switch a bunch of sync invocations to syncany

3163a68

Make Sockets test more robust by using syncany

65449d2

syncany -> Base.Experimental.sync

5be6093

Keno force-pushed the kf/syncany branch from e54b4a8 to 5be6093 Compare March 12, 2020 20:34

Keno merged commit 558eec9 into master Mar 13, 2020

Keno deleted the kf/syncany branch March 13, 2020 02:19

IanButterworth mentioned this pull request May 24, 2023

at-sync macro is barely documented #49921

Open
