
Initial supervision tests #43

Closed · goodboy opened this issue Nov 22, 2018 · 2 comments · Fixed by #91


goodboy commented Nov 22, 2018

#42 brings in proper trio.MultiError support and more deterministic cancellation semantics.
More tests are needed to ensure this system is rock solid before moving on to adding different supervision strategies as in #22.

Some I can think of off-hand that aren't yet in the test suite (a rough sketch of the first case follows after the list):

  • n actors spawned with run_in_actor(), x with start_actor()
    • local error causes all to cancel
      • n don't error but complete quickly
      • n don't error but complete slowly
      • n all error
    • internal error in start_actor() and run_in_actor()
      • n all error
      • n some error
  • propagation of MultiError up subactor nursery trees

More to come...
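
To make that first bullet concrete, here is a minimal sketch of the "local error causes all to cancel" case. It assumes a nursery API roughly along the lines of `tractor.open_nursery()` / `nursery.run_in_actor()` with a `tractor.run()` entry point; exact signatures may differ, so treat it as an illustration rather than the actual test code:

```python
import pytest
import trio
import tractor


async def sleep_forever():
    # subactor task that never completes on its own; it should be
    # cancelled when the nursery tears down
    await trio.sleep_forever()


async def main():
    async with tractor.open_nursery() as nursery:
        # spawn a handful of actors that would otherwise run forever
        for i in range(3):
            await nursery.run_in_actor(sleep_forever, name=f'sleeper_{i}')

        # a local error inside the nursery block should cancel and reap
        # every subactor before propagating
        raise ValueError("local task error")


def test_local_error_cancels_all_subactors():
    # depending on how cancelled-subactor results surface, the error may
    # arrive bare or wrapped in a ``trio.MultiError``
    with pytest.raises((ValueError, trio.MultiError)):
        tractor.run(main)
```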

@goodboy goodboy added this to the 0.1.0.a0 milestone Oct 17, 2019

goodboy commented Oct 22, 2019

Hmm here's another weird case I thought of:

A start_actor() (daemon) actor errors on a call while we're already blocked on some long-running run_in_actor() result, i.e. a Portal.result() call. What should happen? The result-waiter task won't know about the daemon actor's failure because that portal hasn't been "checkpointed", but should the nursery/supervisor cancel the actor group regardless?

Wait, just thinking it through; never mind. If you make a call to a daemon actor, the only way this could happen is if you were to stop iterating a remote streaming function (any other type of call should error at the Portal.run() checkpoint). So you'd have to stop iterating some Portal.run() call to a start_actor() actor (aka a daemon actor) mid-iteration. But if you did that from an async for you'd (hopefully, assuming you're aware of the precariousness involved - #57) trigger the cross-process task cancellation machinery (remember aclose() needs to be called when breaking out of an async for over a stream). This whole situation could, however, happen more naturally using the context api, which in hindsight probably also needs a context manager wrapping it, much like async_generator.aclosing(), to ensure proper cross-process stream termination; see the sketch below.
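
As a rough illustration of that last point, here is what the consumer side could look like with an aclosing-style wrapper, assuming Portal.run() hands back an async generator for a remote streaming function (`stream_forever` and the exact call shape are made up for this example):

```python
from async_generator import aclosing  # stdlib contextlib.aclosing on 3.10+


async def stream_forever():
    # hypothetical remote streaming function running in the daemon actor
    i = 0
    while True:
        yield i
        i += 1


async def consume_a_few(portal):
    # assume ``portal.run()`` returns an async generator streaming values
    # from the daemon actor
    agen = await portal.run(stream_forever)
    async with aclosing(agen) as stream:
        async for value in stream:
            if value > 10:
                # breaking early is now safe: ``aclosing()`` guarantees
                # ``aclose()`` runs, which should trigger the cross-process
                # task cancellation machinery instead of leaving the remote
                # task running
                break
```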

Probably needs tests for all of this right now lol!

See #87 for further discussion on how this might be solved. Either way we need tests to verify the current behavior.

goodboy added a commit that referenced this issue Oct 25, 2019
In an effort towards #43. This completes the first major bullet's worth of tests described in that issue.
goodboy added a commit that referenced this issue Oct 26, 2019
In an effort towards #43. This completes the first major bullet's worth of tests
described in that issue.

goodboy commented Oct 26, 2019

Aha! Catching more undefined behavior with this endeavour :D

Working on the nested MultiError test case I figured out that we don't have rules in place for what happens if a nursery is waited on when one actor has errored but not all have yet spawned (since spawning is async).

I've created #88 to address this.
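
For reference, a test for that race might look something like the sketch below, where actors that error immediately are spawned while their siblings are still being launched. The `assert_err` helper, the spawn count, and the exception asserted on are illustrative assumptions, not the actual test code:

```python
import pytest
import tractor


async def assert_err():
    # errors as soon as the subactor starts up
    assert 0


async def main():
    async with tractor.open_nursery() as nursery:
        # each spawn is itself async, so the first actor's error can land
        # while later siblings are still being launched -- exactly the
        # ordering #88 is about
        for i in range(4):
            await nursery.run_in_actor(assert_err, name=f'errorer_{i}')


def test_error_during_spawn():
    # what *should* be raised here is part of what #88 needs to define
    with pytest.raises(BaseException):
        tractor.run(main)
```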

goodboy added a commit that referenced this issue Oct 26, 2019
This exemplifies the undefined behaviour in #88 and begins to test for
the last bullet in #43.
goodboy changed the title from "More thorough supervision tests" to "Initial supervision tests" on Oct 30, 2019
goodboy added a commit that referenced this issue Oct 30, 2019
Add a test to verify that `trio.MultiError`s are properly propagated up
a simple actor nursery tree. We don't have any exception marshalling
between processes (yet) so we can't validate much more than a simple
2-depth tree. This satisfies the final bullet in #43.

Note I've limited the number of subactors per layer to around 5 since
any more than this seems to break the `multiprocessing` forkserver;
zombie subprocesses seem to be blocking teardown somehow...

Also add a single depth fast fail test just to verify that it's the
nested spawning that triggers this forkserver bug.
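
For a sense of what that commit's test shape might look like, here is a hedged sketch of a 2-deep erroring actor tree. The helper names, the breadth per layer, and the exact exception surfacing at the root are assumptions (especially with no cross-process exception marshalling in place yet):

```python
import pytest
import trio
import tractor

SUBACTORS_PER_LAYER = 3  # kept small; larger counts stressed the forkserver


async def assert_err():
    # grandchild actor task that errors immediately
    assert 0


async def spawn_error_layer():
    # middle layer: its own nursery collects the grandchildren's errors
    # into a ``trio.MultiError`` which should propagate up to the root
    async with tractor.open_nursery() as nursery:
        for i in range(SUBACTORS_PER_LAYER):
            await nursery.run_in_actor(assert_err, name=f'grandchild_{i}')


async def main():
    async with tractor.open_nursery() as nursery:
        for i in range(SUBACTORS_PER_LAYER):
            await nursery.run_in_actor(spawn_error_layer, name=f'child_{i}')


def test_nested_multierror_propagation():
    with pytest.raises(trio.MultiError):
        tractor.run(main)
```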