Motivation
Some issues related to exception handling have been reported over time:
What they have in common is that users want to know how they can handle exceptions from self-written components at the caller level of the statemachine. They often ask why exceptions get caught inside the machine but are not passed to the outside.
It seems there are 2 principles that try to explain why Spring Statemachine catches exceptions extensively:
While one could argue that number 2 is merely a requirement derived from number 1, "Run To Completion" actually refers only to how events should be processed. The "one by one" approach as explained on the linked sites is important because the machine needs to be in a well-defined and stable state before it can go on.
However, this does not take exceptions or errors into account. From my personal point of view I doubt these are even considered events (at least not the ones that occur unexpectedly). That is why I believe when it comes to exception handling we should move away from the very fundamental RTC paradigm and take a look at how users can operate a statemachine meaningfully.
Room For Improvement
In the following I will refer to exceptions in actions or guards but most of the points can be applied to other user-written components as well (e.g. listeners, interceptors).
The current (3.2.0) implementation has at least 2 shortcomings that force the implementer to take extra measures against exceptions:
1. Exceptions are not always propagated to the caller starting the statemachine or sending an event.
   As a consequence, callers must choose a container (e.g. extended state) to store an exception that happens during statemachine execution. This container must be accessible from outside the machine so that one can read and evaluate the exception after execution.
2. Exceptions can make a statemachine impossible to reuse. E.g. when they occur in an action bound to a triggerless transition, the machine literally hangs in the transit state where the transition originates.
   Users that wish to reuse the same statemachine instance between events must extend their exception handling by one of the following:
   - Add a looping event transition to the transit state, i.e. one that leads to the same state again. Once an exception occurs, the event has to be sent to continue on the former path, since triggerless transitions get executed only once a state is entered.
   - Create a machine backup, e.g. with the help of Spring StateMachinePersister. This can be used to restore the statemachine from a stable state. In technical terms this means the machine experiences a reset.
Exception Handling Test
To demonstrate these points I created a test based on the following statemachine:
S1 .. S5 := states
S1 := initial state
S2 := choice
S3 := event accepting state
S4 := transit state (with state entry + behavior + exit action)
S5 := end state
E := event
a := action
g := guard
The action is always registered with an errorAction that writes the exception into a container, specifically the ExtendedState. The guard is registered using a wrapper around it that provides the same exception handling.
The last column states whether the statemachine can be reused for transition execution. In case of a result state with only a triggerless transition or the end state the machine is stopped, reset and started again to see if its state changes. In case of S3 that accepts an event another event is sent with no exception to see if the transition is taken.
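The error container and guard wrapper described above can be sketched in plain Java. This is only an illustration and deliberately does not use the Spring Statemachine API: `ErrorContainerSketch`, `guarded`, `withErrorAction` and the key `lastError` are hypothetical names, and a `Map` stands in for the ExtendedState variables. The sketch only catches `RuntimeException`, which is a simplification.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;
import java.util.function.Predicate;

// Plain-Java sketch; a Map stands in for the ExtendedState variables.
// All names are hypothetical and not part of Spring Statemachine.
class ErrorContainerSketch {

    static final String ERROR_KEY = "lastError";

    // Wraps a guard so that a failure is stored in the container and
    // the guard evaluates to false instead of throwing.
    static Predicate<Map<String, Object>> guarded(Predicate<Map<String, Object>> guard) {
        return vars -> {
            try {
                return guard.test(vars);
            } catch (RuntimeException e) {
                vars.put(ERROR_KEY, e);
                return false;
            }
        };
    }

    // Pairs an action with an error action that stores the failure.
    static Consumer<Map<String, Object>> withErrorAction(Consumer<Map<String, Object>> action) {
        return vars -> {
            try {
                action.accept(vars);
            } catch (RuntimeException e) {
                vars.put(ERROR_KEY, e);
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Object> vars = new HashMap<>();
        boolean result = guarded(v -> { throw new IllegalStateException("guard failed"); }).test(vars);
        // The caller has to inspect the container to learn about the failure.
        System.out.println(result + " / " + vars.get(ERROR_KEY));
    }
}
```

With this pattern the caller never sees the exception directly and has to read the container after execution, which is the extra measure criticized in the first shortcoming.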
Test Project
Attached: statemachine-exception-handling.zip. Built with Java 17.0.6 and Gradle 8.1.1.
Note that the test cases were written so that they do not fail on the issues listed below. Instead, a comment was added to each assertion that proves an error.
Test Report
| Item # | Path To Exception | Exception Origin | Exception Type | Exception Propagated? | Exception In Container? | Result State | SSM Reusable? |
|---|---|---|---|---|---|---|---|
| 1 | start → S1 | initial action | RuntimeException | ✓ | ✓ | ✗ | ✗ |
| 2 | start → S1 | initial action | Error | ✓ | ✗ | ✗ | ✗ |
| 3 | start → S1 → S2 | S1 → S2 action | RuntimeException | ✓ | ✓ | S1 | ✗ |
| 4 | start → S1 → S2 | S1 → S2 action | Error | ✓ | ✗ | S1 | ✗ |
| 5 | start → S1 → S2 | S2 option 1 guard | RuntimeException | ✗ | ✓ | S5 | ✗ |
| 6 | start → S1 → S2 | S2 option 1 guard | Error | ✗ | ✓ | S5 | ✗ |
| 7 | start → S1 → S2 | S2 option 1 action | RuntimeException | ✓ | ✓ | S1 | ✗ |
| 8 | start → S1 → S2 | S2 option 1 action | Error | ✓ | ✗ | S1 | ✗ |
| 9 | start → S1 → S2 | S2 option guards + default option action | RuntimeException | only action exception | ✓ | S1 | ✗ |
| 10 | start → S1 → S2 | S2 option guards + default option action | Error | only action error | only guard errors | S1 | ✗ |
| 11 | start → S1 → S2 → S3 → S4 | S3 → S4 action | RuntimeException | ✗ | ✓ | S3 | ✓ |
| 12 | start → S1 → S2 → S3 → S4 | S3 → S4 action | Error | ✓ | ✗ | S3 | ✗ |
| 13 | start → S1 → S2 → S3 → S4 | S3 → S4 guard | RuntimeException | ✗ | ✓ | S3 | ✓ |
| 14 | start → S1 → S2 → S3 → S4 | S3 → S4 guard | Error | ✓ | ✓ | S3 | ✗ |
| 15 | start → S1 → S2 → S4 | S4 state entry + behavior + exit action | RuntimeException | ✗ | missing or extra exit action exception | S4 or S5 | ✗ |
| 16 | start → S1 → S2 → S4 | S4 state entry action | Error | ✓ | ✗ | S4 | ✗ |
| 17 | start → S1 → S2 → S4 | S4 state behavior action | Error | ✗ | ✗ | S5 | ✗ |
| 18 | start → S1 → S2 → S4 | S4 state exit action | Error | only sometimes | ✗ | S4 or S5 | ✗ |
Test Result Groups
Error Gets Propagated
Test cases in which an error gets propagated to the caller can be considered OK in my opinion. The machine cannot be reused, but since we experienced an error, reuse is probably not what we want anyway. This applies to test items 2, 4, 8, 12, 14 and 16.
Error Not Propagated
This was demonstrated for type Error in choice option guards in test items 6 and 10. Note that in 10 this allows an error from an action to slip through. The machine also continues transition execution and enters the end state. The same applies to item 17, where the error occurs in a state behavior action. The expected behavior here would be to terminate execution right away.
Statemachine Gets Caught In State After Exception
In case of type Exception we may want to reuse the machine to start it again or re-send an event, because the nature of the exception might be temporary. This will not be possible if the result state of the statemachine does not allow for that. This was described earlier in "Room For Improvement - Point 2" and applies to test items 1, 3, 5, 7, 9 and 15.
Transition Execution Not Interrupted After Exception
Some test items demonstrate that the statemachine continues its transition logic even though an exception occurred. This applies to the choice option guards as seen in test items 5 and 6, as well as to the state actions in item 15. A more severe case is test item 17, where the end state is entered despite an error in the S4 behavior action.
Flaky Runs
Random erroneous behavior was experienced in test items 15 and 18 where the exit action from S4 fires. Sometimes the action is late, meaning that at the time of verification it has not been executed yet. You may modify the tests to make the thread wait for another second before mock verification to see that it does finally execute. At other times the same action executes twice. Possibly related to:
The same happened in test items 11 and 13 when the event was sent a 2nd time (without exception).
What's more, the exception propagated to the caller is not always what we would expect:
java.util.ConcurrentModificationException
at java.base/java.util.ArrayList$Itr.checkForComodification(ArrayList.java:1013)
at java.base/java.util.ArrayList$Itr.next(ArrayList.java:967)
at reactor.core.publisher.FluxIterable$IterableSubscription.slowPath(FluxIterable.java:259)
[...100 more...]
at reactor.core.publisher.FluxGenerate$GenerateSubscription.next(FluxGenerate.java:178)
at org.springframework.statemachine.support.ReactiveStateMachineExecutor.lambda$handleTriggerlessTransitions$18(ReactiveStateMachineExecutor.java:349)
at reactor.core.publisher.FluxGenerate.lambda$new$1(FluxGenerate.java:58)
[...100 miles down the reactor...]
at reactor.core.publisher.MonoIgnoreThen.subscribe(MonoIgnoreThen.java:51)
at reactor.core.publisher.Mono.subscribe(Mono.java:4400)
at reactor.core.publisher.Mono.block(Mono.java:1706)
at org.springframework.statemachine.support.LifecycleObjectSupport.start(LifecycleObjectSupport.java:111)
This could be related to an issue that was meant to be fixed:
Improvement Proposals
P-1: Propagate Exceptions
Catch an exception, interrupt transition execution, and rethrow the exception. Let it exit the machine so that operators can try-catch.
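The catch, interrupt and rethrow idea can be sketched as follows. This is a plain-Java toy model under stated assumptions, not the Spring Statemachine API and not a proposed patch: `PropagatingExecutorSketch`, `Step`, `execute` and `lastError` are hypothetical names, and a transition is reduced to an action plus a target state.

```java
import java.util.List;

// Plain-Java sketch of P-1; hypothetical names, not the Spring Statemachine API.
class PropagatingExecutorSketch {

    private String state;
    private RuntimeException lastError;

    PropagatingExecutorSketch(String initialState) {
        this.state = initialState;
    }

    String state() {
        return state;
    }

    RuntimeException lastError() {
        return lastError;
    }

    // A transition step: the action to run and the state entered afterwards.
    record Step(Runnable action, String target) {}

    // Runs the steps in order. The first exception is caught, recorded,
    // and rethrown, so no later step runs and the caller sees the failure.
    void execute(List<Step> steps) {
        for (Step step : steps) {
            try {
                step.action().run();
            } catch (RuntimeException e) {
                lastError = e;
                throw e; // interrupts the loop and exits the machine
            }
            state = step.target(); // only reached if the action succeeded
        }
    }

    public static void main(String[] args) {
        PropagatingExecutorSketch machine = new PropagatingExecutorSketch("S1");
        try {
            machine.execute(List.of(
                    new Step(() -> {}, "S2"),
                    new Step(() -> { throw new IllegalStateException("action failed"); }, "S4"),
                    new Step(() -> {}, "S5")));
        } catch (IllegalStateException e) {
            System.out.println("caught at caller level: " + e.getMessage()
                    + ", state=" + machine.state());
        }
    }
}
```

In this model the machine stays in the last state it reached successfully, and the operator can decide in the catch block whether to retry or give up.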
P-2: Recover The Statemachine
This could be achieved by a Back To Origin approach where the machine is reset to:
the pre-start state if the machine was started
the pre-event state if an event was sent
State in this context refers to any part of the statemachine including extended state, current error, etc.
This could make sense as a general feature but may be optional as well: configurer.withRecovery(). E.g. when a statemachine is persisted in a database (via parts of its StateMachineContext) and calling threads only query the machine to restore it, send a single event, evaluate success and then persist it again, one will not need statemachine recovery in case of an exception. In essence, there might be users who want to reuse the machine for several events and others who do not.
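A Back To Origin reset for the "event was sent" case could look roughly like the sketch below. It is a plain-Java toy model, not the Spring Statemachine API: `RecoverableMachineSketch`, `sendEvent` and `moveTo` are hypothetical names, the boolean constructor flag merely stands in for the proposed `configurer.withRecovery()` option, and the snapshot is a shallow copy of a `Map` that plays the role of the extended state.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

// Plain-Java sketch of P-2; hypothetical names, not the Spring Statemachine API.
class RecoverableMachineSketch {

    private String state = "S1";
    private Map<String, Object> extendedState = new HashMap<>();
    private final boolean recoveryEnabled; // stands in for the proposed option

    RecoverableMachineSketch(boolean recoveryEnabled) {
        this.recoveryEnabled = recoveryEnabled;
    }

    String state() {
        return state;
    }

    Map<String, Object> extendedState() {
        return extendedState;
    }

    void moveTo(String target) {
        state = target;
    }

    // Handles one event. With recovery enabled, a failure restores the
    // snapshot taken before the event; the exception is rethrown either way.
    void sendEvent(Consumer<RecoverableMachineSketch> transition) {
        String stateBefore = state;
        Map<String, Object> varsBefore = new HashMap<>(extendedState); // shallow copy
        try {
            transition.accept(this);
        } catch (RuntimeException e) {
            if (recoveryEnabled) {
                state = stateBefore;
                extendedState = varsBefore;
            }
            throw e;
        }
    }
}
```

A caller that restores the machine from a database for every event would construct it with recovery disabled and simply discard the instance on failure, while a caller that keeps one instance alive across events would enable it.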
P-3: Keep Up Development
Based on the number of issues that have piled up and reasonable doubt that has been expressed:
I guess a lot of users would be happy to see progress on, but not limited to, this topic. One way to start would be to keep up communication with those involved in issues.