
KeyError in ActionServer._execute_goal #1236

Open
rafael1193 opened this issue Feb 29, 2024 · 8 comments · May be fixed by #1363

Comments

@rafael1193

Bug report

Required Info:

  • Operating System:
    • Ubuntu 22.04
  • Installation type:
    • Binaries (.deb package)
  • Version or commit hash:
    • 3.3.12-1jammy.20240217.065750
  • DDS implementation:
    • Fast-RTPS (2.6.7-1jammy.20240217.034748)
  • Client library (if applicable):
    • rclpy

Steps to reproduce issue

  • Run an action server node with the use_sim_time parameter as True
  • Have a node emit clock messages faster than wall clock moves (at x60)
  • Call the action

Expected behavior

Action goal result is set without problem.

Actual behavior

The action server fails because self._result_futures[bytes(goal_uuid)].set_result(result_response) in ActionServer._execute_goal raises a KeyError:

Traceback (most recent call last):
  File "/home/rbailonr/adauv_ws/install/adauv_bluebox/lib/adauv_bluebox/vehicle.py", line 333, in main
    executor.spin()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 294, in spin
    self.spin_once()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 794, in spin_once
    self._spin_once_impl(timeout_sec)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/executors.py", line 791, in _spin_once_impl
    future.result()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/task.py", line 94, in result
    raise self.exception()
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/task.py", line 239, in __call__
    self._handler.send(None)
  File "/opt/ros/humble/local/lib/python3.10/dist-packages/rclpy/action/server.py", line 352, in _execute_goal
    self._result_futures[bytes(goal_uuid)].set_result(result_response)
KeyError: b'EM$\xc6\xb3YA:\xa9\xa1_\xd1\xeb\x88\x06\x88'

Additional information

The behavior occurs inconsistently, and only under accelerated time.

It looks like a synchronization issue with ActionServer._execute_expire_goals, which removes the goal result future from the _result_futures dictionary before _execute_goal sets its result. Checking not self._result_futures[goal_uuid].done() before removing the expired goal seems to mitigate the issue, but the actual source of the problem may lie elsewhere.
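The suspected interleaving can be reproduced outside rclpy with plain asyncio. A minimal sketch (all names here are illustrative, not rclpy's actual API): an "expiry" task deletes a goal's result future from a shared dict before the "execute" task sets its result, producing the same KeyError as in the traceback.

```python
import asyncio


async def main() -> list:
    result_futures = {}          # maps goal UUID bytes -> asyncio.Future
    goal_uuid = b"goal-1"
    result_futures[goal_uuid] = asyncio.get_running_loop().create_future()
    events = []

    async def expire():
        # Expiry fires first (accelerated sim time), removing the future.
        del result_futures[goal_uuid]
        events.append("expired")

    async def execute():
        await asyncio.sleep(0)   # yield once so expire() runs first
        try:
            result_futures[goal_uuid].set_result("done")
            events.append("result set")
        except KeyError:
            events.append("KeyError")

    await asyncio.gather(expire(), execute())
    return events


print(asyncio.run(main()))   # ['expired', 'KeyError']
```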

@Barry-Xu-2018
Contributor

This should not be a bug.
There is result caching in the action design (please refer to https://design.ros2.org/articles/actions.html).
Users can decide how long a goal result will be retained. However, retention is not guaranteed: a result may expire even if it has never been accessed.

result_timeout=900
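As a rough sanity check of the sim-time effect (assuming the 900-second result_timeout shown above and the 60x clock from the report): result_timeout is measured in ROS time, so a clock running 60x faster than wall time shrinks the wall-clock retention window 60-fold.

```python
result_timeout_s = 900      # result_timeout in seconds of ROS time
sim_time_factor = 60        # clock published 60x faster than wall clock

# Wall-clock seconds before the cached result expires under sim time.
wall_clock_retention_s = result_timeout_s / sim_time_factor
print(wall_clock_retention_s)   # 900 ROS-seconds elapse in 15 wall-seconds
```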

@fujitatomoya
Collaborator

Have a node emit clock messages faster than wall clock moves (at x60)

@Barry-Xu-2018 this will make the result expire (be removed) 60 times faster in the action server.
I think this situation can indeed happen, but raising a KeyError exception is not the designed behavior, is it?

I think we can check whether the key bytes(goal_uuid) exists in the _result_futures dictionary, and whether goal_handle has also been removed from _goal_handles. It could probably print a warning that says "Goal is already expired before completion". What do you think?
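A standalone sketch of the guard described above, using an illustrative helper and a plain dictionary rather than rclpy's actual internals: only set the result if the goal's future still exists, otherwise warn that the goal expired before completion.

```python
import logging

logging.basicConfig(level=logging.WARNING)
logger = logging.getLogger("action_server")


def set_goal_result(result_futures: dict, goal_uuid: bytes, result) -> bool:
    """Return True if the result was delivered, False if the goal expired."""
    future = result_futures.get(goal_uuid)
    if future is None:
        # Goal already expired; warn instead of raising KeyError.
        logger.warning("Goal %r is already expired before completion", goal_uuid)
        return False
    future.set_result(result)
    return True
```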

@Barry-Xu-2018
Contributor

I think this situation can indeed happen, but raising a KeyError exception is not the designed behavior, is it?

Yes. Throwing an exception is not correct behavior in this scenario.

I think we can check whether the key bytes(goal_uuid) exists in the _result_futures dictionary, and whether goal_handle has also been removed from _goal_handles. It could probably print a warning that says "Goal is already expired before completion". What do you think?

Yes. I'll double-check the code logic to see if there are any other places that may need modification when this issue occurs.

@Barry-Xu-2018
Contributor

Barry-Xu-2018 commented Mar 1, 2024

    async def _execute_goal(self, execute_callback, goal_handle):
        goal_uuid = goal_handle.goal_id.uuid
        self._logger.debug('Executing goal with ID {0}'.format(goal_uuid))
        try:
            # Execute user callback
            execute_result = await await_or_execute(execute_callback, goal_handle)
        except Exception as ex:
            # Create an empty result so that we can still send a response to the client
            execute_result = self._action_type.Result()
            self._logger.error('Error raised in execute callback: {0}'.format(ex))
            traceback.print_exc()
        # If user did not trigger a terminal state, assume aborted
        if goal_handle.is_active:
            self._logger.warning(
                'Goal state not set, assuming aborted. Goal ID: {0}'.format(goal_uuid))
            goal_handle.abort()
        self._logger.debug(
            'Goal with ID {0} finished with state {1}'.format(goal_uuid, goal_handle.status))
        # Set result
        result_response = self._action_type.Impl.GetResultService.Response()
        result_response.status = goal_handle.status
        result_response.result = execute_result
        self._result_futures[bytes(goal_uuid)].set_result(result_response)

    async def _execute_expire_goals(self, expired_goals):
        for goal in expired_goals:
            goal_uuid = bytes(goal.goal_id.uuid)
            self._goal_handles[goal_uuid].destroy()
            del self._goal_handles[goal_uuid]
            self.remove_future(self._result_futures[goal_uuid])
            del self._result_futures[goal_uuid]

These two parts involve asynchronous operations, so to completely avoid the issue, a lock (asyncio.Lock) should be held when operating on self._result_futures.
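A minimal sketch of that locking idea with plain asyncio (illustrative names mirroring the rclpy code; not a patch to rclpy itself): both the execute path and the expiry path take the same asyncio.Lock before touching the shared futures dict, so expiry can never remove an entry while a result is being delivered.

```python
import asyncio


async def main() -> str:
    lock = asyncio.Lock()
    result_futures = {b"goal-1": asyncio.get_running_loop().create_future()}

    async def execute_goal(goal_uuid: bytes):
        async with lock:
            future = result_futures.get(goal_uuid)
            if future is not None:
                future.set_result("done")

    async def expire_goal(goal_uuid: bytes):
        async with lock:
            future = result_futures.get(goal_uuid)
            # Mirror the proposed mitigation: keep futures that are done.
            if future is not None and not future.done():
                del result_futures[goal_uuid]

    await asyncio.gather(execute_goal(b"goal-1"), expire_goal(b"goal-1"))
    return result_futures[b"goal-1"].result()


print(asyncio.run(main()))   # done
```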

@rafael1193
Author

Thank you very much for your help.

In hindsight, some expiry-timeout message was to be expected, but not the action server failing. I have increased result_timeout accordingly and the synchronization problem no longer appears.

@fujitatomoya
Collaborator

@Barry-Xu-2018 I am almost sure this has been fixed by ros2/rcl#1121, what do you think?

I think the only thing missing here is to check the goal_uuid key in _result_futures during the _execute_get_result_request call. This could generate the KeyError when the result timeout is too short, or with simulation time as in this issue report.

What do you think?

@fujitatomoya
Collaborator

I think the only thing missing here is to check the goal_uuid key in _result_futures during the _execute_get_result_request call.

I was considering the following patch, but I found it is not needed, because the code already checks the goal_uuid in _goal_handles. We could probably adjust the debug message a bit.

        # If no goal with the requested ID exists, then return UNKNOWN status
        if bytes(goal_uuid) not in self._goal_handles:
            self._logger.debug(
                'Sending result response for unknown goal ID: {0}'.format(goal_uuid))
            result_response = self._action_type.Impl.GetResultService.Response()
            result_response.status = GoalStatus.STATUS_UNKNOWN
            self._handle.send_result_response(request_header, result_response)
            return

diff --git a/rclpy/rclpy/action/server.py b/rclpy/rclpy/action/server.py
index ce60309..06b93d4 100644
--- a/rclpy/rclpy/action/server.py
+++ b/rclpy/rclpy/action/server.py
@@ -472,8 +472,13 @@ class ActionServer(Generic[GoalT, ResultT, FeedbackT], Waitable['ServerGoalHandl

         # There is an accepted goal matching the goal ID, register a callback to send the
         # response as soon as it's ready
-        self._result_futures[bytes(goal_uuid)].add_done_callback(
-            functools.partial(self._send_result_response, request_header))
+        if bytes(goal_uuid) in self._result_futures:
+            self._result_futures[bytes(goal_uuid)].add_done_callback(
+                functools.partial(self._send_result_response, request_header))
+        else:
+            # This can happen when result_timeout is too short, or high-freq simulation time
+            self._logger.warn(
+                'Requested result has been expired already (goal ID: {0})'.format(goal_uuid))

     async def _execute_expire_goals(self, expired_goals: Tuple[GoalInfo, ...]) -> None:
         for goal in expired_goals:

@Barry-Xu-2018
Contributor

@Barry-Xu-2018 I am almost sure this has been fixed by ros2/rcl#1121, what do you think?

If result_timeout is still set to 0, this issue might still occur, because the current design doesn't ensure that the result is retrieved before it gets deleted. So it is the user's responsibility to set a reasonable result timeout.

ros2/rcl#1121 isn't aimed at this issue.

I think the only thing missing here is to check the goal_uuid key in _result_futures during the _execute_get_result_request call. This could generate the KeyError when the result timeout is too short, or with simulation time as in this issue report.

What do you think?

Yes. This check is necessary.
