-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem in the constructed BT structure and execution #211
Comments
Hi @roveri-marco. This sounds similar to some issues I've seen in the past with incorrect parallel actions. I recently made some modifications to the Plan-to-Action Graph portion of the BTBuilder, which I think should have helped to correct this. See #206. Have you pulled the latest version of the code? I tested the example you provided on my machine and I could not replicate the error. If your code is up to date and you're still seeing this error, I will look into this further next week to double check there are no errors in the recent modifications.
|
@jjzapf I'm no longer able to reproduce it, although I was in the master and I've not changed anything... However, if I change the domain as in the
It first executes the
I believe, looking at the Last, in some more complex application we are building on top of plansys2 we face several random crash with very limited information on the reason some of the nodes crashes... it seems some shared pointer die, but it is very hard to debug. @jjzapf @fmrico can you suggest me some approach to debug this random crashes? |
When computing the root set, we look for actions that can be run in parallel. To test whether 2 actions can be run in parallel, we first test whether each action is executable given the current state. We then check whether the effects of one action will contradict the requirements of the other. For example, in a move action you might have a (at start (not (robot_at r loc))) effect that would contradict a (over all (robot_at r loc)) requirement. Since we do not know precisely how these actions will overlap, it seemed good to assume that they can overlap in any way possible or even be ran in sequence. Hence, we also check the requirements of an action against the "at end" effects. See code link below. Unfortunately, this logic fails in the case you provided where an at end effect contradicts an at start requirement. Clearly PDDL does allow for this and 2 actions should be allowed to run in parallel so long as their at start and over all requirements/effects do not lead to contradictions. I tested removing the checks for the "at end" effects and this does seem to help the code to progress further. However, I still ran into some more errors later down the pipe. I will have time early next week to dig into this further and will hopefully come up with a fix. Thanks for finding this error and providing a nice example for debugging! |
@jjzapf Thanks for the explanation. I look forward to testing it. The other problem that we are facing is the random crash of some action nodes and/or of the executor. |
I was thinking about fleshing out the description I made in #206 into a more complete document. That description covers the plan-to-action graph part of the code/algorithm. I tried to describe as best I could the previous approach as well as the modifications I made. I agree that it is crucial to get this part of the code correct and document the resulting approach. We should definitely update the code/website documentation. I'm also very open to working on a future publication with you and @fmrico. I haven’t been able to look into the random crashes yet and don’t have any ideas off the top of my head. This week has been pretty busy for me, but I hope to dig into these issues in the coming week. |
@roveri-marco, I opened PR #216, which fixes some of the issues discussed above. In particular, it fixes the bad logic regarding the checks for effect/condition contradictions when determining if 2 actions can run in parallel. It also fixes a bug that I found in the prune_backwards function in the BTBuilder. (See description in #216 for more details.) Here is the action graph that I am getting for your example after applying the changes in #216. Here is the behavior tree that I am getting for your example after applying the changes in #216. Note that the red boxes represent the WaitAction sets. Unfortunately, I am still getting an error with your example. I added some additional print statements to see what was going on.
It seems that an "over all" condition of (load r h c4 depot n3 empty) is being checked after an "at start" effect of (move h r depot l1) is being applied. However, (load r h c4 depot n3 empty) should finish before (move h r depot l1) starts. So I am not sure what is going on here. As far as I can tell, this error is not do to the action graph formation, but rather occurs somewhere later in the code. I will try to dig into this a bit more and open another PR if I find a solution. |
This would be great. Maybe we could try to define a theoretical framework to prove the correctness of the current algorithm 🤔 We could send it again to AAMAS or maybe try in ICAPS. Next week, when I return from holidays, I could start to think about it. Any comments or ideas are welcome. @jjzapf I have just merged #216 , but I read that the bug is not still fixed. Should I reopen again? |
I'll be happy to work on the theoretical framework and publication with you as well. I'll be on vacation the 1st half of next week, but I should be available starting Wednesday or Thursday. @fmrico, thanks for merging #216. You don't need to reopen it. That PR focused on a bug related to the action graph formulation. I believe we are now getting the correct action graph for the example @roveri-marco provided. I think the unfixed bug has to do with the behavior tree execution. So I believe that should be handled in a separate PR. I'm not exactly sure what's going on, but I'll try to dig into it a bit more. |
If we modify the if statement in this line to this
then the example @roveri-marco provided will finish successfully. I think the issue is that we are using a ReactiveSequence to handle this part of the behavior tree. From the BehaviorTree.CPP documentation, I found this warning concerning the use of a ReactiveSequence. This node is particularly useful to continuously check Conditions; but the user should also be careful when using asynchronous children, to be sure that they are not ticked more often that expected. By adding some print statements, I was able to discover that an "over all" requirements check for the 2nd "load" action was occurring after the "load" action had finished as well as after the "at start" effect of the "move" action had been applied. This is a problem because the "at start" effects of the "move" action conflict with the "over all" conditions of the "load" action. Thus, to fix this I added a check to the CheckOverAllReq code to see if the action had finished before checking the requirements and possibly triggering a failure. Unfortunately, this change breaks some of our unit tests. I will try to see if it is possible to fix these. One other thing that I noticed is that even though the first 2 actions should run in parallel, they seem to be running one after the other. This could be do to the fact that they are of the same action type "load". I noticed that @roveri-marco created 2 "load" action executor clients. However, I'm not sure if the system knows that it can use the 2nd action executor client if the first is busy. This is just a guess. @fmrico do you have any insight as to how the system would handle running 2 actions of the same type at the same time? |
The action auction mechanism should be able to handle it (at least from documentation), but I'm not sure it is handled internally. |
I think I found a better solution to the issue I discussed in my previous comment. See PR #217 for details. The short story is that I don't think the WaitAction node was waiting for the entire action sub-tree to finish, just the ExecuteAction node. Thus an "at start" effect of a successor action was allowed to occur before the predecessor action was really finished. |
If there is more than one action performer for one action, there should not be any problem. If there is only one, it will wait until one is available. |
I've created a simple example with three durative actions
move
deliver
andload
. I've added a mutual exclusion predicate `(nobusy ?r)'The model does not contain any numeric fluent but only predicates. I can successfully run the launcher and the terminal
ros2 launch pddl_project_problem_4 pddl_project_problem_4_launch.py
on one terminalros2 run plansys2_terminal plansys2_terminal
on the second terminalThen on the terminal I issue the commands specified in the command files in attach, and then I ask to generate a plan and to run it.
The launcher terminates with this error
I believe that the BT constructed is buggy since it tries to execute the (deliver r h p1 c1 food l1 n2 n3) and the (move h r l1 depot) together, while the
move
shall wait for thedeliver
to terminate. Thus, the overall conditions of thedeliver
are not met and the execution fails. If you comment out the lines that also force the move to use the(nobusy ?r)
predicate, then the execution succeeds since it correctly waits for thedeliver
to terminate before starting the move.If you comment the lines in the
domain.pddl
file where the(nobusy ?r)
predicate is used, then other errors appear, although the plan is generated correctly (the execution fails).The files to reproduce the errors:
pddl_project_problem_4-buggy.zip
commands.txt
The text was updated successfully, but these errors were encountered: