MountainCarContinuous has dependencies between actions across time steps and between states and actions.
With HAC (Hierarchical Actor-Critic), this gap can be handled because any state (position and velocity) can serve as a subgoal.
With UOF, on the other hand, the gap cannot be handled because we cannot choose optimal subgoals (we do not have exact domain knowledge of the environment).
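To make the contrast concrete, here is a minimal sketch of hindsight subgoal relabeling in the two settings. It is not taken from the HAC or UOF code; the function names, the example subgoal set, and the tolerances are hypothetical, and the state bounds are the usual MountainCarContinuous limits (position in [-1.2, 0.6], velocity in [-0.07, 0.07]).

```python
# Illustrative sketch only: names, subgoal set, and tolerances are assumptions.
STATE_LOW = (-1.2, -0.07)   # (min position, min velocity)
STATE_HIGH = (0.6, 0.07)    # (max position, max velocity)


def hindsight_subgoal_continuous(achieved_state):
    """HAC-style relabeling: the subgoal space equals the state space,
    so whatever state was reached is itself a valid subgoal."""
    for value, low, high in zip(achieved_state, STATE_LOW, STATE_HIGH):
        if not low <= value <= high:
            raise ValueError("state outside the environment bounds")
    return achieved_state


def hindsight_subgoal_discrete(achieved_state, subgoals, tol):
    """Fixed-subgoal relabeling: the reached state only counts if it lies
    within `tol` of one of the predefined subgoals; otherwise return None."""
    position, velocity = achieved_state
    for index, (goal_pos, goal_vel) in enumerate(subgoals):
        if abs(position - goal_pos) <= tol[0] and abs(velocity - goal_vel) <= tol[1]:
            return index
    return None


if __name__ == "__main__":
    subgoals = [(-0.5, 0.0), (0.45, 0.0)]   # hypothetical hand-picked subgoals
    tol = (0.05, 0.01)
    reached = (-0.8, 0.03)                  # car swinging back up the left hill
    print(hindsight_subgoal_continuous(reached))
    print(hindsight_subgoal_discrete(reached, subgoals, tol))
```

In the continuous case every transition yields a usable hindsight subgoal, while in the discrete case most reached states match none of the predefined subgoals and give no relabeling signal.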
MountainCarContinuous has only one final goal and a narrow range of start positions.
This makes it harder for UOF to converge. Since UOF uses only a few subgoals (and we do not even know whether these subgoals are good), we cannot collect diverse states and achieved goals for the hindsight framework.
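One way to check how little state diversity is available is to roll out a random policy and look at the range of positions reached. The sketch below does this with a small stand-alone simulator; the dynamics constants are written from memory to mirror the Gym MountainCarContinuous environment and should be verified against the actual source, and the helper names are hypothetical.

```python
import math
import random

# Constants assumed to match Gym's MountainCarContinuous; verify before relying on them.
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
POWER = 0.0015
GOAL_POS = 0.45


def step(position, velocity, action):
    """Advance the car by one time step under a force in [-1, 1]."""
    force = max(-1.0, min(1.0, action))
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = max(-MAX_SPEED, min(MAX_SPEED, velocity))
    position = max(MIN_POS, min(MAX_POS, position + velocity))
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # the car stops at the left wall
    return position, velocity


def rollout_random(rng, horizon=200):
    """Run one episode with uniformly random actions; return visited positions."""
    position, velocity = rng.uniform(-0.6, -0.4), 0.0  # narrow start range
    visited = [position]
    for _ in range(horizon):
        position, velocity = step(position, velocity, rng.uniform(-1.0, 1.0))
        visited.append(position)
    return visited


def achieved_position_range(episodes=20, seed=0):
    """Return the (min, max) position reached over several random episodes."""
    rng = random.Random(seed)
    low, high = float("inf"), float("-inf")
    for _ in range(episodes):
        visited = rollout_random(rng)
        low, high = min(low, min(visited)), max(high, max(visited))
    return low, high


if __name__ == "__main__":
    low, high = achieved_position_range()
    print(f"positions reached: [{low:.3f}, {high:.3f}], goal at {GOAL_POS}")
```

Comparing the printed range with the goal position gives a rough idea of how many distinct achieved goals the hindsight buffer can expect early in training.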
MountainCarContinuous requires harder, continuous hierarchical reasoning.
In the UOF paper, the algorithm was tested only on a block-stacking manipulation problem with various start and goal positions. In that environment, the agent's state does not affect its behavior.
We cannot handle MountainCarContinuous using only discrete (and unknown) subgoals.
Closing this issue:
We need to test on the block-stacking manipulation problem.
UOF is not good enough as a general hierarchical framework.
to-do-list
Results