[S-02-2] UOF + DDPG Implementation #11

CUN-bjy · 2021-08-25T14:51:02Z

CUN-bjy · 2021-10-04T04:09:51Z

Discussion

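MountainCarContinuous has dependencies across time steps, both between successive actions and between states and actions: the car has to build momentum, so the effect of an action depends on the actions taken before it.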
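To make this dependency concrete, here is a minimal re-implementation of the step dynamics. It assumes the constants of Gym's `MountainCarContinuous-v0` (engine power 0.0015, gravity term 0.0025, goal position 0.45, start positions in [-0.6, -0.4]); these come from the Gym source, not from this issue. `pump_policy` and `rollout` are hypothetical helpers: the controller always pushes in the direction of motion, so every action relies on the velocity accumulated by earlier actions.

```python
import math

# Constants assumed from Gym's MountainCarContinuous-v0 (not from this issue).
MIN_POS, MAX_POS = -1.2, 0.6
MAX_SPEED = 0.07
GOAL_POS = 0.45
POWER = 0.0015


def step(position, velocity, action):
    """One step of the (assumed) MountainCarContinuous dynamics.

    Velocity carries over between steps, so the effect of an action
    depends on the actions taken before it.
    """
    force = min(max(action, -1.0), 1.0)
    velocity += force * POWER - 0.0025 * math.cos(3 * position)
    velocity = min(max(velocity, -MAX_SPEED), MAX_SPEED)
    position += velocity
    position = min(max(position, MIN_POS), MAX_POS)
    if position == MIN_POS and velocity < 0:
        velocity = 0.0  # inelastic left wall
    done = position >= GOAL_POS and velocity >= 0
    return position, velocity, done


def pump_policy(velocity):
    """Hypothetical hand-written controller: push along the direction of motion."""
    return 1.0 if velocity >= 0 else -1.0


def rollout(policy, position=-0.5, max_steps=999):
    """Return the number of steps needed to reach the goal, or None."""
    velocity = 0.0
    for t in range(1, max_steps + 1):
        position, velocity, done = step(position, velocity, policy(velocity))
        if done:
            return t
    return None


print(rollout(pump_policy))
```

The momentum-building controller only works because it exploits the velocity built up by its own earlier actions, which is exactly the kind of time-series dependency a subgoal decomposition has to respect.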

Using HAC (Hierarchical Actor-Critic), this gap can be handled, because every state (position and velocity) can serve as a subgoal.
On the other hand, using UOF, the gap cannot be handled, because we cannot choose the optimal subgoals (we do not know enough about the domain information of the environment).
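The difference between the two subgoal spaces can be sketched as follows. This is an illustration under stated assumptions, not code from either paper or from this repository: `hac_subgoal` rescales a high-level actor output in [-1, 1]^2 to any point of the MountainCarContinuous state space (bounds assumed from Gym), while `uof_subgoal` can only pick from a fixed, designer-supplied list of options. Both function names and the example options are hypothetical.

```python
from typing import List, Sequence, Tuple

State = Tuple[float, float]  # (position, velocity)

# Observation bounds assumed from Gym's MountainCarContinuous-v0.
STATE_LOW = (-1.2, -0.07)
STATE_HIGH = (0.6, 0.07)


def hac_subgoal(actor_output: Sequence[float]) -> State:
    """HAC-style subgoal: rescale a tanh output in [-1, 1]^2 to the state space,
    so any (position, velocity) pair can be proposed as a subgoal."""
    return tuple(
        lo + (a + 1.0) * 0.5 * (hi - lo)
        for a, lo, hi in zip(actor_output, STATE_LOW, STATE_HIGH)
    )


def uof_subgoal(q_values: Sequence[float], options: List[State]) -> State:
    """UOF-style subgoal: choose the highest-valued entry from a fixed option list.
    If the right intermediate states are not in the list, they can never be chosen."""
    best = max(range(len(options)), key=lambda i: q_values[i])
    return options[best]


# Hypothetical option list a designer might write without domain knowledge.
options = [(-0.5, 0.0), (0.0, 0.05), (0.45, 0.0)]
print(hac_subgoal([0.0, 0.5]), uof_subgoal([0.1, 0.9, 0.3], options))
```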

MountainCarContinuous has only one final goal and a narrow range of start positions.

This makes it harder for UOF to converge. Since UOF only uses a fixed set of subgoals (even when we do not know whether those subgoals are good or not), we cannot collect diverse states and achieved goals for the hindsight framework.
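For reference, the hindsight mechanism mentioned here can be sketched as HER-style relabeling with the "final" strategy. This is a minimal sketch, not the UOF code: it assumes the achieved goal is simply the car position after each step, and the tolerance `tol=0.05` and the helper names are hypothetical.

```python
from typing import List, Tuple

# (state, action, next_state, goal); the achieved goal is assumed to be
# next_state, i.e. the car position after the step (a hypothetical choice).
Transition = Tuple[float, float, float, float]
Stored = Tuple[float, float, float, float, float]  # transition + reward


def sparse_reward(achieved: float, goal: float, tol: float = 0.05) -> float:
    """Sparse goal-reaching reward: 0 within `tol` of the goal, -1 otherwise."""
    return 0.0 if abs(achieved - goal) <= tol else -1.0


def her_final(episode: List[Transition]) -> List[Stored]:
    """Store each transition twice: with its original goal, and with the goal
    replaced by what was actually achieved at the end of the episode."""
    final_achieved = episode[-1][2]
    stored: List[Stored] = []
    for s, a, s_next, goal in episode:
        stored.append((s, a, s_next, goal, sparse_reward(s_next, goal)))
        stored.append(
            (s, a, s_next, final_achieved, sparse_reward(s_next, final_achieved))
        )
    return stored


# An episode that never reaches the real goal (0.45) still yields successes.
episode = [(-0.5, 1.0, -0.49, 0.45), (-0.49, 1.0, -0.47, 0.45)]
for row in her_final(episode):
    print(row)
```

The relabeled goals can only be states the agent actually reached. With a single fixed goal and start positions confined to a narrow interval, those achieved goals cluster near the start, so relabeling adds little variety to the replay buffer.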

In the UOF paper, the algorithm was tested only on a block-stacking manipulation problem with various start and goal positions. In that environment, the agent's state does not affect its behavior.
We cannot handle MountainCarContinuous using only discrete (and unknown) subgoals.

Closing this issue.

CUN-bjy mentioned this issue Aug 25, 2021

[S-02] Multi-step(hierarchical) and Multi-goal tasks on Panda #8


CUN-bjy changed the title from "-> UOF + DDPG Implementation" to "UOF + DDPG Implementation" Aug 28, 2021

CUN-bjy changed the title from "UOF + DDPG Implementation" to "[S-02-2] UOF + DDPG Implementation" Aug 28, 2021

CUN-bjy added this to the Stage 2 milestone Aug 28, 2021

CUN-bjy self-assigned this Sep 1, 2021

CUN-bjy added the enhancement New feature or request label Sep 1, 2021

CUN-bjy closed this as completed Oct 4, 2021