This project implements an imitation-from-observation algorithm. The algorithm trains an agent to imitate an expert performing a task, using videos of that expert.
We evaluate this algorithm on DeepMind Control tasks by training experts and then trying to imitate them. To evaluate our model:
- We train an expert on a specific task:
We use the DrQv2 algorithm, a model-free RL algorithm, to train an expert on a DeepMind Control task. This expert lets us build a dataset of demonstrations showing how to perform the task in many source contexts.
Demo of experts trained on Finger Spin, Finger Turn and Reacher tasks
- We train a context translation model using videos of the trained expert
The context translation model takes as input a demonstration of the expert in a source context and the first observation of the imitator agent in the target context, and outputs the predicted sequence of subsequent observations in that target context. We train this model with the imitation-from-observation algorithm (a minimal interface sketch follows this list).
Example of context translation from a source context into a target context. The first row is the sequence of expert observations; the second row is the predicted sequence of agent observations.
- We train an imitator agent to reproduce the states predicted by the context translator, using a classic actor-critic algorithm.
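To make the interface concrete, here is a minimal sketch of what such a context translation model can look like. Everything below (flat observation vectors, layer sizes, an MSE objective on paired demonstrations) is an illustrative assumption, not the exact model implemented in `train_ct.py`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextTranslator(nn.Module):
    """Predicts the observation sequence in the target context from an expert
    demonstration in a source context plus the imitator's first observation.
    Sketch only: the real model likely uses convolutional encoders/decoders."""

    def __init__(self, obs_dim: int, hidden_dim: int = 256):
        super().__init__()
        # Encoder for the expert's per-frame observations (source context).
        self.source_enc = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # Encoder for the imitator's first observation (target context).
        self.context_enc = nn.Sequential(nn.Linear(obs_dim, hidden_dim), nn.ReLU())
        # Decoder fusing demo features with the target-context code.
        self.decoder = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, obs_dim),
        )

    def forward(self, demo: torch.Tensor, first_obs: torch.Tensor) -> torch.Tensor:
        # demo: (T, obs_dim) expert frames; first_obs: (obs_dim,) target frame 0.
        ctx = self.context_enc(first_obs)               # target-context code
        feats = self.source_enc(demo)                   # per-frame demo features
        fused = torch.cat([feats, ctx.expand_as(feats)], dim=-1)
        return self.decoder(fused)                      # (T, obs_dim) predictions


if __name__ == "__main__":
    T, obs_dim = 50, 3 * 84 * 84
    model = ContextTranslator(obs_dim)
    src_demo = torch.rand(T, obs_dim)  # demo recorded in the source context
    tgt_demo = torch.rand(T, obs_dim)  # same behavior recorded in the target context
    # Training objective: regress predictions onto paired target-context frames.
    loss = F.mse_loss(model(src_demo, tgt_demo[0]), tgt_demo)
    loss.backward()
```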
- Install the environment
```
conda env create -f env.yml
conda activate ifo
```
In the following we consider only the Reacher Hard task, but the same process applies to any other task.
- Train the expert
```
python train.py task=reacher_hard
```
- Watch evaluation videos in the `eval` folder of the experiment folder
- Watch the training on TensorBoard:

```
tensorboard --logdir exp_local
```
- Copy-paste the `snapshot.pt` file from the experiment folder `exp_local` into the `experts` folder (create it at the root if it doesn't exist) and name it `reacher_hard.pt`
- Generate demonstrations of the expert acting in many random contexts
```
python generate_reacher_hard_expert_video.py
```
The demonstration dataset is stored in `videos/reacher_hard` and split into `train` and `valid` sets.
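As a quick sanity check of the generated dataset, you can count and inspect the demonstrations, for example with `imageio` (install it if the environment doesn't already provide it). This assumes the demonstrations are saved as `.mp4` files; adjust the glob pattern to the script's actual output format.

```python
from pathlib import Path

import imageio.v2 as imageio  # reading .mp4 requires the imageio-ffmpeg backend

for split in ("train", "valid"):
    videos = sorted(Path("videos/reacher_hard", split).glob("*.mp4"))
    print(f"{split}: {len(videos)} demonstrations")
    if videos:
        # Load the first demonstration and report its length and frame shape.
        frames = imageio.mimread(videos[0], memtest=False)
        print(f"  first video: {len(frames)} frames of shape {frames[0].shape}")
```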
- Train the context translation model on the Reacher Hard expert videos
```
python train_ct.py task=reacher_hard
```
- Watch evaluation videos in the `eval` folder of the experiment folder
- Watch the training on TensorBoard:

```
tensorboard --logdir ct_local
```
- Copy-paste the `snapshot.pt` file from the experiment folder `ct_exp_local` into the `ct` folder (create it at the root if it doesn't exist) and name it `reacher_hard.pt`
- Train the imitator agent, using the expert as the demonstration video provider and the context translation model as the context translator (a sketch of the underlying tracking reward follows this step)

```
python train_rl.py task=reacher_hard
```
- Watch evaluation videos in the `eval` folder of the experiment folder
- Watch the training on TensorBoard:

```
tensorboard --logdir rl_local
```
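For intuition, the imitator's learning signal can be thought of as a tracking reward: at each time step the agent is penalized for deviating from the frame the context translator predicted. The sketch below is illustrative only; the exact reward computed in `train_rl.py` (feature-space terms, weightings, normalization) may differ.

```python
import numpy as np


def tracking_reward(obs: np.ndarray, predicted_obs: np.ndarray) -> float:
    """Penalize the squared distance between the agent's current observation
    and the frame the context translator predicted for this time step."""
    diff = obs.astype(np.float32) - predicted_obs.astype(np.float32)
    return -float(np.sum(diff ** 2))


# Usage inside a rollout (names are illustrative):
#   predicted_seq = translator(expert_demo, first_agent_obs)  # see sketch above
#   reward_t = tracking_reward(obs_t, predicted_seq[t])
```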
- We reuse Denis Yarats's code from the DrQv2 project