This section describes the changes made to REVERIE and explains the flow of the code. Below is the README for installing and using REVERIE.
Each of the main files I had to change has a copy, e.g. follower.py -> follower_cp.py.
First, here is a breakdown of how the code progresses:
TrainFast.py is the entry point. Here our environment and agent are built using the R2RBatch class in Env.py and the Seq2Seq class in Follower.py. From there we enter the train function in TrainFast, which primarily uses the follower.py script and runs until completion. There are 10466 instructions and we use batches of 64, i.e. roughly 164 batches per pass; since the code saves every 100 iterations, I have it run 16400 iterations so we save after each batch. A PPO learn step is computed after each batch.
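In rough outline, the flow looks like the sketch below. The helper names (ppo_update, save_checkpoint) are placeholders for illustration, not the actual functions in the scripts:

```python
def training_loop(agent, ppo_update, save_checkpoint, n_iters=16400, save_every=100):
    """Illustrative outline of the flow described above; ppo_update and
    save_checkpoint stand in for the real helpers, and the arguments to
    agent.train() are omitted."""
    for it in range(1, n_iters + 1):
        agent.train()                   # train -> rollout -> rollout_with_loss (one batch of 64)
        ppo_update(agent)               # one PPO learn step after each batch
        if it % save_every == 0:
            save_checkpoint(agent, it)  # the real script saves every 100 iterations
```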
An explanation of the main parts and changes is as follows:
- Follower.py Edit Overview: This script hosts the Seq2Seq class, which creates our batch of agents. The agent.train(..) call in TrainFast goes through the train -> rollout -> rollout_with_loss methods in Follower. rollout_with_loss runs our batch of 64 agents through experiments with 64 instruction sets. Each agent/instruction has a maximum episode length of 20. I set this value in TrainFast; it was originally 10. The maximum shortest-path length over all instructions is under 10, so 20 gives the RL environment more time to explore and yields more state-action-reward data for training.
- More Follower Edits: Along with the above, the main edits to Follower consist of saving the obs, log_probs, and actions for all 64 agents during one training loop. This gives at most a 64x20 set of obs, actions, etc. I use this set to run the PPO learn step and update the weights of the decoder. AgentMemory in follower.py is the class I created to use PPO with my agents.
- AgentMemory in Follower: The trickiest part of this code is storing the history and generating mixed mini-batches. The code is commented, but in short: I store the history sequentially so I can compute the advantages. Since the original code uses batches, each agent is not guaranteed to have the same number of possible actions at time step t, and the original code pads to account for this. I therefore store the history and sort observations by action length when building the mini-batches (see the buffer sketch after this list).
- TrainFast.py: Explained above. The primary changes have to do with the number of iterations. I changed the maximum number of iterations to 16400, i.e. (number of instructions / batch size) x the saving interval (100), so we don't repeat examples.
- Env.py: This script hosts the R2RBatch class, which creates our environment and handles batches of obs. The main change here is in the '_next_minibatch' method; check the comments in '_next_minibatch'.
- ModelFast: This script hosts the decoder model and all of the other models. The decoder used is CogroundDecoderLSTM. The main change I make here is the creation of a value network within the decoder. This is possibly flawed: the state spaces of the Actor and Critic might be different (check the comments above CogroundDecoderLSTM), and the computational graph needs to be traced more carefully. A sketch of the value-head idea follows this list.
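For reference, here is a minimal sketch of the kind of rollout buffer that AgentMemory implements. All names are illustrative and this is not the follower.py code; it only shows the core idea of storing transitions sequentially, computing returns and advantages afterwards, and drawing mini-batches from the pooled history. The real class additionally sorts observations by the number of candidate actions before batching, since the padded action tensors differ in length.

```python
import random
import numpy as np

class RolloutBuffer:
    """Illustrative AgentMemory-style buffer (not the follower.py implementation)."""

    def __init__(self, gamma=0.99):
        self.gamma = gamma
        self.obs, self.actions, self.log_probs = [], [], []
        self.values, self.rewards, self.dones = [], [], []

    def store(self, ob, action, log_prob, value, reward, done):
        # Called once per agent per time step; history stays in sequential order.
        self.obs.append(ob)
        self.actions.append(action)
        self.log_probs.append(log_prob)
        self.values.append(value)
        self.rewards.append(reward)
        self.dones.append(done)

    def compute_returns_and_advantages(self):
        # Plain discounted returns; advantage = return - value estimate.
        returns, running = [], 0.0
        for r, d in zip(reversed(self.rewards), reversed(self.dones)):
            running = r + self.gamma * running * (1.0 - float(d))
            returns.insert(0, running)
        returns = np.array(returns)
        advantages = returns - np.array(self.values)
        return returns, advantages

    def minibatch_indices(self, batch_size):
        # The real code also sorts by the number of candidate actions here so
        # that padded action tensors inside a mini-batch have similar lengths.
        idx = list(range(len(self.obs)))
        random.shuffle(idx)
        for start in range(0, len(idx), batch_size):
            yield idx[start:start + batch_size]

    def clear(self):
        self.__init__(self.gamma)
```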
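Similarly, the value network added in ModelFast is conceptually a small critic head on the decoder's hidden state. The sketch below is an assumption about the shape of that change, not the actual CogroundDecoderLSTM code; in particular, the real decoder scores a variable number of candidate actions rather than using a fixed linear layer, and, as noted above, the critic's input may need to differ from the actor's.

```python
import torch.nn as nn

class DecoderWithValueHead(nn.Module):
    """Illustrative LSTM decoder step with an extra value (critic) head."""

    def __init__(self, input_size, hidden_size, num_actions):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        # A fixed linear layer keeps the sketch short; the real decoder scores
        # a variable number of candidate actions instead.
        self.action_head = nn.Linear(hidden_size, num_actions)  # actor logits
        self.value_head = nn.Linear(hidden_size, 1)              # critic estimate

    def forward(self, x, h, c):
        h, c = self.lstm(x, (h, c))
        logits = self.action_head(h)
        value = self.value_head(h).squeeze(-1)
        return logits, value, h, c
```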
Final Comments: This is the final push for the REVERIE PPO project. The implementation should be correct, but the model did not learn. The choice of state space may be wrong, the model may be overfitting, or there is some error I have not yet found.
🌟 Results of the 1st REVERIE Challenge at the ACL Workshop 2020! More details are available here.
Here are the pre-released code and data for the CVPR 2020 paper REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments
As shown in the above figure, a robot agent is given a natural language instruction referring to a remote object (here in the red bounding box) in a photo-realistic 3D environment. The agent must navigate to an appropriate location and identify the object from multiple distracting candidates. The blue discs indicate nearby navigable viewpoints provided by the simulator.
Note* This section prepares everything to run or train our Navigator-Pointer model. If you are familiar with R2R and just want to do the REVERIE task, you can directly go to Section 6.
Note** If you have a fresh Ubuntu system, the following instructions should work well. If not, they may break your existing project environments, and we recommend trying Section 3, Install with Docker.
A C++ compiler with C++11 support is required. Matterport3D Simulator has several dependencies:
- Ubuntu 14.04, 16.04, 18.04
- OpenCV >= 2.4 including 3.x
- OpenGL
- OSMesa
- GLM
- Numpy
- pybind11 for Python bindings
- Doxygen for building documentation
E.g. installing dependencies on Ubuntu:
sudo apt-get install libopencv-dev python-opencv freeglut3 freeglut3-dev libglm-dev libjsoncpp-dev doxygen libosmesa6-dev libosmesa6 libglew-dev
If you still lack some packages when running cmake/make or our code, you can refer to the contents of the Dockerfile.
Clone the REVERIE repository:
git clone https://github.com/YuankaiQi/REVERIE.git
cd REVERIE
Note that our repository is based on the v0.1 version Matterport3DSimulator, which was originally proposed with the Room-to-Room dataset.
Download our pre-trained mini MAttNet3 from Google Drive or Baidu Yun (code: qts6), which is modified from MAttNet to support our model training. Unzip it into the MAttNet3 folder. This is used as our Pointer model.
You need to download RGB images and house segmentation files of the Matterport3D dataset. The following data types are required:
matterport_skybox_images
house_segmentations
The metadata is also needed. Organise the data as below:
Matterport
|--v1
|--metadata
|--scans
Then update the 'matterportDir' setting in trainFast.py to point to your Matterport directory.
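For example (the exact variable name and format inside trainFast.py may differ; the value should point at the 'Matterport' directory organised above):

```python
# Illustrative only; check how trainFast.py actually defines this parameter.
matterportDir = '/path/to/Matterport'   # contains v1/metadata and v1/scans
```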
Download and extract the tsv files (from Matterport3DSimulator) into the img_features directory. You will only need the ImageNet features to replicate our results.
Let us get things ready to run experiments.
# change "rog" (remote object grounding) to any name you prefer
conda create -n rog python=3.6
Activate the environment you just created
conda activate rog
pip install -r tasks/REVERIE/requirements.txt
# with CUDA 9.0
conda install pytorch=0.4.0 cuda90 -c pytorch
conda install torchvision=0.2.0 -c pytorch
If you use a newer version, you need to modify codes to load pretrained models.
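For instance, a hedged sketch of what such a modification might look like (this helper is not part of the repository; it assumes the checkpoints are ordinary torch.save state dicts):

```python
import torch

def load_old_checkpoint(model, path):
    """Illustrative only: load a PyTorch 0.4-era checkpoint with a newer PyTorch."""
    state = torch.load(path, map_location='cpu')       # avoid GPU-device mismatches
    if isinstance(state, dict) and 'state_dict' in state:
        state = state['state_dict']                    # some checkpoints wrap the weights
    result = model.load_state_dict(state, strict=False)
    # strict=False tolerates renamed/missing keys; inspect them before trusting the load.
    print('missing:', result.missing_keys, 'unexpected:', result.unexpected_keys)
    return model
```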
Let us compile the simulator so that we can call its functions in python.
Build EGL version using CMake:
cd build
cmake -DEGL_RENDERING=ON ..
# Double-check that CMake finds the proper path to your python
# if not, remove the generated make files and use cmake with the option below instead
rm -rf *
cmake -DEGL_RENDERING=ON -DPYTHON_EXECUTABLE:FILEPATH=/path/to/your/bin/python ..
make
cd ../
Note There are three rendering options, which are selected using cmake options during the build process:
- Off-screen GPU rendering using EGL:
cmake -DEGL_RENDERING=ON ..
- Off-screen CPU rendering using OSMesa:
cmake -DOSMESA_RENDERING=ON ..
- GPU rendering using OpenGL (requires an X server):
cmake ..
The recommended (fast) approach for training agents is using off-screen GPU rendering (EGL).
cd MAttNet3/pyutils/mask-faster-rcnn/lib
You may need to change the -arch version in the Makefile to compile the CUDA code:
GPU model | Architecture |
---|---|
TitanX (Maxwell/Pascal) | sm_52 |
GTX 960M | sm_50 |
GTX 1080 (Ti) | sm_61 |
Grid K520 (AWS g2.2xlarge) | sm_30 |
Tesla K80 (AWS p2.xlarge) | sm_37 |
Compile the CUDA-based nms and roi_pooling using the following simple commands:
make
cd ../../refer
make
It will generate _mask.c and _mask.so in the external/ folder.
We find that the success rate is slightly lower than that obtained using environments built without Docker.
- Nvidia GPU with driver >= 384
- Install docker
- Install nvidia-docker2.0
- Note: CUDA / CuDNN toolkits do not need to be installed (these are provided by the docker image)
Clone the REVERIE repository:
git clone https://github.com/YuankaiQi/REVERIE.git
cd REVERIE
First download the files as in Section 2.3. Then set an environment variable to the location of the dataset, where <PATH> is the full absolute path (not a relative path or symlink) to the directory 'v1':
export MATTERPORT_DATA_DIR=<PATH>
And set the 'matterportDir' parameter to 'data' in the trainFast.py file.
Note that if <PATH> is a remote sshfs mount, you will need to mount it with the -o allow_root option or the docker container won't be able to access this directory.
To make data loading faster and to reduce memory usage, we preprocess the matterport_skybox_images by downscaling and combining all cube faces into a single image using the following script:
./scripts/downsize_skybox.py
This will take a while depending on the number of processes used. By default images are downscaled by 50% and 20 processes are used.
Build the docker image:
docker build -t reverie .
Run the docker container, mounting both the git repo and the dataset:
nvidia-docker run -it --mount type=bind,source=$MATTERPORT_DATA_DIR,target=/root/mount/Matterport3DSimulator/data/v1,readonly --volume `pwd`:/root/mount/Matterport3DSimulator reverie
Now (from inside the docker container), build the simulator and run the unit tests:
cd /root/mount/Matterport3DSimulator
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make
cd ../
Note There are three rendering options, which are selected using cmake options during the build process (by varying line 3 in the build commands immediately above):
- Off-screen GPU rendering using EGL:
cmake -DEGL_RENDERING=ON ..
- Off-screen CPU rendering using OSMesa:
cmake -DOSMESA_RENDERING=ON ..
- GPU rendering using OpenGL (requires an X server):
cmake ..
The recommended (fast) approach for training agents is using off-screen GPU rendering (EGL).
cd MAttNet3/pyutils/mask-faster-rcnn/lib
You may need to change the -arch version in the Makefile to compile the CUDA code:
GPU model | Architecture |
---|---|
TitanX (Maxwell/Pascal) | sm_52 |
GTX 960M | sm_50 |
GTX 1080 (Ti) | sm_61 |
Grid K520 (AWS g2.2xlarge) | sm_30 |
Tesla K80 (AWS p2.xlarge) | sm_37 |
Compile the CUDA-based nms and roi_pooling using the following simple commands:
make
cd ../../refer
make
It will generate _mask.c and _mask.so in the external/ folder.
Run the docker container while sharing the host's X server and DISPLAY environment variable with the container:
xhost +
nvidia-docker run -it -e DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix --mount type=bind,source=$MATTERPORT_DATA_DIR,target=/root/mount/Matterport3DSimulator/data/v1,readonly --volume `pwd`:/root/mount/Matterport3DSimulator reverie
cd /root/mount/Matterport3DSimulator
If you get an error like Error: BadShmSeg (invalid shared segment parameter) 128, you may also need to include -e="QT_X11_NO_MITSHM=1" in the docker run command above.
- For training: You can download our pre-trained models from Google Drive or Baidu Yun. If you want to train by yourself, just run the following command:
python tasks/REVERIE/trainFast.py --feedback_method sample2step --experiment_name releaseCheck
- For testing: To test the model, you first need to obtain navigation results by
python tasks/REVERIE/run_search.py
Then run the following command to obtain the grounded object
python tasks/REVERIE/groundingAfterNav.py
Now, you should get results in the 'experiment/releaseCheck/results/' folder.
Note that the results might be slightly different due to different versions of dependent packages or different GPUs.
In the tasks/REVERIE/data folder, you will find four files: REVERIE_train.json, REVERIE_val_seen.json, REVERIE_val_unseen.json, and REVERIE_test, which provide the instructions, paths, and target object of each task (except the REVERIE_test file). In the tasks/REVERIE/data/BBox folder, you will find JSON files that record the objects observed within 3 meters of each viewpoint.
- Example of a train/val_seen/val_unseen.json file
[
{
"distance" : 11.65, # distance to the goal viewpoint
"ix": 208, # reserved data, not used
"scan": "qoiz87JEwZ2", # building ID
"heading": 4.59, # initial parameters for agent
"path_id": 1357, # inherited from the R2R dataset
"objId": 66, # the unique object ID in the current building
"id": "1357_66" # task id
"instructions":[ # collected instructions for REVERIE
"Go to the entryway and clean the coffee table",
"Go to the foyer and wipe down the coffee table",
"Go to the foyer on level 1 and pull out the coffee table further from the chair"
],
"path": [ # inherited from the R2R dataset
"bdb1023cb7cc4ebd8245b9291fcbc1a2",
"a6ba3f53b7964464b23341896d3c75fa",
"c407e34577aa4724b7e5d447a5d859d1",
"9f68b19f50d14f5d8371447f73c3a2e3",
"150763c717894adc8ccbbbe640fa67ef",
"59b190857cfe47f691bf0d866f1e5aeb",
"267a7e2459054db7952fc1e3e45e98fa"
],
"instructions_l":[ # inherited from the R2R dataset and provided just for convenience
"Walk into the dining room and continue past the table. Turn left when you xxx ",
...
]
},
...
]
- Example of a json file in the BBox folder
File name format: ScanID_ViewpointID.json, e.g., VzqfbhrpDEA_57fba128d2f042f7a59793c665a3f587.json
{ # note that this is in the variable type of dict not list
"57fba128d2f042f7a59793c665a3f587":{ # this key is the id of viewpoint
"827":{ # the key if object ID
"name": "toilet",
"visible_pos":[
6,7,8,9,19,20 # view indices (0~35) that contain the object. The index is consistent with that in R2R
],
"bbox2d":[
[585,382,55,98], # [x,y,w,h] and corresponds to the views listed in the "visible_pos"
...
]
},
"833": {
...
},
...
}
}
The easiest way to integrate this into your project is to preload all of the objects' bounding_box/label/visible_pos data with the loadObjProposals() function, as in the eval_release.py file. Then you can access the visible objects using ScanID_ViewpointID as the key. You can use any referring expression method to match objects against an instruction.
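For example, a hedged sketch of that preloading pattern (the actual loadObjProposals() signature in eval_release.py may differ):

```python
import json
import os

def load_obj_proposals(bbox_dir='tasks/REVERIE/data/BBox'):
    """Illustrative stand-in for loadObjProposals(): read every
    ScanID_ViewpointID.json file into one dict keyed by 'ScanID_ViewpointID'."""
    proposals = {}
    for fname in os.listdir(bbox_dir):
        if fname.endswith('.json'):
            key = fname[:-len('.json')]               # e.g. 'VzqfbhrpDEA_57fba128...'
            with open(os.path.join(bbox_dir, fname)) as f:
                proposals[key] = json.load(f)
    return proposals

# Usage: objects visible around one viewpoint of one scan.
# proposals = load_obj_proposals()
# objs = proposals['VzqfbhrpDEA_57fba128d2f042f7a59793c665a3f587']
```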
Note The number of instructions may vary across the dataset, so we recommend the following way to index an instruction:
instrType = "instructions"
self.instr_ids += ['%s_%d' % (str(item['id']),i) for i in range(len(item[instrType]))]
Just add the "'predObjId': int value" pair into your navigation results. That's it!
Below is a toy sample:
[
{
"trajectory": [
[
"a68b5ae6571e4a66a4727573b88227e4",
3.141592653589793,
0.0
],
...
],
"instr_id": "4774_267_1",
"predObjId": 402
},
...
]
We would like to thank Matterport for allowing the Matterport3D dataset to be used by the academic community. We also thank Philip Roberts, Zheng Liu, Zizheng Pan, and Sam Bahrami for their great help in building the dataset. This project is supported by the Australian Centre for Robotic Vision.
The REVERIE task and dataset are described in the following paper:
@inproceedings{reverie,
title={REVERIE: Remote Embodied Visual Referring Expression in Real Indoor Environments},
author={Yuankai Qi and Qi Wu and Peter Anderson and Xin Wang and William Yang Wang and Chunhua Shen and Anton van den Hengel},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2020}
}