Visual Queries 3D localization

teaser figure

Installation instructions

The code has been tested with Ubuntu 16.04 or later and Python 3.7.

  1. Clone the repository from here.

    git clone git@github.com:EGO4D/episodic-memory.git
    cd episodic-memory/VQ3D
    
  2. Load the git submodules

    git submodule init
    git submodule update
    
  3. Create a conda environment.

    conda create -n ego4d_vq3d python=3.7
    
  4. Install the requirements using pip:

    pip install -r requirements.txt
    
  5. Install COLMAP following these instructions. Don't forget to add the colmap binary to your PATH environment variable.
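
  To check that the colmap binary is visible before running the SfM steps below, a quick sanity check from Python (a minimal sketch, not part of the official scripts):

    import shutil

    # The SfM scripts invoke the `colmap` binary, so it must be discoverable on PATH.
    assert shutil.which("colmap") is not None, "colmap not found on PATH"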

Data

  1. Download the data using the client. Place the data into the ./data folder.

    python -m ego4d.cli.cli --output_directory ~/episodic-memory/VQ3D/data/ --dataset full_scale --universities unict
    python -m ego4d.cli.cli --output_directory ~/episodic-memory/VQ3D/data/ --dataset annotations
    python -m ego4d.cli.cli --output_directory ~/episodic-memory/VQ3D/data/ --dataset 3d
    
  2. [UPDATE] Construct the 5 fps clips using the VQ2D tools (clips downloaded directly from the client are at 30 fps; the VQ tasks use 5 fps).

    python convert_videos_to_clips.py
    
  3. Generate the <split>_annot.json files following the instructions in the VQ2D folder (step 2 of the “Running experiments” section). Place the files under ./data/.

Workflow

Camera pose estimation

    cd camera_pose_estimation/

  1. Compute the camera intrinsics.
  • Extract all video frames.

    python extract_frames_all_clips.py \
        --input_dir data/v1/full_scale/ \
        --output_dir data/v1/videos_frames/ \
        --clips_json data/v1/3d/all_videos_for_vq3d_v1.json \
        --j 8 # number of parallel processes. Increase for speed.
    
  • Pre-select frames for COLMAP

    python extract_frames_for_colmap.py \
        --input_dir data/v1/videos_frames/ \
        --output_dir data/v1/videos_frames_for_sfm/ \
        --j 8 # number of parallel processes. Increase for speed.
    
  • Run COLMAP on the pre-selected frames

    python get_intrinsics_for_all_clips.py \
        --input_dir data/v1/videos_frames_for_sfm/ \
        --output_dir data/v1/videos_sfm/
    
  • For videos where SfM fails, we run a greedy version of the reconstruction that selects 100 frames from the middle of the video.

    python get_intrinsics_for_all_clips_greedy.py \
        --input_dir data/v1/videos_frames/ \
        --sfm_input_dir data/v1/videos_frames_for_sfm_greedy/ \
        --output_dir data/v1/videos_sfm_greedy/
    
  • Get the intrinsics for each clip.

    python get_median_intrinsics.py \
        --input_dir data/v1/videos_sfm/ \
        --input_dir_greedy data/v1/videos_sfm_greedy/ \
        --annotation_dir_greedy data/v1/annotations/ \
        --output_filename data/v1/scan_to_intrinsics.json
    
  • Note: To help with reproducibility, the camera intrinsics have been added here: data/scan_to_intrinsics.json
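
  For example, to load and inspect the provided intrinsics from Python (a minimal sketch; the filename suggests a dictionary keyed by scan, but check the exact per-scan fields against the file itself):

    import json

    # Load the provided per-scan camera intrinsics.
    with open("data/scan_to_intrinsics.json", "r") as f:
        scan_to_intrinsics = json.load(f)

    # Print a few entries to confirm the format before relying on it downstream.
    for scan_uid in list(scan_to_intrinsics)[:5]:
        print(scan_uid, scan_to_intrinsics[scan_uid])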

  2. Compute the camera poses.
  • Extract all clip frames.

    python extract_frames_all_clips.py \
        --input_dir data/v1/clips/ \
        --output_dir data/v1/clips_frames/ \
        --clips_json data/v1/3d/all_clips_for_vq3d_v1.json \
        --j 8 # number of parallel processes. Increase for speed.
    
  • Get the camera poses for all frames of all clips

    python run_all_data.py \
        --input_dir data/v1/clips_frames/ \
        --query_filename data/v1/annotations/vq3d_val.json \
        --camera_intrinsics_filename data/v1/scan_to_intrinsics.json \
        --scans_keypoints_dir data/v1/3d/scans_keypoints/ \
        --scans_dir data/v1/3d/scans/ \
        --output_dir data/v1/clips_camera_poses/
    
  • Note-1: Camera pose estimation results on the val set have been included for reference here: data/all_clips_camera_poses_val.json

  • Note-2: To help with reproducibility, we also provide the data for all the intermediate steps used to compute the poses for one clip in the val set. You can download that information here.
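
  To get a quick overview of the provided val-set poses from Python (a minimal sketch; it only assumes the file is standard JSON, so check the actual schema against the file itself):

    import json

    # Load the reference camera-pose results provided for the val set.
    with open("data/all_clips_camera_poses_val.json", "r") as f:
        clip_poses = json.load(f)

    # Summarize the top-level structure without assuming a particular schema.
    if isinstance(clip_poses, dict):
        print(f"{len(clip_poses)} top-level entries (likely keyed by clip uid)")
    else:
        print(f"{len(clip_poses)} top-level entries")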

Depth estimation

    cd depth_estimation/

  1. Prepare RGB inputs

    python prepare_inputs_for_depth_estimation.py \
        --input_dir data/v1/clips_camera_poses/ \
        --vq2d_results data/vq2d_results/siam_rcnn_residual_kys_val.json \
        --vq2d_annot data/v1/annotations/val_annot.json \
        --vq3d_queries data/v1/annotations/vq3d_val.json \
        --vq2d_queries data/v1/annotations/vq_val.json
    
  2. Compute Depths

    python compute_depth_for_ego4d.py \
        --input_dir data/v1/clips_camera_poses/
    

VQ3D

    cd VQ3D/

  1. Compute the ground-truth vector in the query-frame coordinate system for queries with an estimated camera pose.

    python scripts/prepare_ground_truth_for_queries.py \
        --input_dir data/v1/clips_camera_poses/ \
        --vq3d_queries data/v1/annotations/vq3d_val.json \
        --output_filename data/v1/annotations/vq3d_val_wgt.json \
        --vq2d_queries data/v1/annotations/vq_val.json
    
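
  The transform behind this step is standard rigid-body geometry: given the estimated camera-to-world pose of the query frame, a world-frame ground-truth point is expressed in the query-frame coordinate system by inverting that pose. A minimal numpy sketch for illustration (the 4x4 camera-to-world convention is an assumption; the script above is the reference implementation):

    import numpy as np

    def world_point_to_query_frame(T_cam_to_world, p_world):
        """Express a 3D world point in the query-frame (camera) coordinate system.

        T_cam_to_world: 4x4 camera-to-world pose of the query frame (assumed convention).
        p_world: 3D point in world coordinates, e.g. an annotated object centroid.
        """
        R = T_cam_to_world[:3, :3]
        t = T_cam_to_world[:3, 3]
        # Invert the rigid transform: p_cam = R^T (p_world - t).
        return R.T @ (p_world - t)

    # Toy check: a camera at (1, 0, 0) with identity rotation sees the world origin at (-1, 0, 0).
    T = np.eye(4)
    T[:3, 3] = [1.0, 0.0, 0.0]
    print(world_point_to_query_frame(T, np.zeros(3)))
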
  2. Compute 3D vector predictions

    python scripts/run.py \
        --input_dir data/v1/clips_camera_poses/ \
        --output_filename data/vq3d_results/siam_rcnn_residual_kys_val.json \
        --vq2d_results data/vq2d_results/siam_rcnn_residual_kys_val.json \
        --vq2d_annot data/v1/annotations/val_annot.json \
        --vq2d_queries data/v1/annotations/vq_val.json \
        --vq3d_queries data/v1/annotations/vq3d_val_wgt.json
    
  3. Run evaluation

    python scripts/eval.py \
        --vq3d_results data/vq3d_results/siam_rcnn_residual_kys_val.json
    
    
  • Note: To help with reproducibility, we provide the results for the val set here: data/vq3d_results/siam_rcnn_residual_kys_val.json

[UPDATES] Notes about the Challenge

  1. Please use the vq3d_test_unannotated_template.json file (under data/) for the challenge. These queries are the same as the ones downloaded using the ego4d-client; we have added the 'annotation_uid' entry to help match VQ3D and VQ2D queries.

  2. To find the corresponding VQ2D queries, use the 'annotation_uid' entry found in each VQ3D query. We have also added mapping_vq2d_to_vq3d_queries_annotations_test.json to help match the queries. Please refer to these lines to understand how to use it.

  3. To create a submission for the challenge, use the vq3d_test_unannotated_template.json file and add the required information directly in it, as sketched below.
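
  A minimal sketch of the query matching and submission flow described above (the recursive walk avoids assuming the exact nesting of the template; only the 'annotation_uid' field comes from the instructions, and the output filename is just an example):

    import json

    def collect_annotation_uids(node, out):
        """Recursively gather every 'annotation_uid' value from a nested JSON structure."""
        if isinstance(node, dict):
            if "annotation_uid" in node:
                out.append(node["annotation_uid"])
            for value in node.values():
                collect_annotation_uids(value, out)
        elif isinstance(node, list):
            for item in node:
                collect_annotation_uids(item, out)
        return out

    with open("data/vq3d_test_unannotated_template.json", "r") as f:
        template = json.load(f)
    with open("data/mapping_vq2d_to_vq3d_queries_annotations_test.json", "r") as f:
        vq2d_to_vq3d = json.load(f)

    uids = collect_annotation_uids(template, [])
    print(f"{len(uids)} VQ3D queries carry an annotation_uid to match against VQ2D")
    print(f"{len(vq2d_to_vq3d)} entries in the VQ2D-to-VQ3D mapping file")

    # Per item 3, predictions are added directly into `template`; the filled-in
    # structure is then saved as the submission file (example filename only).
    with open("vq3d_test_predictions.json", "w") as f:
        json.dump(template, f)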