Clone this repo with `git clone --recursive https://github.com/kerrj/rsrd`, which will clone submodules into `dependencies/`.
First, please install PyTorch 2.1.2 in a Python 3.10 conda environment with CUDA 12.0 (other torch versions should also work, but this is what we've tested). Next, install nerfstudio and gsplat using the instructions provided in their documentation; we use nerfstudio version 1.1.4 and gsplat version 1.4.0.
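A minimal sketch of the environment setup, assuming the cu121 PyTorch wheels and the PyPI releases of nerfstudio/gsplat work on your machine (the environment name and exact wheel tags are illustrative; defer to the nerfstudio/gsplat docs if a source build is needed):

```bash
# Create and activate the environment (the name "rsrd" is arbitrary).
conda create -n rsrd python=3.10 -y
conda activate rsrd

# PyTorch 2.1.2 built against CUDA 12.x; the torchvision pin matches this torch release.
pip install torch==2.1.2 torchvision==0.16.2 --index-url https://download.pytorch.org/whl/cu121

# Pin the nerfstudio / gsplat versions mentioned above.
pip install nerfstudio==1.1.4 gsplat==1.4.0
```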
Once these are installed, install GARField (included inside `dependencies/`), which should simply be pip installable except for cuML, which can be pip installed with:

```bash
pip install --extra-index-url=https://pypi.nvidia.com cudf-cu12==24.10.* cuml-cu12==24.10.*
```
Finally, for robot trajectory optimization, install JAX, with GPU support if possible; trajectory remapping optimization relies heavily on batched IK + trajectory solves. Please see jaxmp for installation details. The system was tested on an RTX 4090 with `jax==0.4.35`, but should work with other versions as well.
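One way to get a CUDA-enabled JAX matching the tested version, assuming the `jax[cuda12]` wheels suit your driver (defer to the JAX and jaxmp docs if not):

```bash
# CUDA 12 JAX build; the version pin matches the tested configuration above.
pip install -U "jax[cuda12]==0.4.35"
```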
Simply `pip install -e .` inside your cloned `rsrd` repo to install.
Please make sure you pip install RSRD before the submodules, as it installs some dependencies. There are a number of submodules inside the `dependencies/` folder, which can all be pip installed via:

```bash
cd dependencies/
pip install -e dig -e garfield -e raftstereo -e hamer_helper -e jaxls -e jaxmp
```
If you would like to run the full pipeline with robot motion planning or visualize hand detections, you also need to install HaMeR. Please follow the instructions in the HaMeR repository (this involves downloading model weights).
To catch most install issues, after installation you should be able to run `ns-train garfield -h` and `ns-train dig -h` and see a formatted help output for the two training pipelines.
We have published data to reproduce the paper results on our site. Each capture consists of a multi-view scan in nerfstudio format and an `.mp4` video of the input demonstration.
To capture your own objects, please first scan an object of interest as described here.
4D-DPM consists of two overlaid models: a GARField and a DINO-embedded Gaussian model (which we call DiG).
- Train GARField with `ns-train garfield --data <path/to/datasetname>`. This will produce an output config file inside `outputs/datasetname/garfield/<timestamp>/config.yml`.
- Train DiG with `ns-train dig --data <path/to/data/directory> --pipeline.garfield-ckpt <path/to/config.yml>`, using the output config file from the GARField training.
- Segment the model: inside the viewer for DiG, you should see the following GUI:
After 4D-DPM training is complete, the script `scripts/run_tracker.py` executes the motion reconstruction from a video. To see a full list of options, run `python scripts/run_tracker.py -h`:
```
╭─ options ────────────────────────────────────────────────────╮
│ -h, --help              show this help message and exit      │
│ --is-obj-jointed {True,False}                                 │
│                         (required)                            │
│ --dig-config-path PATH  (required)                            │
│ --video-path PATH       (required)                            │
│ --output-dir PATH       (required)                            │
│ --save-hand, --no-save-hand                                   │
│                         (default: True)                       │
╰──────────────────────────────────────────────────────────────╯
```
- `--is-obj-jointed`: `False` for objects that contain removable parts or prismatic joints. Setting `False` makes the ARAP loss weaker.
- `--dig-config-path`: the path to the DiG `config.yml` file (NOT GARField), which usually looks like `outputs/datasetname/dig/<timestamp>/config.yml`.
- `--video-path`: the path to the demonstration `.mp4` video.
- `--output-dir`: location to output the tracked trajectory.
- `--save-hand`: if specified, hand poses will be computed with HaMeR; with `--no-save-hand` they will not.
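A hedged example invocation (all paths below are placeholders for illustration, not files shipped with the repo):

```bash
python scripts/run_tracker.py \
    --is-obj-jointed True \
    --dig-config-path outputs/datasetname/dig/<timestamp>/config.yml \
    --video-path path/to/demonstration.mp4 \
    --output-dir outputs/datasetname/rsrd_track/
```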
After tracking executes, you can visualize the 4D reconstruction in a viser window (usually `localhost:8080`), which should look like this!
In addition, the output folder you specified will contain:
- `camopt_render.mp4`: an animation of the object pose initialization (including all random seeds)
- `frame_opt.mp4`: the rendered object trajectory from the camera perspective, overlaid on top of the input video
- `keyframes.txt`: a loadable representation of the tracked part poses
For object-centric robot trajectory generation, you need to specify:
- how many hands to use (`single` or `bimanual`), and
- the initial object position, from which the robot will plan the part motion trajectory.

There are two options to initialize the object position:
1. Interactively set the object into an initial position, or
2. (Requires ZED SDK) Use an example ZED observation from the robot POV at `data/robot_init_obs`.
```
usage: run_planner.py [-h] --hand-mode {single,bimanual} --track-dir PATH [--zed-video-path {None}|PATH]

╭─ options ────────────────────────────────────────────────────╮
│ -h, --help              show this help message and exit      │
│ --hand-mode {single,bimanual}                                 │
│                         (required)                            │
│ --track-dir PATH        (required)                            │
│ --zed-video-path {None}|PATH                                  │
│                         (default: None)                       │
╰──────────────────────────────────────────────────────────────╯
```
Set `--track-dir` to the `--output-dir` you specified for `run_tracker.py`.
If you don't provide `--zed-video-path` (option 1), you need to drag the object into a desired location, fix the object location (un-click "Move object"), and click "Generate Trajectory". Otherwise, the script will start trajectory generation immediately after the object is registered into the workspace.
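A hedged example for the interactive mode (option 1); the script location under `scripts/` and the track directory are assumptions for illustration:

```bash
python scripts/run_planner.py \
    --hand-mode bimanual \
    --track-dir outputs/datasetname/rsrd_track/
```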
For option 1, you can now un-click "Move Object" to re-generate trajectories from a new object pose, or click "Generate Trajectory" again for more.
If you find this useful, please cite the paper!
```bibtex
@inproceedings{kerr2024rsrd,
  title={Robot See Robot Do: Imitating Articulated Object Manipulation with Monocular 4D Reconstruction},
  author={Justin Kerr and Chung Min Kim and Mingxuan Wu and Brent Yi and Qianqian Wang and Ken Goldberg and Angjoo Kanazawa},
  booktitle={8th Annual Conference on Robot Learning},
  year={2024},
}
```