This example shows how to run the fine-tuned `pi0_fast_droid` model on the DROID robot platform. We also provide a `pi0_droid` model, which you can use by replacing `pi0_fast_droid` with `pi0_droid` in the commands below. In practice, we find that out-of-the-box, the `pi0_fast_droid` model follows language commands more reliably, so we recommend it as the default checkpoint for DROID evaluation.
Since the DROID control laptop does not have a powerful GPU, we will start a remote policy server on a different machine with a more powerful GPU and then query it from the DROID control laptop during inference.
- On a machine with a powerful GPU (e.g., an NVIDIA 4090), clone and install the `openpi` repository following the instructions in the README.
- Start the openpi server via the following command:

```bash
uv run scripts/serve_policy.py policy:checkpoint --policy.config=pi0_fast_droid --policy.dir=s3://openpi-assets/checkpoints/pi0_fast_droid
```

You can also run the equivalent command below:

```bash
uv run scripts/serve_policy.py --env=DROID
```
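Once the server is up (and the openpi client from the steps below is installed), you can sanity-check it with a minimal script like the following sketch. `WebsocketClientPolicy` comes from the openpi client; the observation key names, image size, and default port 8000 are assumptions based on the DROID inputs in `main.py`, so verify them against your copy.

```python
# Hedged sketch: query the policy server with a dummy DROID-style observation.
import numpy as np
from openpi_client import websocket_client_policy

# Replace <server_ip> with your server's address; 8000 is the assumed default port.
client = websocket_client_policy.WebsocketClientPolicy(host="<server_ip>", port=8000)

# Dummy observation mirroring the DROID input format (key names assumed; see main.py).
dummy_obs = {
    "observation/exterior_image_1_left": np.zeros((224, 224, 3), dtype=np.uint8),
    "observation/wrist_image_left": np.zeros((224, 224, 3), dtype=np.uint8),
    "observation/joint_position": np.zeros(7),
    "observation/gripper_position": np.zeros(1),
    "prompt": "do nothing",
}
result = client.infer(dummy_obs)
print(result["actions"])  # an action chunk: one row of joint/gripper actions per control step
```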
- Make sure you have the most recent version of the DROID package installed on both the DROID control laptop and the NUC.
- On the control laptop, activate your DROID conda environment.
- Clone the openpi repo and install the openpi client, which we will use to connect to the policy server (this has very few dependencies and should be very fast to install): with the DROID conda environment activated, run

```bash
cd $OPENPI_ROOT/packages/openpi-client && pip install -e .
```

- Install `tyro`, which we will use for command line parsing:

```bash
pip install tyro
```

- Copy the `main.py` file from this directory to the `$DROID_ROOT/scripts` directory.
- Replace the camera IDs in the `main.py` file with the IDs of your cameras (see the sketch below for where they live). You can find the camera IDs by running `ZED_Explore` in the command line, which opens a tool that shows all connected cameras and their IDs -- you can also use it to check that the cameras are well-positioned to see the scene you want the robot to interact with.
- Run the `main.py` file, pointing the `--remote_host` and `--remote_port` arguments at the policy server. (To check that the server machine is reachable from the DROID laptop, run `ping <server_ip>` on the laptop.) Also specify which external camera to feed to the policy (we only input one external camera); choose from `["left", "right"]`:

```bash
python3 scripts/main.py --remote_host=<server_ip> --remote_port=<server_port> --external_camera="left"
```
The script will ask you to enter a free-form language instruction for the robot to follow. Make sure to point the cameras at the scene you want the robot to interact with. You do not need to carefully control camera angle, object positions, etc. The policy is fairly robust in our experience. Happy prompting!
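For reference, the camera IDs and server address you set in `main.py` live in a tyro-parsed args dataclass along these lines. This is a sketch: the exact field names are assumptions, so check them against your copy of `main.py`.

```python
# Sketch of the command-line args edited/passed for main.py (field names assumed).
import dataclasses

@dataclasses.dataclass
class Args:
    # ZED camera serial numbers; find yours by running ZED_Explore.
    left_camera_id: str = "<left_camera_id>"
    right_camera_id: str = "<right_camera_id>"
    wrist_camera_id: str = "<wrist_camera_id>"

    # Which external camera is fed to the policy: "left" or "right".
    external_camera: str = "left"

    # Address of the remote policy server.
    remote_host: str = "<server_ip>"
    remote_port: int = 8000
```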
| Issue | Solution |
|---|---|
| Cannot reach policy server | Make sure the server is running and the IP and port are correct. You can check that the server machine is reachable by running `ping <server_ip>` from the DROID laptop. |
| Cannot find cameras | Make sure the camera IDs are correct and that the cameras are connected to the DROID laptop. Sometimes replugging the cameras can help. You can check all connected cameras by running `ZED_Explore` in the command line. |
| Policy inference is slow / inconsistent | Try using a wired internet connection for the DROID laptop to reduce latency (0.5-1 sec latency per chunk is normal). |
| Policy does not perform the task well | In our experiments, the policy could perform simple tabletop manipulation tasks (pick-and-place) across a wide range of environments, camera positions, and lighting conditions. If the policy does not perform the task well, try modifying the scene or object placement to make the task easier. Also make sure that the camera view you pass to the policy can see all relevant objects: the policy is conditioned on only a single external camera plus the wrist camera, so use `ZED_Explore` to verify you are feeding it the camera you intend. Finally, the policy is far from perfect and will fail on more complex manipulation tasks, but it usually makes a decent effort. :) |
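If inference feels slow, you can measure how much of the per-chunk delay is the server round-trip by timing the client's `infer` call. This sketch reuses the `client` and `dummy_obs` from the sanity-check snippet above:

```python
# Hedged sketch: time a few policy-server round-trips from the DROID laptop.
import time

for _ in range(5):
    start = time.monotonic()
    client.infer(dummy_obs)
    print(f"round-trip: {time.monotonic() - start:.2f}s")  # 0.5-1 sec per chunk is normal
```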