StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models

Yunzhi Yan*, Zhen Xu*, Haotong Lin, Haian Jin, Haoyu Guo, Yida Wang, Kun Zhan, Xianpeng Lang, Hujun Bao, Xiaowei Zhou, Sida Peng
CVPR 2025

Demo video: street_crafter.mp4

Installation

Clone this repository

git clone https://github.com/zju3dv/street_crafter.git --recursive
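
If the repository was cloned without --recursive, the submodules can be fetched afterwards:

# fetch the submodules after a non-recursive clone
git submodule update --init --recursive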

Set up the environment

Our model has been tested on a single A100/A800 80GB GPU.

conda create -n streetcrafter python=3.9
conda activate streetcrafter

# Install dependencies.
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu121

# Install requirements
pip install -r requirements.txt 

# Install gsplat
pip install "git+https://github.com/dendenxu/gsplat.git" 
# This issue might help when installation fails: https://github.com/nerfstudio-project/gsplat/issues/226

# Install submodules
pip install ./submodules/sdata
pip install ./submodules/simple-knn
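
As an optional sanity check, you can confirm that the CUDA 12.1 build of PyTorch detects the GPU:

# optional: print the PyTorch version, CUDA build, and GPU visibility
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"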

Data Processing

Please go to data_processor and refer to its README.md for processing details. We provide some example scenes at this link. You can skip the processing steps by downloading the example data to the data/waymo directory, as shown below.
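
For example, assuming the downloaded archive is named waymo_example.zip (the actual filename and scene layout depend on the download):

# example only: the archive name depends on the download
mkdir -p data/waymo
unzip waymo_example.zip -d data/waymo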

Model Weights

The pretrained model weights can be downloaded from this link and placed in the video_diffusion/ckpts directory (see the example below). We also provide model weights trained on the multi-camera Waymo setting at this link.
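
For example, assuming the downloaded checkpoint is named streetcrafter.safetensors (the actual filename depends on the download):

# example only: the checkpoint filename depends on the download
mkdir -p video_diffusion/ckpts
mv streetcrafter.safetensors video_diffusion/ckpts/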

Inference

Run inference with the video diffusion model

python render.py --config {config_path} mode diffusion
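
For example, with a hypothetical Waymo scene config (substitute your actual config path for {config_path}):

# hypothetical config path; substitute your own
python render.py --config configs/example_waymo.yaml mode diffusion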

Alternatively, you can run inference by setting the meta info file path.

# run this command from the video_diffusion directory
python sample_condition.py  
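
Concretely, assuming the meta info file path has already been set as described above:

# run from the video_diffusion directory
cd video_diffusion
python sample_condition.py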

Distillation

We distill the video diffusion model into a dynamic 3D representation, building on the codebase of Street Gaussians. Please refer to street_gaussian/config/config.py for details of the parameters.

Train Street Gaussians

python train.py --config {config_path} 

Render input trajectory

python render.py --config {config_path} mode trajectory

Render novel trajectory

python render.py --config {config_path} mode novel_view
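
Putting the distillation steps together with a hypothetical config path (substitute your own):

# hypothetical config path; substitute your own
python train.py --config configs/example_waymo.yaml
python render.py --config configs/example_waymo.yaml mode trajectory
python render.py --config configs/example_waymo.yaml mode novel_view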

Training

First download the Vista model weights from this link to the video_diffusion/ckpts directory. We finetune the video diffusion model based on the codebase of Vista. Please refer to their official documentation for environment setup and training details.

# run this command from the video_diffusion directory
sh training.sh
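
For example, assuming the Vista checkpoint has been saved into video_diffusion/ckpts (the exact filename depends on the Vista release):

# example only: confirm the Vista weights are in place, then launch training
ls video_diffusion/ckpts
cd video_diffusion && sh training.sh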

Overview

Pipeline: (a) We process the LiDAR using calibrated images and object tracklets to obtain a colorized point cloud, which can be rendered to image space as pixel-level conditions. (b) Given observed images and the reference image embedding $\mathbf{c}_\text{ref}$, we optimize the video diffusion model conditioned on the LiDAR renderings to perform controllable video generation. (c) Starting from the rendered images and LiDAR conditions under a novel trajectory, we use the pretrained controllable video diffusion model to guide the optimization of the dynamic 3DGS representation by generating novel views as extra supervision signals.
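
As a rough illustration of step (c), the following minimal Python sketch shows how diffusion-generated novel views could serve as extra supervision alongside the observed images. All names here (render_fn, diffusion_fn, and the argument layout) are illustrative placeholders, not the repository's actual API:

import torch
import torch.nn.functional as F

def distill_step(render_fn, diffusion_fn, gaussians,
                 observed_images, observed_poses, lidar_cond, novel_poses):
    # reconstruct the observed input-trajectory views with the dynamic 3DGS
    loss = F.l1_loss(render_fn(gaussians, observed_poses), observed_images)

    # the LiDAR-conditioned diffusion model synthesizes pseudo ground truth
    # for the novel trajectory; it stays frozen, so no gradients flow through it
    with torch.no_grad():
        pseudo_gt = diffusion_fn(lidar_cond, novel_poses)

    # use the generated novel views as an extra supervision signal for the 3DGS
    loss = loss + F.l1_loss(render_fn(gaussians, novel_poses), pseudo_gt)
    loss.backward()  # gradients accumulate into the Gaussian parameters
    return loss.item()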

Citation

If you find this code useful for your research, please use the following BibTeX entry.

@inproceedings{yan2024streetcrafter,
  title={StreetCrafter: Street View Synthesis with Controllable Video Diffusion Models},
  author={Yan, Yunzhi and Xu, Zhen and Lin, Haotong and Jin, Haian and Guo, Haoyu and Wang, Yida and Zhan, Kun and Lang, Xianpeng and Bao, Hujun and Zhou, Xiaowei and Peng, Sida},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2025},
}
