This is the official implementation of "ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis".
@misc{spiegl2024viewfusion,
      title={ViewFusion: Learning Composable Diffusion Models for Novel View Synthesis},
      author={Bernard Spiegl and Andrea Perin and Stéphane Deny and Alexander Ilin},
      year={2024},
      eprint={2402.02906},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
You can install and activate the conda environment by simply running:
conda env create -f environment.yml
conda activate view-fusion
For ARM-based macOS run:
conda env create -f environment_osx.yml
conda activate view-fusion
The version of the NMR ShapeNet dataset we use is hosted by Niemeyer et al. and is downloadable here.
Please note that our current setup is optimized for use in a cluster computing environment and requires sharding.
To shard the dataset, place NMR_Dataset.zip in data/nmr/ and run python data/dataset_prep.py. By default, the dataset is split into four shards. To enable parallelization, the number of shards must be divisible by the number of GPUs you use.
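The divisibility constraint can be verified with a quick sanity check before launching (a minimal sketch; the variable values are examples, not read from any config):

```python
# Example sanity check for the shard/GPU divisibility constraint.
# num_shards matches the default sharding above; num_gpus is an example value.
num_shards = 4
num_gpus = 2

if num_shards % num_gpus != 0:
    raise ValueError(
        f"num_shards ({num_shards}) must be divisible by num_gpus ({num_gpus})"
    )
print("OK: each GPU gets", num_shards // num_gpus, "shard(s)")
```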
Configurations for various experiments are located in configs/.
To launch training on a single GPU run:
python main.py -c configs/small-v100.yaml -g -t --wandb
For a distributed setup run:
torchrun --nnodes=$NUM_NODES --nproc_per_node=$NUM_GPUS main.py -c configs/small-v100-4.yaml -g -t --wandb
where $NUM_NODES and $NUM_GPUS can, for instance, be replaced by 1 and 4, respectively, corresponding to a single-node setup with four V100 GPUs. (If you are using Slurm, more example scripts are available in slurm/.)
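For concreteness, the single-node, four-GPU example above expands to the following command. The snippet only echoes it rather than executing it, since an actual run requires the sharded dataset and GPUs:

```shell
NUM_NODES=1
NUM_GPUS=4
# Print the fully expanded launch command for a single node with four GPUs.
echo "torchrun --nnodes=$NUM_NODES --nproc_per_node=$NUM_GPUS main.py -c configs/small-v100-4.yaml -g -t --wandb"
```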
Coming soon.
Coming soon.
If you want to implement separate data pipelines or training procedures, all the architecture details are available in model/.
At training time, the model receives:
- y_0, the target (ground truth), of shape (B, C, H, W);
- y_cond, containing all the input views, of shape (B, N, C, H, W), where N denotes the total number of views (24 in our case);
- view_count, of shape (B,), containing the number of views used as conditioning for each sample in the batch;
- angle, also of shape (B,), indicating the target angle for each sample.

At inference time, y_0 is omitted; everything else remains the same as at training time.
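The expected input shapes can be sketched with dummy tensors. This is a minimal illustration only: the B, C, H, W values and the integer encoding of angle are assumptions, not the repository's actual dataloader output.

```python
import torch

# Illustrative batch; N = 24 matches the total number of views in the paper.
B, N, C, H, W = 2, 24, 3, 64, 64

y_0 = torch.randn(B, C, H, W)               # target (ground-truth) view; omitted at inference
y_cond = torch.randn(B, N, C, H, W)         # all N input views per sample
view_count = torch.randint(1, N + 1, (B,))  # number of conditioning views per sample
angle = torch.randint(0, N, (B,))           # target angle per sample (integer encoding assumed)

assert y_0.shape == (B, C, H, W)
assert y_cond.shape == (B, N, C, H, W)
assert view_count.shape == (B,) and angle.shape == (B,)
```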
See the paper for full implementation details.
NB: Training configurations require a significant amount of VRAM.
The model referenced in the paper was trained using the configs/multi-view-composable-variable-small-v100-4.yaml configuration for 710k steps (approx. 6.5 days) on 4x V100 GPUs, each with 32 GB of VRAM.
Pretrained model weights will be made available soon.