3D Image Diffusion

This repo is a fork of Phil Wang's Video Diffusion - Pytorch.

The original code has been modified for 3D image generation.

Important Notes

  • The default number of channels is 1.
  • Time conditioning is off.
  • Text conditioning is probably broken.
  • num_frames is now the depth dimension, so tensors are shaped (B, C, D, H, W). Since the model assumes H = W, D likely has to equal them as well (D = H = W), but this has not been tested.

Install

$ git clone https://github.com/ida-mdc/diff3d.git
$ cd diff3d
$ pip install .
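
To check that the package is importable after installation (a minimal sanity check; these classes are the ones used in the Usage section below):

$ python -c "from diff3d import Unet3D, GaussianDiffusion, Trainer"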

Usage

Generate Image

import torch
from diff3d import Unet3D, GaussianDiffusion

model = Unet3D(
    dim=64,
    dim_mults=(1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size=32,
    num_frames=32,  # length of the depth dimension (ideally the same as image_size)
    timesteps=1000,  # number of steps
    loss_type="l1",  # L1 or L2
)

image = torch.randn(1, 1, 32, 32, 32)  # 3D image (batch, channels, depth, height, width), normalized to [-1, 1]
loss = diffusion(image)
loss.backward()
# after a lot of training

sampled_image = diffusion.sample(batch_size=4)
sampled_image.shape # (4, 1, 32, 32, 32)
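
The snippet above performs a single forward/backward pass; an actual run wraps it in an optimizer loop. Below is a minimal sketch using a plain Adam optimizer and random tensors as stand-in data; the optimizer choice and learning rate are assumptions for illustration, not values prescribed by this repo:

import torch
from diff3d import Unet3D, GaussianDiffusion

model = Unet3D(dim=64, dim_mults=(1, 2, 4, 8))

diffusion = GaussianDiffusion(
    model,
    image_size=32,
    num_frames=32,
    timesteps=1000,
    loss_type="l1",
)

# Assumed training hyperparameters, chosen for illustration only
optimizer = torch.optim.Adam(diffusion.parameters(), lr=1e-4)

for step in range(10):  # in practice, many thousands of steps
    images = torch.randn(1, 1, 32, 32, 32)  # replace with real volumes scaled to [-1, 1]
    loss = diffusion(images)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()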

Image Inpainting

import torch
from diff3d import Unet3D, GaussianDiffusion

model = Unet3D(
    dim=64,
    dim_mults=(1, 2, 4, 8)
)

diffusion = GaussianDiffusion(
    model,
    image_size=32,
    num_frames=32,  # length of the depth dimension (ideally the same as image_size)
    timesteps=1000,  # number of steps
    loss_type="l1",  # L1 or L2
)

# The new `inpaint` method takes a batch of images and a batch of corresponding masks.
# It returns reconstructions of the same shape as the images and masks.
# By default it writes intermediate inpaintings to the `./inpainting` directory every
# 100 timesteps; set `save_every` to None or False to disable this.
images = torch.randn(4, 1, 32, 32, 32)  # (batch, channels, depth, height, width), normalized from 0 to 1 (e.g. after a ToTensor transform)
masks = ...  # binary masks (batch, channels, depth, height, width), where 1 indicates regions to inpaint

inpainting = diffusion.inpaint(
    image=images,
    mask=masks,
    save_every=100,
)
inpainting.shape  # (4, 1, 32, 32, 32)
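
The README leaves the mask construction open. As one hypothetical illustration, the sketch below builds masks that mark a central cube of each volume for inpainting (the cube coordinates are arbitrary):

import torch

# Binary masks with the same shape as the images: (batch, channels, depth, height, width)
masks = torch.zeros(4, 1, 32, 32, 32)
masks[:, :, 8:24, 8:24, 8:24] = 1.0  # 1 marks the (hypothetical) region to inpaint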

Training

This repository also contains a handy Trainer class for training on a folder of GIF files. Each GIF must match the configured image_size and num_frames.

import torch
from diff3d import Unet3D, GaussianDiffusion, Trainer

model = Unet3D(
    dim=64,
    dim_mults=(1, 2, 4, 8),
)

diffusion = GaussianDiffusion(
    model,
    image_size=64,
    num_frames=64,  # length of the depth dimension (ideally the same as image_size)
    timesteps=1000,  # number of steps
    loss_type="l1"  # L1 or L2
).cuda()

trainer = Trainer(
    diffusion,
    "./data",  # this folder path needs to contain all your training data, as .gif files, of correct image size
    train_batch_size=32,
    train_lr=1e-4,
    save_and_sample_every=1000,
    train_num_steps=700000,  # total training steps
    gradient_accumulate_every=2,  # gradient accumulation steps
    ema_decay=0.995,  # exponential moving average decay
    amp=True,  # turn on mixed precision
)

trainer.train()

Sample images (as GIF files) will be saved to ./results periodically, as will the diffusion model parameters.
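
If your training volumes are stored as arrays rather than GIFs, they must first be converted. Below is a minimal sketch using imageio (an assumed extra dependency, not installed by this repo) that writes a (64, 64, 64) volume as a 64-frame grayscale GIF matching the settings above; the filename is a hypothetical example:

import numpy as np
import imageio

volume = np.random.rand(64, 64, 64)       # (depth, height, width), values in [0, 1]
frames = (volume * 255).astype(np.uint8)  # one 2D frame per depth slice
imageio.mimsave("./data/volume_0000.gif", list(frames))  # 64 frames of size 64x64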

Citations

@software{wang2023videodiffusion,
  title = {Video Diffusion - Pytorch},
  author = {Phil Wang},
  year = {2023},
  license = {MIT},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/lucidrains/video-diffusion-pytorch}},
  commit = {f68f31eaa94c2b9987571136e6bb8c4f52960eef}
}
@misc{ho2022video,
  title   = {Video Diffusion Models}, 
  author  = {Jonathan Ho and Tim Salimans and Alexey Gritsenko and William Chan and Mohammad Norouzi and David J. Fleet},
  year    = {2022},
  eprint  = {2204.03458},
  archivePrefix = {arXiv},
  primaryClass = {cs.CV}
}
@misc{Saharia2022,
    title   = {Imagen: unprecedented photorealism × deep level of language understanding},
    author  = {Chitwan Saharia and William Chan and Saurabh Saxena and Lala Li and Jay Whang and Emily Denton and Seyed Kamyar Seyed Ghasemipour and Burcu Karagol Ayan and S. Sara Mahdavi and Raphael Gontijo Lopes and Tim Salimans and Jonathan Ho and David Fleet and Mohammad Norouzi},
    year    = {2022}
}
