[Feature] Support disco-diffusion text-2-image #1234

Merged
merged 67 commits into from
Dec 2, 2022
Changes from 62 commits
0a0616d
going through adm unconditional sampling
plyfager Sep 14, 2022
1a88411
add config and code for test
plyfager Sep 16, 2022
261cc60
resolve conflict
plyfager Sep 16, 2022
601391a
fix lint
plyfager Sep 16, 2022
4613727
modify unet
plyfager Sep 19, 2022
b35e77c
format adm
plyfager Sep 20, 2022
a4690a8
support adm 512
plyfager Sep 22, 2022
2554a57
fix lint
plyfager Sep 26, 2022
1955304
support cls-g sampling
plyfager Sep 26, 2022
b85c93d
support ddim sampling
plyfager Sep 26, 2022
48a1382
init disco
plyfager Sep 27, 2022
08cd338
support disco-diffusion text2image
plyfager Oct 26, 2022
6a5c2b3
support secondary model in disco-diffusion (#1368)
yanniangu Oct 27, 2022
6073e19
support init_image as input
plyfager Nov 11, 2022
cb31610
init docstring
plyfager Nov 15, 2022
8da2eb2
solve conflict
plyfager Nov 16, 2022
9226e6d
Merge branch 'dev-1.x' of https://github.com/open-mmlab/mmediting int…
plyfager Nov 17, 2022
7d4d99a
refactor disco
plyfager Nov 21, 2022
1dab8ab
fix lint
plyfager Nov 22, 2022
6432fd2
resolve conflict
plyfager Nov 22, 2022
30911e3
Merge branch 'plyfager/disco-diffusion' of github.com:open-mmlab/mmed…
plyfager Nov 22, 2022
65e66fc
fix lint
plyfager Nov 22, 2022
3a47b31
remove disco bug
plyfager Nov 22, 2022
7de125e
remove data_preprocessor
plyfager Nov 23, 2022
8438b29
complete docstring of disco and partial guider
plyfager Nov 23, 2022
ee06f02
complete docstring for guider
plyfager Nov 23, 2022
e487cce
refine secondary model
plyfager Nov 23, 2022
8d8ef6f
Merge branch 'dev-1.x' of github.com:open-mmlab/mmediting into plyfag…
plyfager Nov 23, 2022
f555e88
fix lint
plyfager Nov 23, 2022
7109bbc
Merge branch 'dev-1.x' of github.com:open-mmlab/mmediting into plyfag…
plyfager Nov 23, 2022
33f5923
move cutter and loss config to infer
plyfager Nov 24, 2022
47cb257
fix adm and unet
plyfager Nov 25, 2022
a3e2ed8
rename config
plyfager Nov 25, 2022
3b7ae3e
support portrait generator config
plyfager Nov 28, 2022
8a93dd0
fix clip wrapper
plyfager Nov 28, 2022
ab24126
move unet to DDPM
plyfager Nov 28, 2022
c71ac35
rename clip_ext
plyfager Nov 28, 2022
67abcad
adjust requirements
plyfager Nov 28, 2022
b62d317
try_import
plyfager Nov 28, 2022
472bbd7
add dist.get_rank() == 0 as additional condition
plyfager Nov 28, 2022
515b121
add resize_right to requirements
plyfager Nov 28, 2022
4cf2fba
remove disco_baseline
plyfager Nov 28, 2022
44160ed
update url
plyfager Nov 29, 2022
ac935bd
fix a disco typo
plyfager Nov 29, 2022
d420b8c
add imagenet 256 config
plyfager Nov 29, 2022
eb45aae
Make Disco's readme simple
plyfager Nov 29, 2022
d9f0661
rename disco to disco_diffusion
plyfager Nov 29, 2022
62c49ac
add adm readme
plyfager Nov 30, 2022
b2ef637
fix lint
plyfager Nov 30, 2022
61b04af
Merge branch 'dev-1.x' of github.com:open-mmlab/mmediting into plyfag…
plyfager Nov 30, 2022
881b17a
support directly init disco with instance module
plyfager Nov 30, 2022
0d853fc
add ut of disco
plyfager Nov 30, 2022
ffee5e9
fix init
plyfager Nov 30, 2022
957a193
fix lint
plyfager Nov 30, 2022
07890b3
improve docstring coverage
plyfager Nov 30, 2022
678d579
fix lint
plyfager Nov 30, 2022
d83a3ba
fix docstring
plyfager Dec 2, 2022
ee74f0f
fix lint
plyfager Dec 2, 2022
3410381
add credits
plyfager Dec 2, 2022
e9d8c4f
Merge branch 'dev-1.x' of github.com:open-mmlab/mmediting into plyfag…
plyfager Dec 2, 2022
432da88
mv losses
plyfager Dec 2, 2022
d39386b
fix lint
plyfager Dec 2, 2022
224e184
rename diffuser
plyfager Dec 2, 2022
de08313
fix lint
plyfager Dec 2, 2022
707148a
delete null
plyfager Dec 2, 2022
6b6c087
rm raise error
plyfager Dec 2, 2022
369ee89
fix comment
plyfager Dec 2, 2022
45 changes: 45 additions & 0 deletions configs/_base_/datasets/imagenet_512.py
@@ -0,0 +1,45 @@
# dataset settings
dataset_type = 'ImageNet'

# different from mmcls, we adopt the setting used in BigGAN.
# We use `RandomCropLongEdge` in training and `CenterCropLongEdge` in testing.
train_pipeline = [
    dict(type='LoadImageFromFile', key='img'),
    dict(type='RandomCropLongEdge', keys=['img']),
    dict(type='Resize', scale=(512, 512), keys=['img'], backend='pillow'),
    dict(type='Flip', flip_ratio=0.5, direction='horizontal'),
    dict(type='PackEditInputs')
]

test_pipeline = [
    dict(type='LoadImageFromFile', key='img'),
    dict(type='CenterCropLongEdge', keys=['img']),
    dict(type='Resize', scale=(512, 512), backend='pillow'),
    dict(type='PackEditInputs')
]

train_dataloader = dict(
    batch_size=None,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='./data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),
    persistent_workers=True)

val_dataloader = dict(
    batch_size=None,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='./data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
    persistent_workers=True)

test_dataloader = val_dataloader
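
The dataset comment above mentions cropping along the long edge, as in the BigGAN setting. As a rough illustration only, the sketch below computes the square crop box such a transform would use. The helper name `crop_long_edge_box` and its arguments are hypothetical and are not the `RandomCropLongEdge` / `CenterCropLongEdge` implementations from this PR, whose details may differ.

```python
import random


def crop_long_edge_box(width, height, mode="center", rng=random):
    """Return (left, top, right, bottom) of a square crop.

    The crop side equals the short edge, so only the long edge is trimmed.
    Hypothetical helper for illustration; not part of mmedit.
    """
    side = min(width, height)
    if mode == "center":
        # Testing-style crop: keep the middle of the long edge.
        left = (width - side) // 2
        top = (height - side) // 2
    elif mode == "random":
        # Training-style crop: pick a random offset along the long edge.
        left = rng.randint(0, width - side)
        top = rng.randint(0, height - side)
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return left, top, left + side, top + side


# A 640x480 landscape image loses 80 pixels on each side of its width.
center_box = crop_long_edge_box(640, 480, mode="center")
random_box = crop_long_edge_box(640, 480, mode="random")
```

After the crop, the `Resize` step in the pipeline scales the square to the target resolution, so the aspect ratio is never distorted.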
45 changes: 45 additions & 0 deletions configs/_base_/datasets/imagenet_64.py
@@ -0,0 +1,45 @@
# dataset settings
dataset_type = 'ImageNet'

# different from mmcls, we adopt the setting used in BigGAN.
# We use `RandomCropLongEdge` in training and `CenterCropLongEdge` in testing.
train_pipeline = [
    dict(type='LoadImageFromFile', key='img'),
    dict(type='RandomCropLongEdge', keys=['img']),
    dict(type='Resize', scale=(64, 64), keys=['img'], backend='pillow'),
    dict(type='Flip', flip_ratio=0.5, direction='horizontal'),
    dict(type='PackEditInputs')
]

test_pipeline = [
    dict(type='LoadImageFromFile', key='img'),
    dict(type='CenterCropLongEdge', keys=['img']),
    dict(type='Resize', scale=(64, 64), backend='pillow'),
    dict(type='PackEditInputs')
]

train_dataloader = dict(
    batch_size=None,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='./data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=train_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=True),
    persistent_workers=True)

val_dataloader = dict(
    batch_size=64,
    num_workers=5,
    dataset=dict(
        type=dataset_type,
        data_root='./data/imagenet/',
        ann_file='meta/train.txt',
        data_prefix='train',
        pipeline=test_pipeline),
    sampler=dict(type='DefaultSampler', shuffle=False),
    persistent_workers=True)

test_dataloader = val_dataloader
135 changes: 135 additions & 0 deletions configs/disco_diffusion/README.md
@@ -0,0 +1,135 @@
# Disco Diffusion

> [Disco Diffusion](https://github.com/alembics/disco-diffusion)

> **Task**: Text2Image, Image2Image

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Disco Diffusion (DD) is a Google Colab Notebook which leverages an AI Image generating technique called CLIP-Guided Diffusion to allow you to create compelling and beautiful images from text inputs.

Created by Somnai, augmented by Gandamu, and building on the work of RiversHaveWings, nshepperd, and many others.

<!-- [IMAGE] -->

<div align=center >
<img src="https://user-images.githubusercontent.com/22982797/204526957-ac30547e-5a44-417a-aaa2-6b357b4a139c.png" width="400"/>
</div >
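
To give a feel for what "CLIP-Guided Diffusion" optimizes, here is a minimal sketch of a spherical distance loss between an image embedding and a text embedding, the kind of quantity whose gradient steers each denoising step. It uses plain Python lists instead of tensors, and it is an assumption that the guidance loss in this implementation takes this form; the function names are illustrative only.

```python
import math


def normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(c * c for c in vec))
    return [c / norm for c in vec]


def spherical_dist_loss(x, y):
    """Squared great-circle distance (up to a factor) between two embeddings.

    Both inputs are projected onto the unit sphere first, so only their
    directions matter, not their magnitudes.
    """
    x, y = normalize(x), normalize(y)
    chord = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))
    # Clamp to guard against floating-point overshoot before asin.
    half_chord = min(chord / 2.0, 1.0)
    return 2.0 * math.asin(half_chord) ** 2


same = spherical_dist_loss([1.0, 0.0], [3.0, 0.0])        # same direction
orthogonal = spherical_dist_loss([1.0, 0.0], [0.0, 2.0])  # 90 degrees apart
```

A lower value means the image embedding points in nearly the same direction as the prompt embedding, which is what the guidance pushes toward.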

## Results and models

We have converted several `unet` weights and provide the corresponding configs. For usage of different `unet` models, please refer to the tutorial.

| Diffusion Model | Config | Weights |
| ---------------------------------------- | --------------------------------------------------------------------------- | ------------------------------------------------------------------------------------- |
| 512x512_diffusion_uncond_finetune_008100 | [config](configs/disco/disco-diffusion_adm-u-finetuned_imagenet-512x512.py) | [weights](https://download.openmmlab.com/mmediting/synthesizers/disco/adm-u_finetuned_imagenet-512x512-ab471d70.pth) |
| 256x256_diffusion_uncond | [config](configs/disco/disco-diffusion_adm-u-finetuned_imagenet-256x256.py) | [weights](<>) |
| portrait_generator_v001 | [config](configs/disco/disco-diffusion_portrait_generator_v001.py) | [weights](https://download.openmmlab.com/mmediting/synthesizers/disco/adm-u-cvt-rgb_portrait-v001-f4a3f3bc.pth) |
| pixelartdiffusion_expanded | Coming soon! | |
| pixel_art_diffusion_hard_256 | Coming soon! | |
| pixel_art_diffusion_soft_256 | Coming soon! | |
| pixelartdiffusion4k | Coming soon! | |
| watercolordiffusion_2 | Coming soon! | |
| watercolordiffusion | Coming soon! | |
| PulpSciFiDiffusion | Coming soon! | |

## To-do List

- [ ] pixelart, watercolor, sci-fiction diffusion models
- [ ] image prompt
- [ ] video generation
- [ ] faster samplers (PLMS, DPM-Solver, etc.)

We warmly welcome community contributions for these items and any other interesting stuff!

## Quick Start

Run the following code to generate an image from a text prompt.

```python
from mmengine import Config, MODELS
from mmedit.utils import register_all_modules
from torchvision.utils import save_image

register_all_modules()

# Build the model from the 512x512 config and move it to the GPU.
disco = MODELS.build(
    Config.fromfile(
        'configs/disco_diffusion/disco-diffusion_adm-u-finetuned_imagenet-512x512.py'  # noqa
    ).model).cuda().eval()
# Prompts are grouped under an integer key; key 0 holds the initial prompts.
text_prompts = {
    0: [
        "A beautiful painting of a singular lighthouse, shining its light across a tumultuous sea of blood by greg rutkowski and thomas kinkade, Trending on artstation.",
        "yellow color scheme"
    ]
}
image = disco.infer(
    height=768,
    width=1280,
    text_prompts=text_prompts,
    show_progress=True,
    num_inference_steps=250,
    eta=0.8)['samples']
save_image(image, "image.png")

```
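
The `text_prompts` argument above is a dict that maps an integer key to a list of prompt strings. The sketch below shows one plausible way to read such a dict, assuming (as in the original notebook's convention) that keys are frame indices and that a prompt may carry an optional `:weight` suffix. Both assumptions are unverified for `DiscoDiffusion.infer`, and the helpers `parse_prompt` and `prompts_for_frame` are hypothetical, not part of this PR.

```python
def parse_prompt(prompt, default_weight=1.0):
    """Split an optional trailing ':<number>' weight off a prompt string."""
    text, sep, tail = prompt.rpartition(":")
    if sep:
        try:
            return text, float(tail)
        except ValueError:
            pass  # The colon was part of the prompt text, not a weight.
    return prompt, default_weight


def prompts_for_frame(text_prompts, frame):
    """Return the (text, weight) pairs active at `frame`.

    The active entry is the one with the largest key not exceeding `frame`.
    """
    keys = [k for k in sorted(text_prompts) if k <= frame]
    if not keys:
        raise KeyError(f"no prompts defined at or before frame {frame}")
    return [parse_prompt(p) for p in text_prompts[keys[-1]]]


example_prompts = {
    0: ["a lighthouse over a stormy sea", "yellow color scheme:0.5"],
    10: ["a calm harbor at dawn"],
}
active = prompts_for_frame(example_prompts, 5)
```

With a single key `0`, as in the Quick Start, every frame resolves to the same prompt list.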

## Tutorials

Coming soon!

## Credits

Since our adaptation of disco-diffusion is heavily influenced by the Disco Diffusion [colab](https://colab.research.google.com/github/alembics/disco-diffusion/blob/main/Disco_Diffusion.ipynb#scrollTo=License), we reproduce its credits below.

<details>
Original notebook by Katherine Crowson (https://github.com/crowsonkb, https://twitter.com/RiversHaveWings). It uses either OpenAI's 256x256 unconditional ImageNet or Katherine Crowson's fine-tuned 512x512 diffusion model (https://github.com/openai/guided-diffusion), together with CLIP (https://github.com/openai/CLIP) to connect text prompts with images.

Modified by Daniel Russell (https://github.com/russelldc, https://twitter.com/danielrussruss) to include (hopefully) optimal params for quick generations in 15-100 timesteps rather than 1000, as well as more robust augmentations.

Further improvements from Dango233 and nshepperd helped improve the quality of diffusion in general, and especially so for shorter runs like this notebook aims to achieve.

Vark added code to load in multiple Clip models at once, which all prompts are evaluated against, which may greatly improve accuracy.

The latest zoom, pan, rotation, and keyframes features were taken from Chigozie Nri's VQGAN Zoom Notebook (https://github.com/chigozienri, https://twitter.com/chigozienri)

Advanced DangoCutn Cutout method is also from Dango223.

\--

Disco:

Somnai (https://twitter.com/Somnai_dreams) added Diffusion Animation techniques, QoL improvements and various implementations of tech and techniques, mostly listed in the changelog below.

3D animation implementation added by Adam Letts (https://twitter.com/gandamu_ml) in collaboration with Somnai. Creation of disco.py and ongoing maintenance.

Turbo feature by Chris Allen (https://twitter.com/zippy731)

Improvements to ability to run on local systems, Windows support, and dependency installation by HostsServer (https://twitter.com/HostsServer)

VR Mode by Tom Mason (https://twitter.com/nin_artificial)

Horizontal and Vertical symmetry functionality by nshepperd. Symmetry transformation_steps by huemin (https://twitter.com/huemin_art). Symmetry integration into Disco Diffusion by Dmitrii Tochilkin (https://twitter.com/cut_pow).

Warp and custom model support by Alex Spirin (https://twitter.com/devdef).

Pixel Art Diffusion, Watercolor Diffusion, and Pulp SciFi Diffusion models from KaliYuga (https://twitter.com/KaliYuga_ai). Follow KaliYuga's Twitter for the latest models and for notebooks with specialized settings.

Integration of OpenCLIP models and initiation of integration of KaliYuga models by Palmweaver / Chris Scalf (https://twitter.com/ChrisScalf11)

Integrated portrait_generator_v001 from Felipe3DArtist (https://twitter.com/Felipe3DArtist)

</details>

## Citation

```bibtex
@misc{github,
  author={alembics},
  title={disco-diffusion},
  year={2022},
  url={https://github.com/alembics/disco-diffusion},
}
```
@@ -0,0 +1,47 @@
unet = dict(
    type='DenoisingUnet',
    image_size=256,
    in_channels=3,
    base_channels=256,
    resblocks_per_downsample=2,
    attention_res=(32, 16, 8),
    norm_cfg=dict(type='GN32', num_groups=32),
    dropout=0.0,
    num_classes=0,
    use_fp16=True,
    resblock_updown=True,
    attention_cfg=dict(
        type='MultiHeadAttentionBlock',
        num_heads=4,
        num_head_channels=64,
        use_new_attention_order=False),
    use_scale_shift_norm=True)

unet_ckpt_path = 'work_dirs/adm-cvt-rgb_finetuned_imagenet-256x256.pth' # noqa
secondary_model_ckpt_path = 'https://download.openmmlab.com/mmediting/synthesizers/disco/secondary_model_imagenet_2.pth' # noqa
pretrained_cfgs = dict(
    unet=dict(ckpt_path=unet_ckpt_path, prefix='unet'),
    secondary_model=dict(ckpt_path=secondary_model_ckpt_path, prefix=''))

secondary_model = dict(type='SecondaryDiffusionImageNet2')

diffuser = dict(
    type='DDIMScheduler',
    variance_type='learned_range',
    beta_schedule='linear',
    clip_sample=False)

clip_models = [
    dict(type='ClipWrapper', clip_type='clip', name='ViT-B/32', jit=False),
    dict(type='ClipWrapper', clip_type='clip', name='ViT-B/16', jit=False),
    dict(type='ClipWrapper', clip_type='clip', name='RN50', jit=False)
]

model = dict(
    type='DiscoDiffusion',
    unet=unet,
    diffuser=diffuser,
    secondary_model=secondary_model,
    clip_models=clip_models,
    use_fp16=True,
    pretrained_cfgs=pretrained_cfgs)
@@ -0,0 +1,47 @@
unet = dict(
    type='DenoisingUnet',
    image_size=512,
    in_channels=3,
    base_channels=256,
    resblocks_per_downsample=2,
    attention_res=(32, 16, 8),
    norm_cfg=dict(type='GN32', num_groups=32),
    dropout=0.0,
    num_classes=0,
    use_fp16=True,
    resblock_updown=True,
    attention_cfg=dict(
        type='MultiHeadAttentionBlock',
        num_heads=4,
        num_head_channels=64,
        use_new_attention_order=False),
    use_scale_shift_norm=True)

unet_ckpt_path = 'https://download.openmmlab.com/mmediting/synthesizers/disco/adm-u_finetuned_imagenet-512x512-ab471d70.pth' # noqa
secondary_model_ckpt_path = 'https://download.openmmlab.com/mmediting/synthesizers/disco/secondary_model_imagenet_2.pth' # noqa
pretrained_cfgs = dict(
    unet=dict(ckpt_path=unet_ckpt_path, prefix='unet'),
    secondary_model=dict(ckpt_path=secondary_model_ckpt_path, prefix=''))

secondary_model = dict(type='SecondaryDiffusionImageNet2')

diffuser = dict(
    type='DDIMScheduler',
    variance_type='learned_range',
    beta_schedule='linear',
    clip_sample=False)

clip_models = [
    dict(type='ClipWrapper', clip_type='clip', name='ViT-B/32', jit=False),
    dict(type='ClipWrapper', clip_type='clip', name='ViT-B/16', jit=False),
    dict(type='ClipWrapper', clip_type='clip', name='RN50', jit=False)
]

model = dict(
    type='DiscoDiffusion',
    unet=unet,
    diffuser=diffuser,
    secondary_model=secondary_model,
    clip_models=clip_models,
    use_fp16=True,
    pretrained_cfgs=pretrained_cfgs)
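
The `diffuser` above selects a DDIM scheduler with a linear beta schedule. The scalar sketch below shows what those two settings mean in the standard DDIM formulation, including how `eta` interpolates between deterministic sampling (`eta=0`) and noisier, DDPM-like sampling. The schedule endpoints (`1e-4` to `0.02` over 1000 steps) are the commonly used defaults and are an assumption here; none of these functions are the `DDIMScheduler` API from this PR.

```python
import math


def linear_betas(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (assumed default endpoints)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def ddim_sigma(alpha_bar_t, alpha_bar_prev, eta):
    """Standard deviation of the noise injected in one DDIM step."""
    return (eta * math.sqrt((1 - alpha_bar_prev) / (1 - alpha_bar_t))
            * math.sqrt(1 - alpha_bar_t / alpha_bar_prev))


def ddim_step(x_t, eps, alpha_bar_t, alpha_bar_prev, eta=0.0, noise=0.0):
    """One DDIM update for a scalar sample, given the predicted noise eps."""
    # Estimate the clean sample implied by x_t and the predicted noise.
    x0 = (x_t - math.sqrt(1 - alpha_bar_t) * eps) / math.sqrt(alpha_bar_t)
    sigma = ddim_sigma(alpha_bar_t, alpha_bar_prev, eta)
    direction = math.sqrt(1 - alpha_bar_prev - sigma ** 2) * eps
    return math.sqrt(alpha_bar_prev) * x0 + direction + sigma * noise


betas = linear_betas()
alpha_bars = cumulative_alphas(betas)
# 250 inference steps over 1000 training steps means a stride of 4.
t, t_prev = 999, 995
sigma = ddim_sigma(alpha_bars[t], alpha_bars[t_prev], eta=0.8)
```

With `eta=0` the update is fully deterministic; larger values trade reproducibility for extra stochasticity at each step.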
@@ -0,0 +1,7 @@
_base_ = ['./disco-diffusion_adm-u-finetuned_imagenet-512x512.py']
unet_ckpt_path = 'https://download.openmmlab.com/mmediting/synthesizers/disco/adm-u-cvt-rgb_portrait-v001-f4a3f3bc.pth' # noqa
model = dict(
    unet=dict(base_channels=128),
    secondary_model=None,
    pretrained_cfgs=dict(
        _delete_=True, unet=dict(ckpt_path=unet_ckpt_path, prefix='unet')))
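
The portrait config above inherits from the 512x512 config through `_base_` and uses `_delete_=True` to replace `pretrained_cfgs` wholesale instead of merging into it. The sketch below is a simplified model of that merge behavior, written as a plain recursive dict merge. It is not mmengine's actual implementation, and the checkpoint names in the example are placeholders.

```python
def merge_cfg(base, override):
    """Merge `override` into `base`, honoring a `_delete_` marker.

    Simplified model of config inheritance: nested dicts merge key by key,
    unless the override dict sets `_delete_=True`, in which case the
    inherited dict is discarded first.
    """
    override = dict(override)
    if override.pop("_delete_", False):
        base = {}
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict):
            inherited = merged.get(key)
            if not isinstance(inherited, dict):
                inherited = {}
            merged[key] = merge_cfg(inherited, value)
        else:
            merged[key] = value
    return merged


# Placeholder values standing in for the base 512x512 model config.
base_model = dict(
    unet=dict(image_size=512, base_channels=256),
    secondary_model=dict(type="SecondaryDiffusionImageNet2"),
    pretrained_cfgs=dict(
        unet=dict(ckpt_path="base_unet.pth", prefix="unet"),
        secondary_model=dict(ckpt_path="secondary.pth", prefix="")))

portrait_override = dict(
    unet=dict(base_channels=128),
    secondary_model=None,
    pretrained_cfgs=dict(
        _delete_=True,
        unet=dict(ckpt_path="portrait_unet.pth", prefix="unet")))

merged_model = merge_cfg(base_model, portrait_override)
```

Without `_delete_=True`, the inherited `secondary_model` entry of `pretrained_cfgs` would survive the merge even though the portrait model sets `secondary_model=None`, so a checkpoint would be requested for a module that no longer exists.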
9 changes: 9 additions & 0 deletions configs/disco_diffusion/metafile.yml
@@ -0,0 +1,9 @@
Collections:
- Metadata:
    Architecture:
    - Disco Diffusion
  Name: Disco Diffusion
  Paper:
  - https://github.com/alembics/disco-diffusion
  README: configs/disco_diffusion/README.md
Models: []
45 changes: 45 additions & 0 deletions configs/guided_diffusion/README.md
@@ -0,0 +1,45 @@
# Guided Diffusion (NeurIPS'2021)

> [Diffusion Models Beat GANs on Image Synthesis](https://papers.nips.cc/paper/2021/file/49ad23d1ec9fa4bd8d77d02681df5cfa-Paper.pdf)

> **Task**: Image Generation

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

We show that diffusion models can achieve image sample quality superior to the current state-of-the-art generative models. We achieve this on unconditional image synthesis by finding a better architecture through a series of ablations. For conditional image synthesis, we further improve sample quality with classifier guidance: a simple, compute-efficient method for trading off diversity for fidelity using gradients from a classifier. We achieve an FID of 2.97 on ImageNet 128x128, 4.59 on ImageNet 256x256, and 7.72 on ImageNet 512x512, and we match BigGAN-deep even with as few as 25 forward passes per sample, all while maintaining better coverage of the distribution. Finally, we find that classifier guidance combines well with upsampling diffusion models, further improving FID to 3.94 on ImageNet 256x256 and 3.85 on ImageNet 512x512.

<!-- [IMAGE] -->

<div align=center >
<img src="https://user-images.githubusercontent.com/22982797/204706276-e340c545-3ec6-48bf-be21-58ed44e8a4df.jpg" width="400"/>
</div >
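
The classifier guidance described in the abstract shifts the mean of each reverse-diffusion step along the gradient of a classifier's log-probability, scaled by the step variance and a guidance scale. The one-dimensional sketch below illustrates that shift with a toy log-probability and a finite-difference gradient. All names here are illustrative, and the toy classifier is an assumption, not the model used in this PR.

```python
def numerical_grad(fn, x, h=1e-5):
    """Central finite-difference estimate of d fn / d x."""
    return (fn(x + h) - fn(x - h)) / (2 * h)


def guided_mean(mean, variance, log_prob_fn, x_t, scale=1.0):
    """Shift the denoising mean toward higher classifier log-probability.

    new_mean = mean + scale * variance * d/dx log p(y | x_t)
    """
    grad = numerical_grad(log_prob_fn, x_t)
    return mean + scale * variance * grad


def toy_log_prob(x, target=2.0):
    """Toy 'classifier': log-probability peaks when x equals the target."""
    return -0.5 * (x - target) ** 2


# The gradient at x_t = 0 is +2, so the mean moves toward the target.
shifted = guided_mean(mean=0.1, variance=0.5, log_prob_fn=toy_log_prob,
                      x_t=0.0, scale=3.0)
```

A larger `scale` pulls samples harder toward the class (higher fidelity) at the cost of diversity, which is the trade-off the abstract refers to.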

## Results and models

**ImageNet**

| Method | Resolution | Config | Weights |
| ------ | ---------- | ------------------------------------------------------------------ | ---------------------------------------------------------------------------------------------------------------------- |
| adm-u | 64x64 | [config](configs/guided_diffusion/adm-u_8xb32_imagenet-64x64.py) | [model](https://download.openmmlab.com/mmgen/guided_diffusion/adm-u-cvt-rgb_8xb32_imagenet-64x64-7ff0080b.pth) |
| adm-u | 512x512 | [config](configs/guided_diffusion/adm-u_8xb32_imagenet-512x512.py) | [model](https://openmmlab-share.oss-cn-hangzhou.aliyuncs.com/mmgen/guided_diffusion/adm-u_8xb32_imagenet-512x512-60b381cb.pth) |

**Note**: To support Disco Diffusion, we currently provide only preliminary support for guided diffusion. Complete support, with metrics and test/train logs, will come soon!

## Quick Start

Coming soon!

## Citation

```bibtex
@article{PrafullaDhariwal2021DiffusionMB,
  title={Diffusion Models Beat GANs on Image Synthesis},
  author={Prafulla Dhariwal and Alex Nichol},
  journal={arXiv: Learning},
  year={2021}
}
```