
[CodeCamp2023-369] Adding support for FastComposer #2011

Merged
merged 79 commits into from Sep 11, 2023

Changes from all commits · 79 commits
feac9ba
1st
xiaomile Jun 2, 2023
b100239
debug
xiaomile Jun 21, 2023
b080671
20230710 adjustments
xiaomile Jul 10, 2023
bda8007
Refactor code and consolidate models to avoid importing too many classes in editors
xiaomile Jul 17, 2023
e990639
Refactor code and consolidate models to avoid importing too many classes in editors
xiaomile Jul 17, 2023
5f055e9
Merge branch 'main' of https://github.com/xiaomile/mmagic
xiaomile Jul 24, 2023
02a0619
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
cbbea41
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
e836c69
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
0be4fde
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
cafc451
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
0a9e274
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
61aadee
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
80bf698
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
33c7751
Support DeblurGANv2 inference
xiaomile Jul 25, 2023
2b0bbf2
Support DeblurGANv2 inference
xiaomile Jul 25, 2023
ee28acf
Support DeblurGANv2 inference
xiaomile Jul 25, 2023
eab4f43
Support DeblurGANv2 inference
xiaomile Jul 25, 2023
6623669
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
1e91d3f
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
25d133d
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
b4ae7b8
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
f828049
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
320ca05
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
b933ba9
Support DeblurGANv2 inference
xiaomile Jul 27, 2023
b1ac1df
Support DeblurGANv2 inference
xiaomile Jul 27, 2023
d333d56
Support DeblurGANv2 inference
xiaomile Jul 27, 2023
dcb5987
Merge branch 'main' into main
xiaomile Jul 27, 2023
76187c3
Support DeblurGANv2 inference
xiaomile Jul 27, 2023
1d0ced9
Merge branch 'main' into main
zengyh1900 Jul 28, 2023
8a1dadc
Update .gitignore
xiaomile Jul 30, 2023
49a45cc
Update .gitignore
xiaomile Jul 30, 2023
2bdfedf
Update .gitignore
xiaomile Jul 30, 2023
8dbf240
Update .gitignore
xiaomile Jul 30, 2023
420de7e
Update .gitignore
xiaomile Jul 30, 2023
8809347
Update .gitignore
xiaomile Jul 30, 2023
b9fe117
Update .gitignore
xiaomile Jul 30, 2023
3bd19e5
Update configs/deblurganv2/README.md
xiaomile Jul 30, 2023
84f9592
Support DeblurGANv2 inference
xiaomile Aug 2, 2023
64800e5
Merge branch 'main' into main
xiaomile Aug 2, 2023
b262ca6
Support DeblurGANv2 inference
xiaomile Aug 2, 2023
d4fb484
Merge branch 'main' of https://github.com/xiaomile/mmagic
xiaomile Aug 2, 2023
52fdc15
Support DeblurGANv2 inference
xiaomile Aug 2, 2023
6856137
Update configs/deblurganv2/deblurganv2_fpn-inception_1xb1_gopro.py
xiaomile Aug 7, 2023
cb281d6
Update configs/deblurganv2/deblurganv2_fpn-inception_1xb1_gopro.py
xiaomile Aug 7, 2023
1211530
Update configs/deblurganv2/deblurganv2_fpn-inception_1xb1_gopro.py
xiaomile Aug 7, 2023
5de913e
Update configs/deblurganv2/deblurganv2_fpn-inception_1xb1_gopro.py
xiaomile Aug 7, 2023
8bd0803
Update configs/deblurganv2/deblurganv2_fpn-mobilenet_1xb1_gopro.py
xiaomile Aug 7, 2023
f18bb29
Update configs/deblurganv2/deblurganv2_fpn-mobilenet_1xb1_gopro.py
xiaomile Aug 7, 2023
8c97de1
Update configs/deblurganv2/deblurganv2_fpn-mobilenet_1xb1_gopro.py
xiaomile Aug 7, 2023
f21e10f
Update configs/deblurganv2/deblurganv2_fpn-mobilenet_1xb1_gopro.py
xiaomile Aug 7, 2023
98741ee
Support DeblurGANv2 inference
xiaomile Aug 8, 2023
ce8a9b2
Merge branch 'open-mmlab:main' into main
xiaomile Aug 9, 2023
c2b4666
Merge branch 'open-mmlab:main' into main
xiaomile Aug 30, 2023
0fb88f2
Adding support for FastComposer
xiaomile Aug 30, 2023
4014991
Adding support for FastComposer
xiaomile Aug 31, 2023
21c0ce3
Adding support for FastComposer
xiaomile Aug 31, 2023
67d3bf3
Adding support for FastComposer
xiaomile Aug 31, 2023
7f51930
Adding support for FastComposer
xiaomile Aug 31, 2023
306cc83
Adding support for FastComposer
xiaomile Aug 31, 2023
1b47eae
Adding support for FastComposer
xiaomile Aug 31, 2023
b74e551
Adding support for FastComposer
xiaomile Aug 31, 2023
1ed33d2
Adding support for FastComposer
xiaomile Aug 31, 2023
6d0b8f9
Merge branch 'main' into main
xiaomile Sep 1, 2023
a987beb
Adding support for FastComposer
xiaomile Sep 1, 2023
69499a0
Merge branch 'main' into main
xiaomile Sep 1, 2023
254a71f
Merge branch 'main' of https://github.com/xiaomile/mmagic
xiaomile Sep 1, 2023
12f16e0
Adding support for FastComposer
xiaomile Sep 1, 2023
38b7efa
Merge branch 'main' into main
xiaomile Sep 4, 2023
7c5b905
Merge branch 'main' into main
xiaomile Sep 5, 2023
8901eb1
Adding support for FastComposer
xiaomile Sep 5, 2023
8f9647e
Adding support for FastComposer
xiaomile Sep 5, 2023
f74ef3d
Adding support for FastComposer
xiaomile Sep 6, 2023
c2111da
Adding support for FastComposer
xiaomile Sep 6, 2023
9e3781e
Adding support for FastComposer
xiaomile Sep 6, 2023
a4b3491
Merge branch 'main' into main
xiaomile Sep 7, 2023
f845fdf
Adding support for FastComposer
xiaomile Sep 7, 2023
312c5f0
Adding support for FastComposer
xiaomile Sep 7, 2023
4bbacbb
Adding support for FastComposer
xiaomile Sep 8, 2023
117 changes: 117 additions & 0 deletions configs/fastcomposer/README.md
@@ -0,0 +1,117 @@
# FastComposer (2023)

> [FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention](https://arxiv.org/abs/2305.10431)

> **Task**: Text2Image

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to the subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend features among subjects. We present FastComposer, which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual instructions with only forward passes. To address the identity blending problem in multi-subject generation, FastComposer proposes cross-attention localization supervision during training, enforcing the attention of reference subjects localized to the correct regions in the target images. Naively conditioning on subject embeddings results in subject overfitting. FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation. FastComposer generates images of multiple unseen individuals with different styles, actions, and contexts. It achieves 300x-2500x speedup compared to fine-tuning-based methods and requires zero extra storage for new subjects. FastComposer paves the way for efficient, personalized, and high-quality multi-subject image creation.

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/14927720/265914135-8a25789c-8d30-40cb-8ac5-e3bd3b617aac.png">
</div>

## Pretrained models

This model relies on several sets of weights, including the VAE, UNet, and CLIP. You should download the weights from [stable-diffusion-1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [clipModel](https://huggingface.co/openai/clip-vit-large-patch14), then change `stable_diffusion_v15_url` and `clip_vit_url` in the config to the corresponding weight paths, and `finetuned_model_path` to the path of the FastComposer weights.

| Model | Dataset | Download |
| :------------------------------------------: | :-----: | :---------------------------------------------------------------------------------------------: |
| [FastComposer](./fastcomposer_8xb16_FFHQ.py) | - | [model](https://download.openxlab.org.cn/models/xiaomile/fastcomposer/weight/pytorch_model.bin) |
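
If you prefer to work from local copies of the weights, you can download them first and then point the config variables at the download locations. The sketch below is illustrative: the `snapshot_download` calls use the standard `huggingface_hub` API, but the destination paths are placeholders to replace with your own.

```python
from huggingface_hub import snapshot_download

# Download the Stable Diffusion v1.5 and CLIP weights to local directories
# (the `local_dir` values are placeholders -- choose your own).
sd_path = snapshot_download('runwayml/stable-diffusion-v1-5',
                            local_dir='./weights/stable-diffusion-v1-5')
clip_path = snapshot_download('openai/clip-vit-large-patch14',
                              local_dir='./weights/clip-vit-large-patch14')

# Then, in configs/fastcomposer/fastcomposer_8xb16_FFHQ.py, set:
# stable_diffusion_v15_url = './weights/stable-diffusion-v1-5'
# clip_vit_url = './weights/clip-vit-large-patch14'
# finetuned_model_path = './weights/fastcomposer/pytorch_model.bin'
```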

## Quick Start

You can run the demo locally by

```bash
python demo/gradio_fastcomposer.py
```

Alternatively, you can run the following code to generate an image from the text prompt.

```python
import numpy as np
import mmcv
from mmengine import Config
from PIL import Image

from mmagic.registry import MODELS
from mmagic.utils import register_all_modules
import torch, gc

gc.collect()
torch.cuda.empty_cache()

register_all_modules()

cfg_file = Config.fromfile('configs/fastcomposer/fastcomposer_8xb16_FFHQ.py')

fastcomposer = MODELS.build(cfg_file.model).cuda()

prompt = "A man img and a man img sitting in a park"
negative_prompt = "((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))). out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))"
alpha_ = 0.75
guidance_scale = 5
num_steps = 50
num_images = 1
image = []
seed = -1

# mmcv.imread returns BGR by default; request RGB so that PIL interprets the
# channel order correctly.
image1 = mmcv.imread('https://user-images.githubusercontent.com/14927720/265911400-91635451-54b6-4dc6-92a7-c1d02f88b62e.jpeg', channel_order='rgb')
image2 = mmcv.imread('https://user-images.githubusercontent.com/14927720/265911502-66b67f53-dff0-4d25-a9af-3330e446aa48.jpeg', channel_order='rgb')

image.append(Image.fromarray(image1))
image.append(Image.fromarray(image2))

if len(image) == 0:
    raise Exception('You need to upload at least one image.')

num_subject_in_text = (
    np.array(fastcomposer.special_tokenizer.encode(prompt))
    == fastcomposer.image_token_id
).sum()
if num_subject_in_text != len(image):
    raise Exception(
        f"Number of subjects in the text description doesn't match the "
        f'number of reference images, #text subjects: {num_subject_in_text} '
        f'#reference images: {len(image)}')

if seed == -1:
    seed = np.random.randint(0, 1000000)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
generator = torch.Generator(device=device)
generator.manual_seed(seed)

output_dict = fastcomposer.infer(prompt,
                                 negative_prompt=negative_prompt,
                                 height=512,
                                 width=512,
                                 num_inference_steps=num_steps,
                                 guidance_scale=guidance_scale,
                                 num_images_per_prompt=num_images,
                                 generator=generator,
                                 alpha_=alpha_,
                                 reference_subject_images=image)

samples = output_dict['samples']
for idx, sample in enumerate(samples):
    sample.save(f'sample_{idx}.png')
```
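
If the weight paths in the config do not match your local setup, they can also be overridden after loading the config and before building the model. A minimal sketch using mmengine's `Config`; the local paths below are placeholders, not files shipped with the repo:

```python
from mmengine import Config

cfg = Config.fromfile('configs/fastcomposer/fastcomposer_8xb16_FFHQ.py')

# Swap the remote weight locations for local copies (placeholder paths).
# The text_encoder, tokenizer, and scheduler entries reference the same
# variable in the config file and would need the same treatment.
cfg.model.vae.from_pretrained = './weights/stable-diffusion-v1-5'
cfg.model.unet.from_pretrained = './weights/stable-diffusion-v1-5'
cfg.model.pretrained_cfg.finetuned_model_path = './weights/pytorch_model.bin'
```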

## Citation

```bibtex
@article{xiao2023fastcomposer,
  title={FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention},
  author={Xiao, Guangxuan and Yin, Tianwei and Freeman, William T. and Durand, Frédo and Han, Song},
  journal={arXiv},
  year={2023}
}
```
118 changes: 118 additions & 0 deletions configs/fastcomposer/README_zh-CN.md
@@ -0,0 +1,118 @@
# FastComposer (2023)

> [FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention](https://arxiv.org/abs/2305.10431)

> **Task**: Text2Image

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Diffusion models excel at text-to-image generation, especially at subject-driven generation for personalized images. However, existing methods are inefficient due to subject-specific fine-tuning, which requires substantial computational resources and limits efficient deployment of diffusion models. Moreover, existing methods struggle with multi-subject generation, as they often blend features among different subjects. We therefore present FastComposer, which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual instructions with only forward passes. To address the identity blending problem in multi-subject generation, FastComposer proposes cross-attention localization supervision during training, enforcing that the attention of reference subjects is localized to the correct regions of the target images. Naively conditioning on subject embeddings leads to subject overfitting; to maintain both identity and editability in subject-driven image generation, FastComposer proposes delayed subject conditioning in the denoising step. FastComposer can generate images of multiple unseen individuals with different styles, actions, and backgrounds. Compared with fine-tuning-based methods, it achieves a 300x-2500x speedup and requires no extra storage for new subjects. FastComposer thus paves the way for efficient, personalized, and high-quality multi-subject image creation.

<!-- [IMAGE] -->

<div align=center>
<img src="https://user-images.githubusercontent.com/14927720/265914135-8a25789c-8d30-40cb-8ac5-e3bd3b617aac.png">
</div>

## Pretrained models

This model relies on several sets of weights, including the VAE, UNet, and CLIP. You should first download the weights from [stable-diffusion-1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [clipModel](https://huggingface.co/openai/clip-vit-large-patch14), then change `stable_diffusion_v15_url` and `clip_vit_url` in the config to the corresponding weight paths, and `finetuned_model_path` to the path of the FastComposer weights.

| Model | Dataset | Download |
| :------------------------------------------: | :-----: | :---------------------------------------------------------------------------------------------: |
| [FastComposer](./fastcomposer_8xb16_FFHQ.py) | - | [model](https://download.openxlab.org.cn/models/xiaomile/fastcomposer/weight/pytorch_model.bin) |

## Quick Start

You can run the demo locally by

```bash
python demo/gradio_fastcomposer.py
```

Alternatively, you can run the following code to generate an image from the text prompt.

```python
import numpy as np
import mmcv
from mmengine import Config
from PIL import Image

from mmagic.registry import MODELS
from mmagic.utils import register_all_modules
import torch, gc

gc.collect()
torch.cuda.empty_cache()

register_all_modules()

cfg_file = Config.fromfile('configs/fastcomposer/fastcomposer_8xb16_FFHQ.py')

fastcomposer = MODELS.build(cfg_file.model).cuda()

prompt = "A man img and a man img sitting in a park"
negative_prompt = "((((ugly)))), (((duplicate))), ((morbid)), ((mutilated)), [out of frame], extra fingers, mutated hands, ((poorly drawn hands)), ((poorly drawn face)), (((mutation))), (((deformed))), ((ugly)), blurry, ((bad anatomy)), (((bad proportions))), ((extra limbs)), cloned face, (((disfigured))). out of frame, ugly, extra limbs, (bad anatomy), gross proportions, (malformed limbs), ((missing arms)), ((missing legs)), (((extra arms))), (((extra legs))), mutated hands, (fused fingers), (too many fingers), (((long neck)))"
alpha_ = 0.75
guidance_scale = 5
num_steps = 50
num_images = 1
image = []
seed = -1

# mmcv.imread returns BGR by default; request RGB so that PIL interprets the
# channel order correctly.
image1 = mmcv.imread('https://user-images.githubusercontent.com/14927720/265911400-91635451-54b6-4dc6-92a7-c1d02f88b62e.jpeg', channel_order='rgb')
image2 = mmcv.imread('https://user-images.githubusercontent.com/14927720/265911502-66b67f53-dff0-4d25-a9af-3330e446aa48.jpeg', channel_order='rgb')

image.append(Image.fromarray(image1))

image.append(Image.fromarray(image2))

if len(image) == 0:
    raise Exception('You need to upload at least one image.')

num_subject_in_text = (
    np.array(fastcomposer.special_tokenizer.encode(prompt))
    == fastcomposer.image_token_id
).sum()
if num_subject_in_text != len(image):
    raise Exception(
        f"Number of subjects in the text description doesn't match the "
        f'number of reference images, #text subjects: {num_subject_in_text} '
        f'#reference images: {len(image)}')

if seed == -1:
    seed = np.random.randint(0, 1000000)

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
generator = torch.Generator(device=device)
generator.manual_seed(seed)

output_dict = fastcomposer.infer(prompt,
                                 negative_prompt=negative_prompt,
                                 height=512,
                                 width=512,
                                 num_inference_steps=num_steps,
                                 guidance_scale=guidance_scale,
                                 num_images_per_prompt=num_images,
                                 generator=generator,
                                 alpha_=alpha_,
                                 reference_subject_images=image)

samples = output_dict['samples']
for idx, sample in enumerate(samples):
    sample.save(f'sample_{idx}.png')
```

## Citation

```bibtex
@article{xiao2023fastcomposer,
  title={FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention},
  author={Xiao, Guangxuan and Yin, Tianwei and Freeman, William T. and Durand, Frédo and Han, Song},
  journal={arXiv},
  year={2023}
}
```
50 changes: 50 additions & 0 deletions configs/fastcomposer/fastcomposer_8xb16_FFHQ.py
@@ -0,0 +1,50 @@
_base_ = '../_base_/gen_default_runtime.py'

# config for model
stable_diffusion_v15_url = 'runwayml/stable-diffusion-v1-5'
clip_vit_url = 'openai/clip-vit-large-patch14'
finetuned_model_path = 'https://download.openxlab.org.cn/models/xiaomile/'\
                       'fastcomposer/weight/pytorch_model.bin'

model = dict(
    type='FastComposer',
    vae=dict(
        type='AutoencoderKL',
        from_pretrained=stable_diffusion_v15_url,
        subfolder='vae'),
    unet=dict(
        type='UNet2DConditionModel',
        subfolder='unet',
        from_pretrained=stable_diffusion_v15_url),
    text_encoder=dict(
        type='ClipWrapper',
        clip_type='huggingface',
        pretrained_model_name_or_path=stable_diffusion_v15_url,
        subfolder='text_encoder'),
    tokenizer=stable_diffusion_v15_url,
    pretrained_cfg=dict(
        finetuned_model_path=finetuned_model_path,
        enable_xformers_memory_efficient_attention=None,
        pretrained_model_name_or_path=stable_diffusion_v15_url,
        image_encoder=clip_vit_url,
        revision=None,
        non_ema_revision=None,
        object_localization=None,
        object_localization_weight=0.01,
        localization_layers=5,
        mask_loss=None,
        mask_loss_prob=0.5,
        object_localization_threshold=1.0,
        object_localization_normalize=None,
        no_object_augmentation=True,
        object_resolution=256),
    scheduler=dict(
        type='DDPMScheduler',
        from_pretrained=stable_diffusion_v15_url,
        subfolder='scheduler'),
    test_scheduler=dict(
        type='DDIMScheduler',
        from_pretrained=stable_diffusion_v15_url,
        subfolder='scheduler'),
    dtype='fp32',
    data_preprocessor=dict(type='DataPreprocessor'))
19 changes: 19 additions & 0 deletions configs/fastcomposer/metafile.yml
@@ -0,0 +1,19 @@
Collections:
- Name: FastComposer
  Paper:
    Title: 'FastComposer: Tuning-Free Multi-Subject Image Generation with Localized
      Attention'
    URL: https://arxiv.org/abs/2305.10431
  README: configs/fastcomposer/README.md
  Task:
  - text2image
  Year: 2023
Models:
- Config: configs/fastcomposer/fastcomposer_8xb16_FFHQ.py
  In Collection: FastComposer
  Name: fastcomposer_8xb16_FFHQ
  Results:
  - Dataset: '-'
    Metrics: {}
    Task: Text2Image
  Weights: https://download.openxlab.org.cn/models/xiaomile/fastcomposer/weight/pytorch_model.bin
37 changes: 37 additions & 0 deletions demo/README.md
@@ -351,3 +351,40 @@ Then launch the UI and you can use the pretrained weights to generate images.
2. (Optional) Customize advanced settings.

3. Click inference button.

#### 3.1.3 FastComposer

First, run the script:

```shell
python demo/gradio_fastcomposer.py
```

Second, upload the reference subject images. For example:

<table align="center">
<thead>
<tr>
<td>
<div align="center">
<img src="https://user-images.githubusercontent.com/14927720/265911400-91635451-54b6-4dc6-92a7-c1d02f88b62e.jpeg" width="400"/>
<br/>
<b>'reference_0.png'</b>
</div></td>
<td>
<div align="center">
<img src="https://user-images.githubusercontent.com/14927720/265911502-66b67f53-dff0-4d25-a9af-3330e446aa48.jpeg" width="400"/>
<br/>
<b>'reference_1.png'</b>
</div></td>
<td>
</thead>
</table>

Then, add a prompt like `A man img and a man img sitting together` and press the `run` button.

Finally, you will get the text-generated images.

<div align=center>
<img src="https://user-images.githubusercontent.com/14927720/265911526-4975d6e2-c5fc-4324-80c9-a7a512953218.png">
</div>