[CodeCamp2023-369] Adding support for FastComposer #2011

Merged
merged 79 commits into from Sep 11, 2023
Changes from 64 commits
feac9ba
1st
xiaomile Jun 2, 2023
b100239
debug
xiaomile Jun 21, 2023
b080671
20230710 adjustments
xiaomile Jul 10, 2023
bda8007
Refactor code and consolidate models to avoid importing too many classes in editors
xiaomile Jul 17, 2023
e990639
Refactor code and consolidate models to avoid importing too many classes in editors
xiaomile Jul 17, 2023
5f055e9
Merge branch 'main' of https://github.com/xiaomile/mmagic
xiaomile Jul 24, 2023
02a0619
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
cbbea41
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
e836c69
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
0be4fde
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
cafc451
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
0a9e274
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
61aadee
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
80bf698
Support DeblurGANv2 inference
xiaomile Jul 24, 2023
33c7751
Support DeblurGANv2 inference
xiaomile Jul 25, 2023
2b0bbf2
Support DeblurGANv2 inference
xiaomile Jul 25, 2023
ee28acf
Support DeblurGANv2 inference
xiaomile Jul 25, 2023
eab4f43
Support DeblurGANv2 inference
xiaomile Jul 25, 2023
6623669
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
1e91d3f
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
25d133d
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
b4ae7b8
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
f828049
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
320ca05
Support DeblurGANv2 inference
xiaomile Jul 26, 2023
b933ba9
Support DeblurGANv2 inference
xiaomile Jul 27, 2023
b1ac1df
Support DeblurGANv2 inference
xiaomile Jul 27, 2023
d333d56
Support DeblurGANv2 inference
xiaomile Jul 27, 2023
dcb5987
Merge branch 'main' into main
xiaomile Jul 27, 2023
76187c3
Support DeblurGANv2 inference
xiaomile Jul 27, 2023
1d0ced9
Merge branch 'main' into main
zengyh1900 Jul 28, 2023
8a1dadc
Update .gitignore
xiaomile Jul 30, 2023
49a45cc
Update .gitignore
xiaomile Jul 30, 2023
2bdfedf
Update .gitignore
xiaomile Jul 30, 2023
8dbf240
Update .gitignore
xiaomile Jul 30, 2023
420de7e
Update .gitignore
xiaomile Jul 30, 2023
8809347
Update .gitignore
xiaomile Jul 30, 2023
b9fe117
Update .gitignore
xiaomile Jul 30, 2023
3bd19e5
Update configs/deblurganv2/README.md
xiaomile Jul 30, 2023
84f9592
Support DeblurGANv2 inference
xiaomile Aug 2, 2023
64800e5
Merge branch 'main' into main
xiaomile Aug 2, 2023
b262ca6
Support DeblurGANv2 inference
xiaomile Aug 2, 2023
d4fb484
Merge branch 'main' of https://github.com/xiaomile/mmagic
xiaomile Aug 2, 2023
52fdc15
Support DeblurGANv2 inference
xiaomile Aug 2, 2023
6856137
Update configs/deblurganv2/deblurganv2_fpn-inception_1xb1_gopro.py
xiaomile Aug 7, 2023
cb281d6
Update configs/deblurganv2/deblurganv2_fpn-inception_1xb1_gopro.py
xiaomile Aug 7, 2023
1211530
Update configs/deblurganv2/deblurganv2_fpn-inception_1xb1_gopro.py
xiaomile Aug 7, 2023
5de913e
Update configs/deblurganv2/deblurganv2_fpn-inception_1xb1_gopro.py
xiaomile Aug 7, 2023
8bd0803
Update configs/deblurganv2/deblurganv2_fpn-mobilenet_1xb1_gopro.py
xiaomile Aug 7, 2023
f18bb29
Update configs/deblurganv2/deblurganv2_fpn-mobilenet_1xb1_gopro.py
xiaomile Aug 7, 2023
8c97de1
Update configs/deblurganv2/deblurganv2_fpn-mobilenet_1xb1_gopro.py
xiaomile Aug 7, 2023
f21e10f
Update configs/deblurganv2/deblurganv2_fpn-mobilenet_1xb1_gopro.py
xiaomile Aug 7, 2023
98741ee
Support DeblurGANv2 inference
xiaomile Aug 8, 2023
ce8a9b2
Merge branch 'open-mmlab:main' into main
xiaomile Aug 9, 2023
c2b4666
Merge branch 'open-mmlab:main' into main
xiaomile Aug 30, 2023
0fb88f2
Adding support for FastComposer
xiaomile Aug 30, 2023
4014991
Adding support for FastComposer
xiaomile Aug 31, 2023
21c0ce3
Adding support for FastComposer
xiaomile Aug 31, 2023
67d3bf3
Adding support for FastComposer
xiaomile Aug 31, 2023
7f51930
Adding support for FastComposer
xiaomile Aug 31, 2023
306cc83
Adding support for FastComposer
xiaomile Aug 31, 2023
1b47eae
Adding support for FastComposer
xiaomile Aug 31, 2023
b74e551
Adding support for FastComposer
xiaomile Aug 31, 2023
1ed33d2
Adding support for FastComposer
xiaomile Aug 31, 2023
6d0b8f9
Merge branch 'main' into main
xiaomile Sep 1, 2023
a987beb
Adding support for FastComposer
xiaomile Sep 1, 2023
69499a0
Merge branch 'main' into main
xiaomile Sep 1, 2023
254a71f
Merge branch 'main' of https://github.com/xiaomile/mmagic
xiaomile Sep 1, 2023
12f16e0
Adding support for FastComposer
xiaomile Sep 1, 2023
38b7efa
Merge branch 'main' into main
xiaomile Sep 4, 2023
7c5b905
Merge branch 'main' into main
xiaomile Sep 5, 2023
8901eb1
Adding support for FastComposer
xiaomile Sep 5, 2023
8f9647e
Adding support for FastComposer
xiaomile Sep 5, 2023
f74ef3d
Adding support for FastComposer
xiaomile Sep 6, 2023
c2111da
Adding support for FastComposer
xiaomile Sep 6, 2023
9e3781e
Adding support for FastComposer
xiaomile Sep 6, 2023
a4b3491
Merge branch 'main' into main
xiaomile Sep 7, 2023
f845fdf
Adding support for FastComposer
xiaomile Sep 7, 2023
312c5f0
Adding support for FastComposer
xiaomile Sep 7, 2023
4bbacbb
Adding support for FastComposer
xiaomile Sep 8, 2023
46 changes: 46 additions & 0 deletions configs/fastcomposer/README.md
@@ -0,0 +1,46 @@
# FastComposer (2023)

> [FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention](https://arxiv.org/abs/2305.10431)

> **Task**: Text2Image

<!-- [ALGORITHM] -->

## Abstract

<!-- [ABSTRACT] -->

Diffusion models excel at text-to-image generation, especially in subject-driven generation for personalized images. However, existing methods are inefficient due to the subject-specific fine-tuning, which is computationally intensive and hampers efficient deployment. Moreover, existing methods struggle with multi-subject generation as they often blend features among subjects. We present FastComposer which enables efficient, personalized, multi-subject text-to-image generation without fine-tuning. FastComposer uses subject embeddings extracted by an image encoder to augment the generic text conditioning in diffusion models, enabling personalized image generation based on subject images and textual instructions with only forward passes. To address the identity blending problem in the multi-subject generation, FastComposer proposes cross-attention localization supervision during training, enforcing the attention of reference subjects localized to the correct regions in the target images. Naively conditioning on subject embeddings results in subject overfitting. FastComposer proposes delayed subject conditioning in the denoising step to maintain both identity and editability in subject-driven image generation. FastComposer generates images of multiple unseen individuals with different styles, actions, and contexts. It achieves 300x-2500x speedup compared to fine-tuning-based methods and requires zero extra storage for new subjects. FastComposer paves the way for efficient, personalized, and high-quality multi-subject image creation.

<!-- [IMAGE] -->

<div align=center>
<img src="https://fastcomposer.mit.edu/figures/multi_subject.png">
</div>

## Pretrained models

This model depends on several pretrained weights, including the VAE, UNet, and CLIP components. Download the weights from [stable-diffusion-1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) and [clipModel](https://huggingface.co/openai/clip-vit-large-patch14), then change `stable_diffusion_v15_url` and `clip_vit_url` in the config to the corresponding weight paths, and set `finetuned_model_path` to the FastComposer weight path, as sketched after the table below.

| Model | Dataset | Download |
| :-----------------------------------------: | :-----: | :---------------------------------------------------------------------------------------------: |
| [FastComposer](./fastcomposer_8xb1_FFHQ.py) | - | [model](https://download.openxlab.org.cn/models/xiaomile/fastcomposer/weight/pytorch_model.bin) |
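
For example, here is a minimal sketch of the variables you would edit at the top of `configs/fastcomposer/fastcomposer_8xb1_FFHQ.py`; the local paths below are illustrative placeholders, not shipped defaults:

```python
# Point these at wherever you downloaded the weights (paths are examples).
stable_diffusion_v15_url = '/path/to/stable-diffusion-v1-5'
clip_vit_url = '/path/to/clip-vit-large-patch14'
finetuned_model_path = '/path/to/fastcomposer/pytorch_model.bin'
```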

## Quick Start

You can run the demo locally with:

```bash
python demo/gradio_fastcomposer.py
```
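
If you prefer to call the model programmatically, the sketch below mirrors the `infer` call made inside `demo/gradio_fastcomposer.py`; the prompt, seed, and image path are illustrative:

```python
import torch
from mmengine import Config
from PIL import Image

from mmagic.registry import MODELS
from mmagic.utils import register_all_modules

register_all_modules()

# Build FastComposer from the config added in this PR.
cfg = Config.fromfile('configs/fastcomposer/fastcomposer_8xb1_FFHQ.py')
model = MODELS.build(cfg.model)

# "img" marks the prompt token to augment with the reference subject;
# one reference image is needed per "img" token in the prompt.
samples = model.infer(
    prompt='A man img sitting in a park',
    negative_prompt='blurry, bad anatomy',
    height=512,
    width=512,
    num_inference_steps=50,
    guidance_scale=5,
    num_images_per_prompt=1,
    generator=torch.manual_seed(42),
    alpha_=0.75,
    reference_subject_images=[Image.open('subject.jpg')],  # illustrative path
)['samples']
```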

## Citation

```bibtex
@article{xiao2023fastcomposer,
title={FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention},
author={Xiao, Guangxuan and Yin, Tianwei and Freeman, William T. and Durand, Frédo and Han, Song},
journal={arXiv},
year={2023}
}
```
46 changes: 46 additions & 0 deletions configs/fastcomposer/README_zh-CN.md
@@ -0,0 +1,46 @@
# FastComposer (2023)

> [FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention](https://arxiv.org/abs/2305.10431)

> **任务**: 文本转图像

<!-- [ALGORITHM] -->

## 摘要

<!-- [ABSTRACT] -->

扩散模型在文本到图像生成方面表现出色,尤其在以主题驱动的个性化图像生成方面。然而,现有方法由于主题特定的微调而效率低下,因为需要大量的计算资源,而这限制了扩散模型高效部署的可能性。此外,现有方法在多主题生成方面存在困难,因为它们经常在不同主题之间混合特征。因此我们提出了FastComposer,它可以实现高效、个性化、多主题的文本到图像生成,而无需进行微调。FastComposer利用图像编码器提取的主题嵌入来增强扩散模型中的通用文本条件,只需进行前向传递即可基于主题图像和文本指令进行个性化图像生成。为了解决多主题生成中的身份混合问题,FastComposer在训练过程中提出了交叉注意力定位监督,强制参考主题的注意力定位于目标图像中的正确区域。简单地基于主题嵌入进行条件设定会导致主题过度拟合的问题。为了在以主题驱动的图像生成中同时保持身份和可编辑性,FastComposer在去噪步骤中提出了延迟主题条件设定的方法。FastComposer可以生成具有不同风格、动作和背景的多个未知个体的图像。与基于微调的方法相比,它实现了300倍到2500倍的加速,并且对于新主题不需要额外的存储空间。正因如此FastComposer为高效、个性化和高质量的多主题图像创作铺平了道路。

<!-- [IMAGE] -->

<div align=center>
<img src="https://fastcomposer.mit.edu/figures/multi_subject.png">
</div>

## 预训练模型

该模型依赖多个预训练权重,包括 vae、unet 和 clip。您需要先从 [stable-diffusion-1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) 和 [clipModel](https://huggingface.co/openai/clip-vit-large-patch14) 下载权重,然后将配置中的 `stable_diffusion_v15_url` 和 `clip_vit_url` 更改为对应的权重路径,并将 `finetuned_model_path` 更改为 FastComposer 的权重路径。

| Model | Dataset | Download |
| :-----------------------------------------: | :-----: | :---------------------------------------------------------------------------------------------: |
| [FastComposer](./fastcomposer_8xb1_FFHQ.py) | - | [model](https://download.openxlab.org.cn/models/xiaomile/fastcomposer/weight/pytorch_model.bin) |

## 快速开始

您可以通过以下方式在本地运行演示:

```bash
python demo/gradio_fastcomposer.py
```

## 引用

```bibtex
@article{xiao2023fastcomposer,
title={FastComposer: Tuning-Free Multi-Subject Image Generation with Localized Attention},
author={Xiao, Guangxuan and Yin, Tianwei and Freeman, William T. and Durand, Frédo and Han, Song},
journal={arXiv},
year={2023}
}
```
50 changes: 50 additions & 0 deletions configs/fastcomposer/fastcomposer_8xb1_FFHQ.py
@@ -0,0 +1,50 @@
_base_ = '../_base_/gen_default_runtime.py'

# config for model
stable_diffusion_v15_url = 'runwayml/stable-diffusion-v1-5'
clip_vit_url = 'openai/clip-vit-large-patch14'
finetuned_model_path = 'https://download.openxlab.org.cn/models/xiaomile/'\
'fastcomposer/weight/pytorch_model.bin'

model = dict(
type='FastComposer',
vae=dict(
type='AutoencoderKL',
from_pretrained=stable_diffusion_v15_url,
subfolder='vae'),
unet=dict(
type='UNet2DConditionModel',
subfolder='unet',
from_pretrained=stable_diffusion_v15_url),
text_encoder=dict(
type='ClipWrapper',
clip_type='huggingface',
pretrained_model_name_or_path=stable_diffusion_v15_url,
subfolder='text_encoder'),
tokenizer=stable_diffusion_v15_url,
pretrained_cfg=dict(
finetuned_model_path=finetuned_model_path,
enable_xformers_memory_efficient_attention=None,
pretrained_model_name_or_path=stable_diffusion_v15_url,
image_encoder_name_or_path=clip_vit_url,
revision=None,
non_ema_revision=None,
object_localization=None,
object_localization_weight=0.01,
localization_layers=5,
mask_loss=None,
mask_loss_prob=0.5,
object_localization_threshold=1.0,
object_localization_normalize=None,
no_object_augmentation=True,
object_resolution=256),
scheduler=dict(
type='DDPMScheduler',
from_pretrained=stable_diffusion_v15_url,
subfolder='scheduler'),
test_scheduler=dict(
type='DDIMScheduler',
from_pretrained=stable_diffusion_v15_url,
subfolder='scheduler'),
dtype='fp32',
data_preprocessor=dict(type='DataPreprocessor'))
19 changes: 19 additions & 0 deletions configs/fastcomposer/metafile.yml
@@ -0,0 +1,19 @@
Collections:
- Name: FastComposer
Paper:
Title: 'FastComposer: Tuning-Free Multi-Subject Image Generation with Localized
Attention'
URL: https://arxiv.org/abs/2305.10431
README: configs/fastcomposer/README.md
Task:
- text2image
Year: 2023
Models:
- Config: configs/fastcomposer/fastcomposer_8xb1_FFHQ.py
In Collection: FastComposer
Name: fastcomposer_8xb1_FFHQ
Results:
- Dataset: '-'
Metrics: {}
Task: Text2Image
Weights: https://download.openxlab.org.cn/models/xiaomile/fastcomposer/weight/pytorch_model.bin
208 changes: 208 additions & 0 deletions demo/gradio_fastcomposer.py
@@ -0,0 +1,208 @@
# Copyright (c) OpenMMLab. All rights reserved.
import gc

import gradio as gr
import numpy as np
import PIL
import torch
from mmengine import Config

from mmagic.registry import MODELS
from mmagic.utils import register_all_modules

gc.collect()
torch.cuda.empty_cache()
register_all_modules()


class ModelWrapper:

def __init__(self, model):
super().__init__()
self.model = model

def inference(
self,
image1: PIL.Image.Image,
image2: PIL.Image.Image,
prompt: str,
negative_prompt: str,
seed: int,
guidance_scale: float,
alpha_: float,
num_steps: int,
num_images: int,
):
print('Running model inference...')
image = []
if image1 is not None:
image.append(image1)

if image2 is not None:
image.append(image2)

if len(image) == 0:
return [], 'You need to upload at least one image.'

num_subject_in_text = (np.array(
self.model.special_tokenizer.encode(prompt)) ==
self.model.image_token_id).sum()
if num_subject_in_text != len(image):
return (
[],
"Number of subjects in the text description doesn't "
'match the number of reference images, #text subjects: '
f'{num_subject_in_text} #reference image: {len(image)}',
)

if seed == -1:
seed = np.random.randint(0, 1000000)

generator = torch.manual_seed(seed)

return (
self.model.infer(
prompt=prompt,
negative_prompt=negative_prompt,
height=512,
width=512,
num_inference_steps=num_steps,
guidance_scale=guidance_scale,
num_images_per_prompt=num_images,
generator=generator,
alpha_=alpha_,
reference_subject_images=image,
)['samples'],
'run successfully',
)


def create_demo():
TITLE = 'FastComposer Demo'

DESCRIPTION = """To run the demo, you should:
1. Upload your images. The order of image1 and image2 needs to match the
order of the subjects in the prompt. You only need 1 image for
single subject generation.
2. Input proper text prompts, such as "A woman img and a man img in
the snow" or "A painting of a man img in the style of Van Gogh",
where "img" specifies the token you want to augment and comes
after the word.
3. Click the Run button. You can also adjust the hyperparameters to
improve the results. Look at the job status to see
if there are any errors with your input.
"""

cfg_file = Config.fromfile(
'configs/fastcomposer/fastcomposer_8xb1_FFHQ.py')
fastcomposer = MODELS.build(cfg_file.model)
model = ModelWrapper(fastcomposer)
with gr.Blocks() as demo:
gr.Markdown(TITLE)
gr.Markdown(DESCRIPTION)
with gr.Row():
with gr.Column():
with gr.Box():
image1 = gr.Image(label='Image 1', type='pil')

image2 = gr.Image(label='Image 2', type='pil')

gr.Markdown('Upload the image for your subject')

prompt = gr.Text(
value='A man img and a man img sitting in a park',
label='Prompt',
placeholder=
'e.g. "A woman img and a man img in the snow", "A painting'
' of a man img in the style of Van Gogh"',
info='Use "img" to specify the word you want to augment.',
)
negative_prompt = gr.Text(
value='((((ugly)))), (((duplicate))), ((morbid)), '
'((mutilated)), [out of frame], extra fingers, '
'mutated hands, ((poorly drawn hands)), '
'((poorly drawn face)), (((mutation))), '
'(((deformed))), ((ugly)), blurry, ((bad anatomy)), '
'(((bad proportions))), ((extra limbs)), cloned face, '
'(((disfigured))). out of frame, ugly, extra limbs, '
'(bad anatomy), gross proportions, (malformed limbs), '
'((missing arms)), ((missing legs)), (((extra arms))), '
'(((extra legs))), mutated hands, (fused fingers), '
'(too many fingers), (((long neck)))',
label='Negative Prompt',
info='Features that you want to avoid.',
)
alpha_ = gr.Slider(
label='alpha',
minimum=0,
maximum=1,
step=0.05,
value=0.75,
info=
'A smaller alpha aligns images with text better, but may '
'deviate from the subject image. Increase alpha to improve'
' identity preservation, decrease it for '
'prompt consistency.',
)
num_images = gr.Slider(
label='Number of generated images',
minimum=1,
maximum=8,
step=1,
value=4,
)
run_button = gr.Button('Run')
with gr.Accordion(label='Advanced options', open=False):
seed = gr.Slider(
label='Seed',
minimum=-1,
maximum=1000000,
step=1,
value=-1,
info='If set to -1, a different seed will be '
'used each time.',
)
guidance_scale = gr.Slider(
label='Guidance scale',
minimum=1,
maximum=10,
step=1,
value=5,
)
num_steps = gr.Slider(
label='Steps',
minimum=1,
maximum=300,
step=1,
value=50,
)
with gr.Column():
result = gr.Gallery(label='Generated Images').style(
grid=[2], height='auto')
error_message = gr.Text(label='Job Status')

inputs = [
image1,
image2,
prompt,
negative_prompt,
seed,
guidance_scale,
alpha_,
num_steps,
num_images,
]
run_button.click(
fn=model.inference, inputs=inputs, outputs=[result, error_message])
return demo


if __name__ == '__main__':
demo = create_demo()
demo.queue(api_open=False)
demo.launch(
show_error=True,
server_name='0.0.0.0',
server_port=8080,
)
1 change: 1 addition & 0 deletions mmagic/apis/mmagic_inferencer.py
@@ -109,6 +109,7 @@ class MMagicInferencer:
# text2image models
'controlnet',
'disco_diffusion',
'fastcomposer',
'stable_diffusion',

# 3D-aware generation
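
With `'fastcomposer'` registered here, the model also becomes addressable through the high-level inferencer. A minimal sketch, assuming the standard `MMagicInferencer` constructor; FastComposer's subject-specific inputs (reference images, `alpha_`) may still require the direct `model.infer` path shown in the README:

```python
from mmagic.apis import MMagicInferencer

# Instantiates the FastComposer entry registered in this PR.
editor = MMagicInferencer(model_name='fastcomposer')
```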
3 changes: 2 additions & 1 deletion mmagic/models/editors/__init__.py
@@ -24,6 +24,7 @@
from .edvr import EDVR, EDVRNet
from .eg3d import EG3D
from .esrgan import ESRGAN, RRDBNet
from .fastcomposer import FastComposer
from .fba import FBADecoder, FBAResnetDilated
from .flavr import FLAVR, FLAVRNet
from .gca import GCA
@@ -93,5 +94,5 @@
'ClipWrapper', 'EG3D', 'Restormer', 'SwinIRNet', 'StableDiffusion',
'ControlStableDiffusion', 'DreamBooth', 'TextualInversion', 'DeblurGanV2',
'DeblurGanV2Generator', 'DeblurGanV2Discriminator',
'StableDiffusionInpaint'
'StableDiffusionInpaint', 'FastComposer'
]