Add TencentARC PhotoMaker support #179

Merged: 65 commits merged on Mar 12, 2024

Commits (65):
- `5dbbea0` first efforts at implementing photomaker; lots more to do (Feb 2, 2024)
- `cbfa702` added PhotoMakerIDEncoder model in SD (Feb 4, 2024)
- `78651f2` fixed some bugs; now photomaker model weights can be loaded into thei… (Feb 4, 2024)
- `7da51ad` added input id image loading (Feb 5, 2024)
- `702a732` added preprocessing input id images (Feb 5, 2024)
- `7a7baef` finished get_num_tensors (Feb 5, 2024)
- `f16b4da` fixed a bug in remove_duplicates (Feb 6, 2024)
- `df7f642` add a get_learned_condition_with_trigger function to do photomaker stuff (Feb 6, 2024)
- `ad7ec45` add a convert_token_to_id function for photomaker to extract trigger … (Feb 6, 2024)
- `87fcee0` making progress; need to implement tokenizer decoder (Feb 7, 2024)
- `7f5f580` making more progress; finishing vision model forward (Feb 7, 2024)
- `0a38d84` debugging vision_model outputs (Feb 8, 2024)
- `304704f` corrected clip vision model output (Feb 9, 2024)
- `184e4b8` continue making progress in id fusion process (Feb 9, 2024)
- `024c187` finished stacked id embedding; to be tested (Feb 9, 2024)
- `a584147` remove garbage file (Feb 9, 2024)
- `d939514` debugging graph compute (Feb 10, 2024)
- `807c340` more progress; now alloc buffer failed (Feb 11, 2024)
- `5a13b48` fixed wtype issue; input images can only be 1 because issue with tran… (Feb 11, 2024)
- `f4bf8e0` added delayed subject conditioning; now photomaker runs and generates… (Feb 11, 2024)
- `857af48` fixed stat_merge_step (Feb 12, 2024)
- `b0579ec` added photomaker lora model (to be tested) (Feb 12, 2024)
- `753231a` reworked pmid lora (Feb 12, 2024)
- `44052e1` finished applying pmid lora; to be tested (Feb 13, 2024)
- `c122507` finalized pmid lora (Feb 13, 2024)
- `539a94a` add a few print tensor; tweak in sample again (Feb 14, 2024)
- `26c5cca` small tweak; still not getting ID faces (Feb 15, 2024)
- `191c5ec` fixed a bug in FuseBlock forward; also remove diag_mask op in for vis… (Feb 15, 2024)
- `26f591d` disable pmid lora apply for now; 1 input image seems working; > 1 not… (Feb 16, 2024)
- `62b0a9b` turn pmid lora apply back on (Feb 16, 2024)
- `6064b0e` fixed a decode bug (Feb 16, 2024)
- `d38b85d` fixed a bug in ggml's conv_2d, and now > 1 input images working (Feb 19, 2024)
- `a6a676b` add style_ratio as a cli param; reworked encode with trigger for atte… (Feb 19, 2024)
- `aa21577` merge commit fixing lora free param buffer error (Feb 19, 2024)
- `78ec6f7` change default style ratio to 10% (Feb 19, 2024)
- `8053aef` added an option to offload vae decoder to CPU for mem-limited gpus (Feb 21, 2024)
- `dbe74df` removing image normalization step seems making ID fidelity much higher (Feb 23, 2024)
- `e32462d` revert default style ratio back to 20% (Feb 23, 2024)
- `8b89e0e` added an option for normalizing input ID images; cleaned up debugging… (Feb 23, 2024)
- `b7a4540` more clean up (Feb 23, 2024)
- `53beb92` merged with master and resolved conflicts; adapted to GGMLBlock API (Feb 25, 2024)
- `d35861b` merge with master again; build ok (Feb 26, 2024)
- `b556d44` fixed bugs; now failed with cuda error; likely out-of-mem on GPU (Feb 26, 2024)
- `4ac3df1` free pmid model params when required (Feb 26, 2024)
- `77b6a42` photomaker working properly now after merging and adapting to GGMLBlo… (Feb 26, 2024)
- `464f1f8` remove tensor renaming; fixing names in the photomaker model file (Feb 26, 2024)
- `4b31b75` updated README.md to include instructions and notes for running Photo… (Feb 26, 2024)
- `7b72e17` a bit clean up (Feb 26, 2024)
- `adbb9a1` remove -DGGML_CUDA_FORCE_MMQ; more clean up and README update (Feb 27, 2024)
- `6079615` add input image requirement in README (Feb 27, 2024)
- `fd098af` bring back freeing pmid lora params buffer; simply pooled output of C… (Feb 27, 2024)
- `70c3397` remove MultiheadAttention2; customized MultiheadAttention (Feb 28, 2024)
- `16c9da0` added a WIN32 get_files_from_dir; turn off Photomaker if receiving n… (Feb 28, 2024)
- `41f20e6` update docs (leejet, Mar 3, 2024)
- `27887b6` fix ci error (leejet, Mar 3, 2024)
- `983e552` Merge branch 'master' into add-photomaker-support (leejet, Mar 3, 2024)
- `b0940f0` make stable-diffusion.h a pure c header file (leejet, Mar 3, 2024)
- `f8c0831` fix ci error (leejet, Mar 3, 2024)
- `745ed8f` format code (leejet, Mar 3, 2024)
- `6bb87cf` reuse get_learned_condition (leejet, Mar 3, 2024)
- `7e2c796` reuse pad_tokens (leejet, Mar 3, 2024)
- `9b3c8d8` reuse CLIPVisionModel (leejet, Mar 9, 2024)
- `df28af9` reuse LoraModel (leejet, Mar 9, 2024)
- `6727d1c` add --clip-on-cpu (leejet, Mar 10, 2024)
- `df2afd8` fix lora name conversion for SDXL (leejet, Mar 10, 2024)
README.md: 41 changes (40 additions, 1 deletion)

@@ -14,6 +14,7 @@ Inference of [Stable Diffusion](https://github.com/CompVis/stable-diffusion) in
- **Note:** the VAE in SDXL produces NaNs under FP16, but unfortunately `ggml_conv_2d` only operates in FP16. Hence, you need to specify (with the `--vae` parameter) a VAE that has the FP16 NaN issue fixed. You can find one here: [SDXL VAE FP16 Fix](https://huggingface.co/madebyollin/sdxl-vae-fp16-fix/blob/main/sdxl_vae.safetensors).

- [SD-Turbo](https://huggingface.co/stabilityai/sd-turbo) and [SDXL-Turbo](https://huggingface.co/stabilityai/sdxl-turbo) support
- [PhotoMaker](https://github.com/TencentARC/PhotoMaker) support
- 16-bit, 32-bit float support
- 4-bit, 5-bit and 8-bit integer quantization support
- Accelerated memory-efficient CPU inference
@@ -151,7 +152,7 @@ cmake --build . --config Release
### Run

```
-usage: ./build/bin/sd [arguments]
+usage: ./bin/sd [arguments]

arguments:
-h, --help show this help message and exit
@@ -163,6 +164,9 @@ arguments:
--taesd [TAESD_PATH] path to taesd. Using Tiny AutoEncoder for fast decoding (low quality)
--control-net [CONTROL_PATH] path to control net model
--embd-dir [EMBEDDING_PATH] path to embeddings.
--stacked-id-embd-dir [DIR] path to PHOTOMAKER stacked id embeddings.
--input-id-images-dir [DIR] path to PHOTOMAKER input id images dir.
--normalize-input normalize PHOTOMAKER input id images
--upscale-model [ESRGAN_PATH] path to esrgan model. Upscale images after generate, just RealESRGAN_x4plus_anime_6B supported by now.
--upscale-repeats Run the ESRGAN upscaler this many times (default 1)
--type [TYPE] weight type (f32, f16, q4_0, q4_1, q5_0, q5_1, q8_0)
@@ -175,6 +179,7 @@ arguments:
-n, --negative-prompt PROMPT the negative prompt (default: "")
--cfg-scale SCALE unconditional guidance scale: (default: 7.0)
--strength STRENGTH strength for noising/unnoising (default: 0.75)
--style-ratio STYLE-RATIO strength for keeping input identity (default: 20%)
--control-strength STRENGTH strength to apply Control Net (default: 0.9)
1.0 corresponds to full destruction of information in init image
-H, --height H image height, in pixel space (default: 512)
@@ -299,6 +304,39 @@ You can use ESRGAN to upscale the generated images. At the moment, only the [Rea
sd -m ../models/v1-5-pruned-emaonly.safetensors -p "a lovely cat" --upscale-model ../models/RealESRGAN_x4plus_anime_6B.pth
```

#### Using PhotoMaker to personalize image generation

You can use [PhotoMaker](https://github.com/TencentARC/PhotoMaker) to personalize generated images with your own ID.

**NOTE:** PhotoMaker currently works **only** with **SDXL** (any SDXL model file will work).

Download the PhotoMaker model file (in safetensors format) [here](https://huggingface.co/bssrdf/PhotoMaker). The official release of the model file (in `.bin` format) does not work with `stable-diffusion.cpp`.

- Specify the PhotoMaker model path using the `--stacked-id-embd-dir PATH` parameter.
- Specify the input images path using the `--input-id-images-dir PATH` parameter.
- Input images **must** have the same width and height for preprocessing (to be improved).

In the prompt, make sure you have a class word followed by the trigger word `img` (hard-coded for now). The class word can be one of `man`, `woman`, `girl`, or `boy`. If the input ID images contain Asian faces, add `Asian` before the class word.
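Because the trigger word is hard-coded, a quick check of the prompt before a long SDXL run can save time. The sketch below is a hypothetical helper, `has_photomaker_trigger`, not part of stable-diffusion.cpp. It only tests the rule stated above (a class word directly followed by `img`) using `grep -E`, and it does not reproduce how the tokenizer actually locates the trigger token.

```bash
# Hypothetical pre-flight check, not part of stable-diffusion.cpp.
# Returns 0 if the prompt contains a class word (man, woman, girl, boy)
# directly followed by the hard-coded trigger word "img", 1 otherwise.
# An optional "Asian" before the class word does not affect the match.
has_photomaker_trigger() {
  local prompt="$1"
  # The class word must start the prompt or follow whitespace, so "human img"
  # does not count; "img" must end the prompt or be followed by a
  # non-alphanumeric character, so "image" does not count. Case-sensitive.
  printf '%s\n' "$prompt" |
    grep -Eq '(^|[[:space:]])(man|woman|girl|boy)[[:space:]]+img([^[:alnum:]]|$)'
}
```

For example, `has_photomaker_trigger "a girl img, retro futurism" && echo ok` prints `ok`, while a prompt such as `"a girl, retro futurism"` fails the check.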

Another PhotoMaker-specific parameter:

- `--style-ratio (0-100)%`: the default is 20, and values of 10-20 typically give good results. A lower ratio follows the input ID more faithfully (but does not necessarily give better quality).
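The commit history of this PR mentions "delayed subject conditioning". In the upstream PhotoMaker pipeline, the style-strength ratio picks the sampling step after which the ID embedding is merged into the prompt; earlier steps use the text-only prompt. The sketch below assumes stable-diffusion.cpp follows the same rule, `start_merge_step = int(style_ratio / 100 * steps)`. The function name and the exact step indexing are assumptions, not something this README states.

```bash
# Sketch under an assumption: the upstream PhotoMaker pipeline computes
#   start_merge_step = int(style_ratio / 100 * sampling_steps)
# and uses the text-only prompt for 0-based steps i <= start_merge_step,
# switching to the ID-conditioned prompt afterwards. Whether stable-diffusion.cpp
# indexes steps exactly this way is not stated in this README.
start_merge_step() {
  local ratio="$1" steps="$2"
  # --style-ratio is documented as a percentage in the range 0-100.
  if ((ratio < 0 || ratio > 100)); then
    echo "style ratio must be in the range 0-100" >&2
    return 1
  fi
  # Bash integer division truncates toward zero, like Python's int().
  echo $((ratio * steps / 100))
}
```

For example, `start_merge_step 20 20` prints `4`. Under this assumption a lower ratio leaves fewer text-only steps, which fits the note that a lower ratio follows the input ID more faithfully.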

Other parameters recommended for running PhotoMaker:

- `--cfg-scale 5.0`
- `-H 1024`
- `-W 1024`

On low-memory GPUs (<= 8 GB), running with the `--vae-on-cpu` option is recommended to get artifact-free images.
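To apply this recommendation automatically, a small wrapper can decide whether to add `--vae-on-cpu`. The sketch below is a hypothetical helper, `vae_flag_for_gpu_mem`, not part of stable-diffusion.cpp. It treats "8 GB" as 8192 MiB, and the commented usage lines read the total memory of the first NVIDIA GPU with `nvidia-smi` (other GPU vendors need a different query).

```bash
# Hypothetical helper, not part of stable-diffusion.cpp.
# Prints "--vae-on-cpu" when the given total GPU memory (in MiB) is at most
# 8 GB, taken here as 8192 MiB; prints nothing otherwise.
vae_flag_for_gpu_mem() {
  local mem_mib="$1"
  local limit_mib=$((8 * 1024))
  if ((mem_mib <= limit_mib)); then
    printf '%s\n' "--vae-on-cpu"
  fi
}

# Usage sketch (NVIDIA only). The command substitution is left unquoted so
# that it expands to nothing when no flag is needed:
# mem=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | head -n 1)
# bin/sd <other arguments> $(vae_flag_for_gpu_mem "$mem") -o output.png
```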

Example:

```bash
bin/sd -m ../models/sdxlUnstableDiffusers_v11.safetensors --vae ../models/sdxl_vae.safetensors --stacked-id-embd-dir ../models/photomaker-v1.safetensors --input-id-images-dir ../assets/examples/scarletthead_woman -p "a girl img, retro futurism, retro game art style but extremely beautiful, intricate details, masterpiece, best quality, space-themed, cosmic, celestial, stars, galaxies, nebulas, planets, science fiction, highly detailed" -n "realistic, photo-realistic, worst quality, greyscale, bad anatomy, bad hands, error, text" --cfg-scale 5.0 --sampling-method euler -H 1024 -W 1024 --style-ratio 10 --vae-on-cpu -o output.png
```
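Because 10-20 is only a typical range for `--style-ratio`, it can help to render the same prompt at a few ratios and compare the results. The sketch below is a hypothetical wrapper, `sweep_style_ratio`, not part of stable-diffusion.cpp. It only appends the documented `--style-ratio` and `-o` options to whatever arguments you pass, so the other flags should match the example above. The ratios 10, 15, and 20 are an arbitrary choice within that range.

```bash
# Hypothetical wrapper, not part of stable-diffusion.cpp.
# Usage: sweep_style_ratio SD_BINARY [ARGS...]
# Runs SD_BINARY once per style ratio, appending --style-ratio and -o so every
# run writes its own image (output_ratio_10.png, output_ratio_15.png, ...).
# Do not pass --style-ratio or -o in ARGS; they are added here.
sweep_style_ratio() {
  local sd_bin="$1"
  shift
  local ratio
  for ratio in 10 15 20; do
    "$sd_bin" "$@" --style-ratio "$ratio" -o "output_ratio_${ratio}.png" || return 1
  done
}

# Example, reusing paths from the README example:
# sweep_style_ratio bin/sd -m ../models/sdxlUnstableDiffusers_v11.safetensors \
#   --vae ../models/sdxl_vae.safetensors \
#   --stacked-id-embd-dir ../models/photomaker-v1.safetensors \
#   --input-id-images-dir ../assets/examples/scarletthead_woman \
#   -p "a girl img, retro futurism" --cfg-scale 5.0 --sampling-method euler \
#   -H 1024 -W 1024 --vae-on-cpu
```

The loop stops at the first failed run, so a missing model path does not trigger two more failing generations.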

### Docker

#### Building using Docker
@@ -345,3 +383,4 @@ Thank you to all the people who have already contributed to stable-diffusion.cpp
- [k-diffusion](https://github.com/crowsonkb/k-diffusion)
- [latent-consistency-model](https://github.com/luosiallen/latent-consistency-model)
- [generative-models](https://github.com/Stability-AI/generative-models/)
- [PhotoMaker](https://github.com/TencentARC/PhotoMaker)
Binary file added assets/photomaker_examples/lenna_woman/lenna.jpg