README.md

File metadata and controls

69 lines (42 loc) · 4.18 KB

MLImgSynth

Generate images using Stable Diffusion (SD) models. This program is written entirely in C and uses the GGML library. It is largely based on stable-diffusion.cpp, but with a focus on more concise and clear code. I also put some care into memory usage: at each step, only the required weights are loaded into the backend memory (e.g. VRAM). Moreover, with the options --unet-split and --vae-tile it is possible to run SDXL models using only 4 GiB, without quantization.

Supported models

Besides the original weights, you may use any of the fine-tuned checkpoints found on the internet. Distilled models (turbo, hyper, lightning) should work normally.

Usage on Windows

Download and unzip the latest release. Edit the file generate.bat as needed and execute it.

Build

First you must build ggml as a library with the desired backends, then build this program linking against it. You may symlink the ggml directory into the root of this project, or define the GGML_INCLUDE_PATH and GGML_LIB_PATH variables. Finally, just call make. For example:

export GGML_INCLUDE_PATH=../ggml/include
export GGML_LIB_PATH=../ggml/Release/src
make

By default, the program is linked with libpng and libjpeg to support those formats. You may suppress these dependencies by defining MLIS_NO_PNG and MLIS_NO_JPEG. The PNM image format is always available.
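The exact way to define these depends on the project's Makefile; one possibility (an assumption here, not confirmed by the source) is passing them as preprocessor defines through CFLAGS:

```shell
# Build without libpng and libjpeg support
# (assumes the Makefile passes CFLAGS through to the compiler)
make CFLAGS="-DMLIS_NO_PNG -DMLIS_NO_JPEG"
```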

Usage

First, download the weights of the model you wish to use. Right now, the only supported format is safetensors. To generate an image (txt2img) use:

./mlimgsynth generate -b Vulkan0 -m MODEL.safetensors --cfg-scale 7 --steps 20 --seed 42 -o output.png -p "a box on a table"

The option -b lets you select among the available backends. Use Vulkan0 or CUDA0 for the GPU; by default, the CPU is used.

See the script generate.sh for a more complete example.

Execute without any arguments to see a list of all the supported options.

img2img and inpainting

To start from an initial image (img2img), add the options -i IMAGE.png and --f-t-ini 0.7. The second option controls the strength by changing the initial time in the denoising process; you may try any value between 0 (no changes) and 1.
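A full img2img invocation might look like this (file names are placeholders; the remaining options are the same as in the txt2img example above):

```shell
# img2img: denoise starting from IMAGE.png at initial time 0.7
./mlimgsynth generate -b Vulkan0 -m MODEL.safetensors \
    -i IMAGE.png --f-t-ini 0.7 \
    --steps 20 --seed 42 -o output.png -p "a box on a table"
```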

If the image has an alpha channel (transparency), it is used as a mask for inpainting.

LoRAs

LoRAs can be loaded individually with the option --lora PATH:MULT, or with the option --lora-dir PATH combined with adding <lora:NAME:MULT> to the prompt. In the latter case, the program will look for the file PATH/NAME.safetensors.
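For example, assuming a LoRA file lora/detail.safetensors (a placeholder name), the two forms should be equivalent:

```shell
# Load one LoRA directly with multiplier 0.8
./mlimgsynth generate -m MODEL.safetensors -o output.png \
    --lora lora/detail.safetensors:0.8 -p "a box on a table"

# Same, but resolved through a LoRA directory and a prompt tag;
# this makes the program look for lora/detail.safetensors
./mlimgsynth generate -m MODEL.safetensors -o output.png \
    --lora-dir lora -p "a box on a table <lora:detail:0.8>"
```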

TAE

To accelerate image decoding and reduce its memory usage, you may use a TAE (tiny autoencoder) in place of SD's VAE (variational autoencoder). Download the weights compatible with SD or SDXL, and pass their path with the option --tae TAE.safetensors to enable it. Be warned that this reduces the quality of the final images. If you are low on memory, it is preferable to use the --vae-tile 512 option instead.
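For instance (weight file names are placeholders):

```shell
# Decode with the tiny autoencoder instead of the full VAE
./mlimgsynth generate -m MODEL.safetensors --tae TAE.safetensors \
    -o output.png -p "a box on a table"

# Alternative when memory-constrained: keep the VAE but decode in 512px tiles
./mlimgsynth generate -m MODEL.safetensors --vae-tile 512 \
    -o output.png -p "a box on a table"
```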

Future plans

  • Allow non-ASCII characters in the prompt.
  • Parse parentheses in the prompt to change relative weights, e.g. a (large) dog.
  • Support for GGUF and quantized models.
  • ControlNet
  • Maybe SDE sampling. The biggest hurdle is understanding what the torchsde.BrownianTree used in k-diffusion is doing.

License

Most of this program is licensed under the MIT license (see the file LICENSE), with the exception of the files in the directory src/ccommon, which use the zlib license (see the file LICENSE.zlib). To prevent any confusion, each file states its license at the top using an SPDX identifier.

Contributing

Contributions in the form of bug reports, suggestions, patches or pull requests are welcome.