A repository that consolidates Stable Diffusion finetuning scripts into a single training hub. Train inpainting, depth, v1+, v2+, image variations, image colorization, whatever. Train with optimizations like 8-bit Adam and xformers for faster, more memory-efficient training.
- Train depth
- Train inpaint
- Train on custom image input (image latents concatenated to the noise latents) *idea from Justin Pinkney
- Train on custom conditionings (for example, image embeddings instead of text) *idea from Justin Pinkney
- Use filenames as prompts
- Use bitsandbytes (bnb) 8-bit Adam for more memory-efficient training
- Use xformers for more memory efficient training
- Mixed precision (fp16/bf16) training
- Prompt shuffling (split on ',' or a custom string and shuffle based on a given probability)
- Train depth with custom depth images
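
To make the "filenames as prompts" and "prompt shuffling" features above concrete, here is a minimal sketch in plain Python. It is not the code from this repo's scripts: the function names `prompt_from_filename` and `shuffle_prompt`, their parameters, and the choice to re-join tags with the separator plus a space are all assumptions made for illustration.

```python
import random
from pathlib import Path


def prompt_from_filename(path: str) -> str:
    """Use the image's file name (without directory or extension) as its prompt.

    Hypothetical helper; the repo's scripts may derive prompts differently.
    """
    return Path(path).stem


def shuffle_prompt(prompt: str, probability: float, separator: str = ",", rng=None) -> str:
    """Shuffle the separator-delimited parts of a prompt with the given probability.

    Hypothetical helper. With probability `probability` the parts are shuffled;
    otherwise the prompt is returned unchanged.
    """
    rng = rng or random
    # rng.random() is in [0.0, 1.0), so probability=0.0 never shuffles
    # and probability=1.0 always shuffles.
    if rng.random() >= probability:
        return prompt
    parts = [part.strip() for part in prompt.split(separator)]
    rng.shuffle(parts)
    # Assumption: re-join with the separator followed by a single space.
    return f"{separator} ".join(parts)


if __name__ == "__main__":
    prompt = prompt_from_filename("data/a cat, sitting, watercolor.png")
    print(prompt)
    print(shuffle_prompt(prompt, probability=0.5))
```

Shuffling the comma-separated tags between training steps is one way to keep the model from tying a concept to a fixed position in the prompt; the probability controls how often the original order is kept.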
Pull requests, discussions, requests, suggestions, and critiques are all welcome! :)
This is a combination of a bunch of repos as well as my own code and edits on scripts. I will do my best to give credit where credit is due in the form of comments, licenses, a shout-out on the readme, etc. If I happen to miss crediting anyone or including a license, please email me at labounty3d@gmail.com and I will fix it!
- Huge thanks to Hugging Face for the diffusers library that makes most of this code possible
- Huge thanks to Stable Diffusion for creating the actual diffusion model and open sourcing it
- Thanks to epitaque for depth training
- Another thanks to Hugging Face for inpainting training
- Shoutout to EveryDream for Windows venv setup and bnb patch
- Shoutout to Justin Pinkney/Lambda Labs for research into training with different inputs
Reach out to labounty3d@gmail.com with any requests/questions/comments/suggestions/concerns.
If you're interested in training text-to-audio latent diffusion, go check out https://github.com/serp-co/ai-text-to-audio-latent-diffusion