
image transforms #115

Merged — 2 commits, merged Jan 7, 2023

Conversation

@ethansmith2000 (Contributor)

Making a fork for my own usage, but thought I'd give a PR as well. Thank you for your work!

@brian6091 (Collaborator) commented Jan 6, 2023

Great, this looks a lot cleaner! One issue, though (which applies in general, not just to your PR): if both resize and center_crop are false, you can end up feeding the VAE images at different resolutions. It might be worth adding a check that forces a resize at the end of the pipeline, before converting to a tensor. That is useful if you feed the training script images of varying sizes.

I've done this here:
https://github.com/brian6091/Dreambooth/blob/mix/src/datasets.py#L112-132
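
For illustration, a minimal sketch of that kind of pipeline, assuming torchvision transforms; the function name and the `size`/`resize`/`center_crop` flags are illustrative, not the repo's actual arguments:

```python
from torchvision import transforms

def build_image_transforms(size=512, resize=False, center_crop=False):
    ops = []
    if resize:
        # resize the shorter edge to `size`
        ops.append(transforms.Resize(size))
    if center_crop:
        ops.append(transforms.CenterCrop(size))
    if not (resize or center_crop):
        # safety net: without either option, differently sized inputs would
        # reach the VAE, so force a fixed resolution before ToTensor
        ops.append(transforms.Resize((size, size)))
    ops += [transforms.ToTensor(), transforms.Normalize([0.5], [0.5])]
    return transforms.Compose(ops)
```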

@ethansmith2000 (Contributor, PR author)

Good point! I also noticed there isn't an option for caching latents; I'd be happy to make a PR for that as well if that's something that would be useful.

@brian6091 (Collaborator) commented Jan 6, 2023

There is an issue for caching latents: #62

One point is that if you are relying on transformations to augment your data, like color jitter, caching latents won't help you since you need to run the transformations on each batch.

Maybe you could chime in on the discussion to see what others think. My general feeling is to keep the training scripts in the LoRA repo as simple as possible so that others can easily see what they need to do to adapt their own scripts. But if caching latents saves enough memory to let more people train with lower GPU requirements, then it may be worth it.
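
To make the trade-off concrete, here is a rough sketch of what latent caching could look like with a diffusers-style AutoencoderKL; the `cache_latents` name and the `batch["pixel_values"]` key are assumptions, not the repo's actual code. It only works when the image transforms are deterministic, which is exactly the color-jitter caveat above.

```python
import torch

@torch.no_grad()
def cache_latents(vae, dataloader, device="cuda"):
    # Encode every image once and keep the latents on the CPU so the VAE
    # can be dropped from the training loop afterwards.
    cached = []
    for batch in dataloader:
        pixel_values = batch["pixel_values"].to(device, dtype=vae.dtype)
        latents = vae.encode(pixel_values).latent_dist.sample()
        cached.append((latents * 0.18215).cpu())  # Stable Diffusion latent scaling factor
    return cached
```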

@ethansmith2000 (Contributor, PR author)

Ah, I didn't think of that. Good reason to have it optional.
There probably isn't a reliable way to jitter the colors of the latents directly, but maybe you could do the linear decode trick: apply the jitter, then turn it back.
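
For reference, a sketch of that idea: approximate the VAE decode with a fixed linear map from the 4 latent channels to RGB, jitter in that approximate pixel space, and map back with the pseudo-inverse. The matrix below is a random placeholder, not a fitted approximation.

```python
import torch

# placeholder for a fitted 4->3 linear approximation of the VAE decoder
latent_to_rgb = torch.randn(4, 3)
rgb_to_latent = torch.linalg.pinv(latent_to_rgb)  # approximate inverse, 3x4

def jitter_latents(latents, jitter_fn):
    # latents: (B, 4, H, W) -> approximate RGB image: (B, 3, H, W)
    rgb = torch.einsum("bchw,cd->bdhw", latents, latent_to_rgb)
    rgb = jitter_fn(rgb)  # e.g. a color jitter applied in pixel space
    # project the jittered approximation back into latent space
    return torch.einsum("bdhw,dc->bchw", rgb, rgb_to_latent)
```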

@brian6091 (Collaborator) commented Jan 6, 2023

Or you could augment in latent space, but I'm not sure anyone knows what that means!

@cloneofsimo cloneofsimo changed the base branch from master to develop January 7, 2023 02:20
@cloneofsimo cloneofsimo merged commit 04089c8 into cloneofsimo:develop Jan 7, 2023
@cloneofsimo (Owner)

Thanks!
