Example for how to use the grow option? #14
I am trying to use the progressive growing option, but I am getting an error when using it the way I think it is supposed to be used.

I have a trained 32x32 checkpoint which I am now trying to grow into a 64x64 one, so I am using the following arguments:

python3 train.py --config configs/config_64x64.json --name chkpt_64_1 --batch-size 100 --grow chkpt_32_2.pth --grow-config configs/config_32x32.json

The config_32x32.json is the default one from the repository; the config_64x64.json uses the additional layers and changed values as mentioned in #9. But when trying to run train.py I get a whole lot of "key missing" and "size mismatch" errors in

inner_model.load_state_dict(old_inner_model.state_dict())

Missing key(s) in state_dict: "inner_model.u_net.d_blocks.1.2.main.0.mapper.weight", "inner_model.u_net.d_blocks.1.2.main.0.mapper.bias", "inner_model.u_net.d_blocks.1.2.main.2.weight", "inner_model.u_net.d_blocks.1.2.main.2.bias", "inn....

So I am wondering whether I am doing something wrong here or if this is just one of those "work in progress" issues. I suspect that I might rather have to do something that involves patch_size and skip_stages, since those are used in the wrapper, but I have no idea what their function is.

Comments
Right now you have to train a model that already has the U-Net stages you are going to train later in it; this ensures that when you grow the model, all of the shapes will be compatible with the smaller model. You do this by putting the full list of stages in the config and setting skip_stages so that the outer, higher-resolution stages are skipped at first. Note that this doesn't apply to growing by increasing the patch size: that doesn't add U-Net stages, so you can do it without deciding what your final patch size is going to be.
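For reference, growing by the patch-size route only touches two keys in the model config. A rough sketch, assuming the image_v1 config format used by this repo (values illustrative, not a tested config):

    "model": {
        "type": "image_v1",
        "input_size": [64, 64],
        "patch_size": 2
    }

Doubling input_size from [32, 32] and patch_size from 1 means each 2x2 pixel block is folded into channels, so the U-Net still sees a 32x32 grid and no new stages are needed.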
If someone has a good idea on how to reasonably progressively grow models without creating all of the U-Net stages first, in a way that ensures the shapes are compatible, I'd like to hear it :)
Ah, thanks for the swift explanation! Of course, now it makes total sense. My naive approach for growing without creating the stages first would be to try a "smart copy" of weights from the smaller model into the modules of the bigger one that have the same size and relative level, and just leave the missing ones with random init values, but I fear that's probably not the way to do it.
This has probably been tried already, but what are your thoughts on using inpainting to increase the resolution? An example would be something like this: (1) generate a 32x32 image, (2) crop it into 4 equal squares of size 16x16, (3) for each of those 4 squares, double the resolution to 32x32, mask the new unknown pixels and use inpainting to generate them, (4) stitch the 4 images together into the 64x64 one. Inpainting is a bit slow, but it would not require additional U-Nets.
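A minimal sketch of that crop/upscale/inpaint/stitch loop, assuming hypothetical sample() and inpaint(image, mask) wrappers around a 32x32 diffusion model (neither is part of k-diffusion's API):

    import torch
    import torch.nn.functional as F

    def grow_by_inpainting(sample, inpaint):
        image = sample()  # (3, 32, 32) output of the small model
        tiles = []
        for i in (0, 16):
            for j in (0, 16):
                crop = image[:, i:i + 16, j:j + 16]  # one 16x16 quadrant
                up = F.interpolate(crop[None], scale_factor=2, mode='nearest')[0]
                mask = torch.zeros(1, 32, 32)
                mask[:, ::2, ::2] = 1.0  # pixels carried over from the 32x32 image
                tiles.append(inpaint(up, mask))  # fill in the masked-out pixels
        top = torch.cat(tiles[:2], dim=2)
        bottom = torch.cat(tiles[2:], dim=2)
        return torch.cat([top, bottom], dim=1)  # (3, 64, 64)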
My gut feeling is that the 32x32 model does not really know about the finer details that should be present in a 64x64 model, so I would expect the result to be rather on the blurry, pixelly side, except maybe if you train it on a large range of differently scaled details (so it has, for example, learned how the closeup of an eye looks and not just two black pixels in a face).
Pictures in the dataset are usually not taken at the same zoom level; I think the model should know how something looks if it's "closer" in order to inpaint it correctly. But yes, you are probably right that if we abuse this process too much (e.g. by repeating it recursively), something weird will happen, for example when zooming in on finer details of a face that are not present in any image of the dataset.
Oh yes, that's a possibility of course. I have only just started diving into training my own diffusion models, but one observation I made with my toy models is that the old rule still applies: well-aligned datasets of similar things converge better than those that are all over the place in zoom level or composition. This is why right now I try to keep my data within a certain theme or scale level (and why I try to use the grow method).
Just FYI - I was curious to see what happens if I map the most likely weights of the 32 model to the 64 model, and as it turns out there are actually just 6 weight tensors in the smaller model that have no match in the bigger one; all the others can be copied. And while the first sample is just noise, the model training seems to recover quite quickly and, at least from what I can see so far, does not show any weird behavior. But maybe it's too early to tell. In case you are interested in looking deeper into this option, I can share my (super-hacky) code snippets somewhere.
I'm interested! That should usually work; the thing I am specifically worried about is when you try to do progressive growing where you change the base channel count, say if you have [256, 256, 512] for your 32x32 and then you want to do [128, 256, 256, 512] for your 64x64. Then the first residual block of the 32x32 model has an input channel count of 256 and an output channel count of 256, but in the bigger model the corresponding residual block has an input channel count of 128 (because it now has a 64x64, 128-channel stage before it). So you have to drop that residual block and replace it with a new randomly initialized one with the correct shapes. This is why I went with creating the entire model beforehand and changing the number of stages you skip: it will create the first residual block of the 32x32 stage with 128 input channels because it knows you're going to add a 128-channel 64x64 stage later.
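The mismatch is easy to see with two bare convolutions (assumed shapes for illustration, not k-diffusion's actual module layout):

    import torch

    # first conv of the 32x32 stage in the small model: 256 channels in, 256 out
    small_first_conv = torch.nn.Conv2d(256, 256, 3, padding=1)
    # the same block in the grown model: the new 64x64 stage feeds it 128 channels
    big_first_conv = torch.nn.Conv2d(128, 256, 3, padding=1)
    print(small_first_conv.weight.shape)  # torch.Size([256, 256, 3, 3])
    print(big_first_conv.weight.shape)    # torch.Size([256, 128, 3, 3])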
Yeah, I guess the question might be whether the model learns bad superficial habits that way, and rather uses the skip connections while neglecting the "deeper" layers that likely have more global knowledge; hard to tell with my small training set and at that size. Here is a csv for the 32 to 64 model mapping. The patched part of the code in train.py looks like this:
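(The original snippet was not preserved in this thread. A minimal sketch of what such a patch might look like, assuming inner_model is the freshly built 64x64 model, old_inner_model is the trained 32x32 one, and key_map is the 32-to-64 name mapping from the csv:)

    old_sd = old_inner_model.state_dict()
    new_sd = inner_model.state_dict()
    for old_key, new_key in key_map.items():
        # copy a tensor only when the mapped key exists and the shapes agree;
        # everything else keeps its random initialization
        if new_key in new_sd and new_sd[new_key].shape == old_sd[old_key].shape:
            new_sd[new_key] = old_sd[old_key]
    inner_model.load_state_dict(new_sd)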
The two conf files and the args are the ones I used at the top of this thread.
Somehow I'm not understanding what this means, maybe because I'm still not very familiar with the U-Net architecture. Are you saying you set skip_stages?
From what I have understood, you first have to decide the maximum size you want to train for and create that config. So for 128 that would be something like:
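(A sketch of what the model section of that 128 config might look like, assuming the image_v1 keys from the repo's default configs; depths, channels, and sigma_max here are illustrative, not a tested setup:)

    "model": {
        "type": "image_v1",
        "input_channels": 3,
        "input_size": [128, 128],
        "patch_size": 1,
        "skip_stages": 0,
        "depths": [2, 2, 4, 4],
        "channels": [128, 256, 256, 512],
        "self_attn_depths": [false, false, true, true],
        "sigma_max": 160
    }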
Now you make a copy of that conf and for the first stage (assuming you start with 32x32) you change the values in the copy to:
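(Continuing the illustrative sketch above, only a couple of keys change; everything that defines the stages, i.e. depths, channels, and self_attn_depths, stays identical to the 128 config so the shapes remain compatible:)

    "input_size": [32, 32],
    "skip_stages": 2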
(I don't know if the sigma_max value has to be reduced here to 80?) In the first stage you do not use the grow argument yet. When the 32x32 model has finished training you create another conf for the 64x64 step:
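(Again continuing the same illustrative sketch, the 64x64 copy would skip one stage fewer:)

    "input_size": [64, 64],
    "skip_stages": 1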
This time you have to use the --grow argument:

python3 train.py --config configs/config_64x64_skip.json --name chkpt_64 --grow chkpt_32.pth --grow-config configs/config_32x32_skip.json

And once that has finished, you can use the 128 conf:

python3 train.py --config configs/config_128.json --name chkpt_128 --grow chkpt_64.pth --grow-config configs/config_64x64_skip.json
Trying to grow to a 256 model, I am wondering what the deliberations are when adding channels in order to grow the size. Looking at the channel progression from 32 to 128, I see 3 possible options for 256, but I am not sure if this is just trial-and-error or if there are certain pros and cons. My guess is that using fewer channels will require less memory, allow for a larger batch size, and make training faster, at the cost of expressiveness? [Note: I just saw that the UNet2DModel from huggingface/diffusers is using the 128, 128, 256, 256, 512, 512 variant, so I guess I'll try that one first] [Note 2: looks like 24 GB of GPU memory is not sufficient to train a 256 model, even with a batch size of 1 :-(]
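(For what it's worth, a sketch of the diffusers-style variant mentioned in the first note, assuming a fresh progression planned up to 256; the other model keys would follow the same pattern as the illustrative 128 sketch above:)

    "input_size": [256, 256],
    "channels": [128, 128, 256, 256, 512, 512]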
A little tip for those who are not interested in the evaluation data: the arg --evaluate-every -1 skips that step and saves some time. In addition, it is then possible to skip the whole feature-evaluation preparation (and, most importantly, free some GPU memory, which in turn allows for a larger batch size) in train.py:
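(A sketch of what that patch might look like; the guard is the idea described above, while the exact names and call signatures are assumptions based on the repo's layout, not the commenter's original snippet:)

    # build the Inception feature extractor and the reference features only
    # when evaluation is actually enabled, freeing the GPU memory otherwise
    # (assumes train.py's existing "import k_diffusion as K" and arg parsing)
    if args.evaluate_every > 0:
        extractor = K.evaluation.InceptionV3FeatureExtractor(device=device)
        train_iter = iter(train_dl)
        reals_features = K.evaluation.compute_features(
            accelerator, lambda n: next(train_iter)[0], extractor,
            args.evaluate_n, args.batch_size)
    # the evaluation call inside the training loop needs the same guard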