-
You can't usefully dequantize it back to fp8 or fp16, and it would be pointless anyway: the best you could recover is an approximation of the original Flux dev model, which you could simply run instead. In the settings, the FP16 LoRA option in the top bar is probably what you want; it doesn't improve compatibility, but it makes the LoRA itself run at higher precision. It's the precision of the base model that causes the issues, so training specifically for the quantized model won't change a thing; less precision is less precision.

You don't really need to run NF4 at all. Forge's memory management can swap the full-sized model in and out in blocks: set the memory limit well below your maximum GPU memory and it will handle the swapping for you. Generation takes a bit longer, but you keep full LoRA compatibility.

That said, I don't find NF4 + LoRA unworkable either. You just need to prompt well and use appropriate samplers and CFG. Flux in general is extremely sensitive to sampler choice and step count. The default sampler is really only good if you want it to follow text easily; 20 steps is not a maximum, it's the bare minimum. Too many people run Flux at only 20 steps and get subpar results when 40-60 steps would be noticeably better. You aren't going to get everything you want out of low precision and low step counts, and that should be no surprise.
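To make the FP16 LoRA point concrete, here is a minimal, hypothetical sketch (not Forge's actual code path) of applying a LoRA delta in fp16 on top of an NF4 base weight. The shapes, `rank`, `scale`, `lora_A`, and `lora_B` are made up for illustration; it assumes `bitsandbytes` is installed and a CUDA device is available.

```python
import torch
import bitsandbytes.functional as bnbF

# Stand-in for one fp16 weight matrix from the Flux dev UNet (hypothetical shape).
w_fp16 = torch.randn(3072, 3072, dtype=torch.float16, device="cuda")

# What the bnb-nf4 checkpoint effectively stores: packed 4-bit NF4 weights plus quant state.
w_nf4, quant_state = bnbF.quantize_4bit(w_fp16, quant_type="nf4")

# A made-up rank-16 LoRA for the same layer, kept in fp16.
rank, scale = 16, 1.0
lora_A = torch.randn(rank, 3072, dtype=torch.float16, device="cuda") * 0.01
lora_B = torch.randn(3072, rank, dtype=torch.float16, device="cuda") * 0.01

# "FP16 LoRA": dequantize the base weight to fp16, then add the LoRA delta in fp16.
# The delta itself is full precision; the remaining error lives in the NF4 base.
w_base = bnbF.dequantize_4bit(w_nf4, quant_state).to(torch.float16)
w_effective = w_base + scale * (lora_B @ lora_A)

# The NF4 base only approximates the original fp16 weights, which is why the
# model's precision, not the LoRA's, is the bottleneck.
print("base error vs. fp16:", (w_base - w_fp16).abs().mean().item())
```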
-
The bnb-nf4 UNet has changed during quantization: pairs of dev16 and nf4 images generated with the same prompt and seed look different. Some are almost identical, but others are radically different.
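If you want to reproduce that comparison, a rough sketch with diffusers' `FluxPipeline` follows. It quantizes FLUX.1-dev to NF4 on the fly via diffusers' `BitsAndBytesConfig` rather than loading the exact flux1-dev-bnb-nf4-v2 file, so it is only an approximation of the Forge setup; the prompt, seeds, step count, and CFG are arbitrary, and access to the gated FLUX.1-dev repo is assumed.

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

MODEL = "black-forest-labs/FLUX.1-dev"

def render(pipe, prompt, seed, steps=40):
    # Same prompt, seed, steps, and CFG: any visible difference between the two
    # runs comes from the weights themselves.
    generator = torch.Generator("cpu").manual_seed(seed)
    return pipe(prompt, num_inference_steps=steps, guidance_scale=3.5,
                generator=generator).images[0]

# Reference pipeline in bf16.
pipe_bf16 = FluxPipeline.from_pretrained(MODEL, torch_dtype=torch.bfloat16)
pipe_bf16.enable_model_cpu_offload()

# Same pipeline, but with the transformer loaded in 4-bit NF4.
nf4_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype=torch.bfloat16)
transformer_nf4 = FluxTransformer2DModel.from_pretrained(
    MODEL, subfolder="transformer", quantization_config=nf4_config,
    torch_dtype=torch.bfloat16)
pipe_nf4 = FluxPipeline.from_pretrained(
    MODEL, transformer=transformer_nf4, torch_dtype=torch.bfloat16)
pipe_nf4.enable_model_cpu_offload()

prompt = "a red fox standing in fresh snow, golden hour"
for seed in (0, 1, 2):
    render(pipe_bf16, prompt, seed).save(f"dev16_seed{seed}.png")
    render(pipe_nf4, prompt, seed).save(f"nf4_seed{seed}.png")
```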
-
Can flux1-dev-bnb-nf4-v2 be dequantized back to dev16 or dev8? There are two reasons for asking:
In theory, training on a "dequantized" bnb-nf4 version should improve compatibility.
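For what it's worth, bitsandbytes can mechanically dequantize NF4 tensors back to fp16, but the round trip is lossy, so the result is only an approximation of dev16 rather than a recovery of it. A minimal sketch of that round trip on a single stand-in tensor (random data, hypothetical shape; assumes `bitsandbytes` and a CUDA device):

```python
import torch
import bitsandbytes.functional as bnbF

# Stand-in for one dev16 weight tensor (random data, hypothetical shape).
w_dev16 = torch.randn(3072, 3072, dtype=torch.float16, device="cuda")

# NF4 quantization of the kind flux1-dev-bnb-nf4-v2 uses, then dequantization.
w_nf4, state = bnbF.quantize_4bit(w_dev16, quant_type="nf4")
w_dequant = bnbF.dequantize_4bit(w_nf4, state).to(torch.float16)

# The round trip is lossy: w_dequant approximates w_dev16 but does not recover it.
# A "dequantized" checkpoint is therefore just a noisier dev16, so training against
# it shouldn't buy any extra compatibility over training against dev16 itself.
print("mean abs error:", (w_dequant - w_dev16).abs().mean().item())
print("max  abs error:", (w_dequant - w_dev16).abs().max().item())
```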