
Asking for questions about evaluation #6

Open · mengmeng18 opened this issue Sep 26, 2023 · 10 comments

Comments

@mengmeng18 commented Sep 26, 2023

Thanks for your great work! I ran into an issue during testing.
When running `python main.py --function test --config configs/cub_stage2.yml --opt "{'test': {'load_token_path': 'ckpts/cub983/tokens/', 'load_unet_path': 'ckpts/cub983/unet/', 'save_log_path': 'ckpts/cub983/log.txt'}}"` for evaluation, I found that self.step_store, self.attention_store, and self.attention_maps are all empty. Could you tell me where the problem is?
Looking forward to your reply!

@callsys (Owner) commented Sep 26, 2023

The most likely reason is that the register_attention_control function at line 67 of attn.py is not working properly.
At line 115 of attn.py, we replace the get_attention_scores method of every CrossAttention module in the UNet. A different diffusers version may ship a CrossAttention module that no longer has a get_attention_scores method, in which case nothing gets recorded. So the problem you describe is most likely caused by an incompatible diffusers version.
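For reference, the patching pattern looks roughly like this (a hedged sketch of the idea, not the repo's exact code; it assumes diffusers==0.13.1, and the is_cross heuristic and the place naming are my own assumptions):

```python
# Sketch: swap get_attention_scores on every CrossAttention module so the
# controller sees each attention map. Assumes diffusers==0.13.1.
from diffusers.models.cross_attention import CrossAttention

def register_attention_control_sketch(controller, unet):
    def make_patch(module, place_in_unet):
        original = module.get_attention_scores

        def get_attention_scores(query, key, attention_mask=None):
            attention_probs = original(query, key, attention_mask)
            # Heuristic: cross-attention keys come from the 77-token text
            # sequence, so their length differs from the query's.
            is_cross = query.shape[1] != key.shape[1]
            # The controller records the map (filling step_store etc.)
            # and returns it unchanged.
            return controller(attention_probs, is_cross, place_in_unet)

        return get_attention_scores

    patched = 0
    for name, module in unet.named_modules():
        if isinstance(module, CrossAttention):
            place = "down" if "down" in name else ("up" if "up" in name else "mid")
            module.get_attention_scores = make_patch(module, place)
            patched += 1
    # patched == 0 means this diffusers version has no CrossAttention with
    # get_attention_scores, and the stores will stay empty.
    print(f"Patched {patched} CrossAttention modules")
```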

@mengmeng18 (Author)

Thanks a lot! Would you please tell me how to fix this error?

@callsys (Owner) commented Sep 26, 2023

1. Try `pip install --upgrade diffusers[torch]==0.13.1`, which is the version we use.
2. Check whether the code actually runs through the get_attention_scores method at line 71 of attn.py. This method adds attention maps to self.step_store, self.attention_store, and self.attention_maps. A quick sanity check is sketched below.
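Something like this (my own debugging sketch, assuming the standard Stable Diffusion 1.x UNet; swap in whatever checkpoint main.py actually loads) tells you whether the patch can attach at all:

```python
from diffusers import UNet2DConditionModel

# Assumed checkpoint for illustration; use the one your config points to.
unet = UNet2DConditionModel.from_pretrained(
    "CompVis/stable-diffusion-v1-4", subfolder="unet"
)

total = patchable = 0
for _, module in unet.named_modules():
    # Check by class name so this also runs on newer diffusers,
    # where CrossAttention was renamed to Attention.
    if type(module).__name__ in ("CrossAttention", "Attention"):
        total += 1
        patchable += hasattr(module, "get_attention_scores")
print(f"{patchable}/{total} attention modules expose get_attention_scores")
# 0/N would explain the empty stores: the patch in attn.py never attaches.
```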

@mengmeng18 (Author) commented Sep 26, 2023

1. I have checked that the version is diffusers[torch]==0.13.1.
2. The code runs AttentionStore.register_attention_control(controller, unet) at line 227 of main.py, and it does run through the get_attention_scores method at line 71 of attn.py. However, after running these lines, I find that self.step_store, self.attention_store, and self.attention_maps are still empty.
Could you give me some other advice to help me fix this error?

@callsys (Owner) commented Sep 26, 2023

At line 106 of attn.py, `attention_probs = controller(attention_probs, is_cross, place_in_unet)` is what adds attention_probs to self.step_store inside the controller. You can check whether the code actually reaches this line.
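Roughly, the controller call does the following (paraphrased from the standard prompt-to-prompt AttentionStore pattern; the attribute names mirror this thread, and the repo's actual code may differ):

```python
class AttentionStore:
    def __init__(self):
        self.step_store = self.get_empty_store()   # maps for the current step
        self.attention_store = {}                  # maps accumulated over steps
        self.attention_maps = {}                   # e.g. averaged maps

    @staticmethod
    def get_empty_store():
        return {f"{place}_{kind}": []
                for place in ("down", "mid", "up")
                for kind in ("cross", "self")}

    def __call__(self, attention_probs, is_cross, place_in_unet):
        key = f"{place_in_unet}_{'cross' if is_cross else 'self'}"
        # This append is what fills step_store; if __call__ is never
        # reached, all three stores stay empty.
        self.step_store[key].append(attention_probs)
        return attention_probs
```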

@mengmeng18 (Author)

Thanks a lot! I will check it again.

@KevinLi-167

I'm having a similar issue.

I use the CUB dataset and reduced the batch size for the two-stage training.

For train_token I use the default float32.
Since I only have 8 GB of GPU memory, I changed train_unet to float16 with batch_size=1.
By default, float16 is used for inference.

After 250 steps of training, inference raises an error.
The cause is that the CAM-like attention map contains NaN values.
I traced it back to the CLIP output: the last 4 of the 6 fr embeddings here are all NaN.

[Screenshots: "bug of clip nan2", "bug of clip nan3"]
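For anyone debugging the same thing, a tiny helper like this (a generic sketch; `fr` names the embedding tensor discussed above) can locate where the NaNs first appear:

```python
import torch

def report_nans(name, tensor):
    """Print how many entries of `tensor` are NaN, if any."""
    n = torch.isnan(tensor).sum().item()
    if n:
        print(f"{name}: {n}/{tensor.numel()} values are NaN")

# e.g. right after the text encoder / token lookup:
# report_nans("fr (representative embeddings)", fr)
```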

@callsys (Owner) commented Apr 11, 2024

Since CLIP (the text encoder) is frozen the whole time, it seems there is a problem with the representative embeddings trained in stage 1. Does the model you trained in stage 1 produce NaNs?

Besides, the model requires a large batch size for stage 2 training. If your machine does not have enough memory, using a large gradient accumulation works fine; see the sketch below.
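A minimal gradient-accumulation sketch (generic PyTorch with toy stand-ins for the repo's model, data, and loss; only the accumulation pattern is the point):

```python
import torch

# Hypothetical stand-ins: replace with the repo's UNet, dataloader, and loss.
model = torch.nn.Linear(8, 1)
data = [(torch.randn(1, 8), torch.randn(1, 1)) for _ in range(64)]
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

accum_steps = 16  # effective batch = per-step batch (1) x accum_steps
optimizer.zero_grad()
for i, (x, y) in enumerate(data):
    loss = torch.nn.functional.mse_loss(model(x), y)
    (loss / accum_steps).backward()   # scale so accumulated grads average
    if (i + 1) % accum_steps == 0:    # step only every accum_steps batches
        optimizer.step()
        optimizer.zero_grad()
```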

@KevinLi-167

Thank you for your reply!
I did confirm the problem is with fr.
(I also just found out that the loss is always NaN in the UNet training log, so my stage 2 may be completely invalid as well.)
I'm already retrying the training.
(At stage 1 I can't use float16 because the loss becomes NaN, so I still use float32.)

I'd like to confirm that fr relies only on the first stage, train_token, right?
(fr is then used as frozen content for the subsequent UNet training and inference.)

I have one more question: z0 in the paper is encoded by a VQGAN, but a VAE is used in the code.
What is the possible reason for changing the image encoder from VQGAN to VAE, i.e., why does it differ from the paper?

Thanks again for your reply. I'll read the source code carefully again and try to train.

@callsys (Owner) commented Apr 12, 2024

1. fr relies only on train_token.

2. VQGAN is an improved version of VAE, and they are similar in structure; see the encoding sketch below.
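For context, obtaining z0 with the diffusers VAE typically looks like this (a sketch assuming diffusers==0.13.1 and the Stable Diffusion 1.x AutoencoderKL; the checkpoint and scaling used in this repo may differ):

```python
import torch
from diffusers import AutoencoderKL

# Assumed SD 1.x VAE checkpoint for illustration.
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")
image = torch.randn(1, 3, 512, 512)  # stand-in for a normalized [-1, 1] image
with torch.no_grad():
    # Encode to the latent z0; 0.18215 is SD's usual latent scaling factor.
    z0 = vae.encode(image).latent_dist.sample() * 0.18215
print(z0.shape)  # torch.Size([1, 4, 64, 64])
```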
