
RuntimeError: probability tensor contains either inf, nan or element < 0 #9

biscuit1103 opened this issue May 24, 2024 · 7 comments


@biscuit1103

Thanks for your great work! During replication I hit a stumbling block while running Eval. I encountered this error — how should I solve it?

```
 55%|█████▌    | 11/20 [4:04:49<3:20:18, 1335.42s/it]
Traceback (most recent call last):
  File "GPT_eval_multi.py", line 115, in <module>
    best_fid, best_iter, best_div, best_top1, best_top2, best_top3, best_matching, best_multi, writer, logger = eval_trans.evaluation_transformer(args.out_dir, val_loader, net, trans_encoder, logger, writer, 0, best_fid=1000, best_iter=0, best_div=100, best_top1=0, best_top2=0, best_top3=0, best_matching=100, clip_model=clip_model, eval_wrapper=eval_wrapper, dataname=args.dataname, save=False, num_repeat=11, rand_pos=True)
  File "/data1/anaconda3/envs/T2M/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/data2/MMM-main/utils/eval_trans.py", line 217, in evaluation_transformer
    index_motion = trans(feat_clip_text, word_emb, type="sample", m_length=pred_len, rand_pos=rand_pos, CFG=CFG)
  File "/data1/anaconda3/envs/T2M/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/data2/MMM-main/models/t2m_trans.py", line 199, in forward
    return self.sample(*args, **kwargs)
  File "/data2/MMM-main/models/t2m_trans.py", line 263, in sample
    sorted_score_indices = scores.multinomial(scores.shape[-1], replacement=False)  # stochastic
RuntimeError: probability tensor contains either inf, nan or element < 0
```
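For context: `torch.multinomial` raises exactly this error whenever its probability tensor contains `nan`, `inf`, or a negative entry, which typically happens when upstream logits overflow or a softmax is taken over a fully masked row. A minimal sketch (not the repo's code) that reproduces the failure mode and shows one defensive guard:

```python
import torch

# A softmax over a fully masked row (all -inf logits) produces NaN
# probabilities: exp(-inf) = 0 everywhere, so we divide 0 by 0.
logits = torch.tensor([float("-inf"), float("-inf"), float("-inf")])
probs = logits.softmax(dim=-1)  # tensor([nan, nan, nan])

# Calling torch.multinomial on `probs` would raise
# "RuntimeError: probability tensor contains either inf, nan or element < 0".
# Guard: zero out non-finite mass and fall back to uniform if nothing remains.
safe = torch.nan_to_num(probs, nan=0.0, posinf=0.0, neginf=0.0).clamp(min=0.0)
if safe.sum() == 0:
    safe = torch.ones_like(safe)
safe = safe / safe.sum()

idx = torch.multinomial(safe, num_samples=1, replacement=False)
```

This only masks the symptom; the real fix (as the maintainer notes below in the thread) is to remove the source of the invalid probabilities upstream.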

@exitudio (Owner)

Hi, thank you for your interest.
Are you evaluating with our pretrained model?
Does the problem occur every time?

@biscuit1103 (Author)

> Hi. Thank you for your interest. Are you evaluating on our pretrain model? Does the problem occur every time?

Thanks for your reply. Yes, when I run the evaluation command on the given pre-trained model, the error occurs after 6 iterations. I am using torch 1.8.1+cu111.

@exitudio (Owner) commented Jun 4, 2024

I found a potential problem in this line. I have already committed the change, along with an updated pretrained model.

Can you try to:

1. Download the new pretrained model (Section 2.3 in the README):

   ```
   bash dataset/prepare/download_model.sh
   ```

2. Run eval (the Eval section in the README):

   ```
   python GPT_eval_multi.py --exp-name eval_name --resume-pth output/vq/2023-07-19-04-17-17_12_VQVAE_20batchResetNRandom_8192_32/net_last.pth --resume-trans output/t2m/2023-10-10-03-17-01_HML3D_44_crsAtt2lyr_mask0.5-1/net_last.pth --num-local-layer 2
   ```

With `--num-local-layer 2` I use 2 local layers instead of 1 in the previous model.

If you still see the problem, please let me know. Thank you.

@biscuit1103 (Author)

> I found a potential problem in this line. I already commit the change. Along with the updated pretrain model. […]

Thank you very much! I have solved this problem, but the evaluation results do not seem to reach the reported numbers. I don't know which part caused the discrepancy.

[screenshot: evaluation results]

@exitudio (Owner) commented Jun 7, 2024

This issue should come from the randomness in this line. I just commented it out in the previous commit. Can you double-check that your copy is updated?
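As a general aside (a sketch, not the repo's code): when an evaluation pipeline contains stochastic sampling, repeated runs only become comparable if the random branch is removed, as done here, or every RNG is pinned. A minimal seeding helper along these lines:

```python
import random

import numpy as np
import torch


def seed_everything(seed: int = 42) -> None:
    """Pin all common RNGs so repeated evaluation runs draw the same samples."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)           # seeds the CPU (and current GPU) generator
    torch.cuda.manual_seed_all(seed)  # seeds every visible GPU (no-op without CUDA)


seed_everything(42)
a = torch.rand(4)
seed_everything(42)
b = torch.rand(4)
print(torch.equal(a, b))  # True: identical draws after re-seeding
```

Note that full determinism on GPU can additionally require `torch.use_deterministic_algorithms(True)`, which may slow some ops down.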

@biscuit1103 (Author)

> This issue should come from the randomness from this line. I just commented it out in the previous commit. Can you double check that it is updated?

Thank you for your answer. That part of the code has been updated. After I modified it, I got an evaluation result of FID 0.088. To reach the reported 0.080 (using the predicted length), which parameters or pretrained models should I replace?

@exitudio (Owner)

We use the pretrained length estimator from Text-to-Motion (the model proposed along with the HumanML3D paper) here.

It can be plugged into our existing model. I will let you know after adding this code to the repo.
