NaN loss when using the mintood method with run_train.sh #1

Open
yuxinokk opened this issue Jan 18, 2025 · 4 comments
Comments

@yuxinokk

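The authors' run_train.sh uses the sdif method, but run_test.sh uses the mintood method, so I need a run_train.sh that trains with mintood. When I change the method in run_train.sh to mintood, the run fails with nan. Could you help me resolve this?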

@zhougr18
Collaborator

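Hello! Thank you for your interest in our work. We tested the mintood method with the code in the repository and did not observe any nan problem. It may be a numerical issue caused by a difference in environment, so please check again on your side. If the problem persists, please provide more specific error information so that we can help you analyze it.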
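One way to gather environment details for a report like this is to print the installed versions of the packages most likely to matter. The following is only a minimal sketch using the Python standard library; the helper name `installed_versions` and the choice of packages to query are assumptions made for illustration and are not part of the repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the current environment.
            result[name] = None
    return result


if __name__ == "__main__":
    # The package names below are an illustrative selection, not a required list.
    for name, ver in installed_versions(["torch", "transformers", "numpy"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Pasting this output together with the full traceback makes it easier to compare two environments.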

@yuxinokk
Author

yuxinokk commented Feb 1, 2025

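Hello, here are the details of the error I am seeing. The loss reported for each epoch is nan, and if training continues it fails with the error shown in the second image. Could you please take a look?

Image

Image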

@yuxinokk
Author

yuxinokk commented Feb 2, 2025 via email

@zhougr18
Collaborator

zhougr18 commented Feb 8, 2025

Package Version


addict 2.4.0
apex 0.1
appdirs 1.4.4
audioread 3.0.0
av 10.0.0
blessed 1.20.0
brotlipy 0.7.0
certifi 2022.12.7
cffi 1.15.1
charset-normalizer 2.0.4
click 8.1.3
colorama 0.4.6
contourpy 1.0.7
cryptography 38.0.4
cycler 0.11.0
Cython 0.29.33
decorator 5.1.1
easydict 1.10
einops 0.6.0
filelock 3.9.0
flit_core 3.6.0
fonttools 4.38.0
fvcore 0.1.5.post20221221
gpustat 1.0.0
huggingface-hub 0.12.0
idna 3.4
importlib-metadata 6.0.0
iopath 0.1.9
joblib 1.2.0
kiwisolver 1.4.4
librosa 0.9.2
llvmlite 0.39.1
Markdown 3.4.1
markdown-it-py 2.1.0
matplotlib 3.6.3
mdurl 0.1.2
mkl-fft 1.3.1
mkl-random 1.2.2
mkl-service 2.4.0
mmcv-full 1.7.1
mmdet 2.28.1
model-index 0.1.11
munkres 1.1.4
natsort 8.3.1
numba 0.56.4
numpy 1.23.5
nvidia-ml-py 11.495.46
opencv-python 4.7.0.68
openmim 0.3.6
ordered-set 4.1.0
packaging 23.0
pandas 1.5.3
Pillow 9.3.0
pip 22.3.1
pooch 1.6.0
portalocker 2.7.0
psutil 5.9.4
pycocotools 2.0.6
pycparser 2.21
Pygments 2.14.0
pyOpenSSL 22.0.0
pyparsing 3.0.9
PySocks 1.7.1
python-dateutil 2.8.2
pytz 2022.7.1
PyYAML 6.0
regex 2022.10.31
requests 2.28.1
resampy 0.4.2
rich 13.3.1
scikit-learn 1.2.1
scipy 1.10.0
setuptools 65.6.3
six 1.16.0
soundfile 0.11.0
tabulate 0.9.0
termcolor 2.2.0
terminaltables 3.1.10
threadpoolctl 3.1.0
timm 0.1.20
tokenizers 0.13.2
torch 1.13.1
torchaudio 0.13.1
torchvision 0.14.1
tqdm 4.65.0
transformers 4.26.1
typing_extensions 4.4.0
urllib3 1.26.14
wcwidth 0.2.6
wheel 0.37.1
yacs 0.1.8
yapf 0.32.0
zipp 3.13.0
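
We have confirmed once more that the current code does not produce nan in our environment. Our environment configuration is listed here; please check whether the versions of the important packages (torch, transformers, etc.) match. In our previous experience, the latest version of transformers can cause nan problems. To investigate further, you can use a debugging tool to locate the position where nan first appears in the model's hidden-layer features.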
