Low GPU utilization when fine-tuning Qwen1.5-MoE-A2.7B-Chat #275

Closed
5663015 opened this issue Apr 9, 2024 · 13 comments

@5663015

5663015 commented Apr 9, 2024

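LoRA instruction fine-tuning with DeepSpeed set to ZeRO-2: GPU utilization stays at roughly 30%~40%. output_router_logits=True is already set in AutoConfig. Non-MoE models train normally.

Environment:
image

Besides the low utilization, I previously hit another problem: training Qwen1.5-MoE-A2.7B-Chat hung after a little over 80 steps, GPU utilization suddenly jumped to 99%, and it stayed in that state. The environment was identical except that output_router_logits=True was not set. After setting output_router_logits=True, training ran normally.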
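
For readers unfamiliar with the flag mentioned above, here is a minimal sketch of enabling output_router_logits through AutoConfig. The post does not show its loading code, so the Hub ID `Qwen/Qwen1.5-MoE-A2.7B-Chat`, the helper name `load_moe_model`, and the `torch_dtype="auto"` argument are all assumptions for illustration. As far as I understand the transformers implementation, the flag makes the MoE model return its router logits, and during training the load-balancing auxiliary loss is then added to the language-modeling loss.

```python
MODEL_ID = "Qwen/Qwen1.5-MoE-A2.7B-Chat"  # assumed Hub ID; a local path also works


def load_moe_model(model_id: str = MODEL_ID):
    """Load the MoE model with router logits enabled (hypothetical helper)."""
    # Imported here so that defining the helper does not require transformers.
    from transformers import AutoConfig, AutoModelForCausalLM

    # Keyword arguments that match config attributes override the stored values.
    config = AutoConfig.from_pretrained(model_id, output_router_logits=True)
    return AutoModelForCausalLM.from_pretrained(
        model_id, config=config, torch_dtype="auto"
    )
```

Calling `load_moe_model()` downloads the checkpoint, so the helper is defined but not invoked here.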

@yihaozuifan

When deploying I get "CUDA extension not installed", and inference is extremely slow. Does anyone know how to fix this?

@MAxx8371

Full-parameter fine-tuning with ZeRO-3 and output_router_logits=True set. Training suddenly hangs partway through and GPU utilization jumps to 100%.
image

@5663015
Author

5663015 commented Apr 10, 2024

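> When deploying I get "CUDA extension not installed", and inference is extremely slow. Does anyone know how to fix this?

The environment may not match the CUDA version, or there may not be enough GPU memory.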

@5663015
Author

5663015 commented Apr 10, 2024

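> Full-parameter fine-tuning with ZeRO-3 and output_router_logits=True set. Training suddenly hangs partway through and GPU utilization jumps to 100%. image

It feels like this MoE release still has problems. I tried other MoE models and had no issues.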

@zhanghaobucunzai

Could you share a jsonl file of the fine-tuning dataset?

@cooper12121

> Full-parameter fine-tuning with ZeRO-3 and output_router_logits=True set. Training suddenly hangs partway through and GPU utilization jumps to 100%. image

Has this been resolved?

@zhangyu68

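I ran into a similar problem with LoRA SFT.
With the same configuration, qwen-14b-chat reaches 90%+ GPU utilization,
while the MoE model only reaches about 40%.

I am using the llama-factory training framework. Environment details: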

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Package                       Version           Editable project location
----------------------------- ----------------- --------------------------------------------------------
absl-py                       2.1.0
accelerate                    0.30.1
aiofiles                      23.2.1
aiohttp                       3.9.3
aiosignal                     1.3.1
altair                        5.3.0
annotated-types               0.6.0
anyio                         4.3.0
async-timeout                 4.0.3
attrs                         23.2.0
blinker                       1.4
Brotli                        1.1.0
build                         1.0.3
certifi                       2024.2.2
cfgv                          3.4.0
charset-normalizer            3.3.2
click                         8.1.7
colored                       2.2.4
coloredlogs                   15.0.1
contourpy                     1.2.1
coverage                      7.4.1
cryptography                  3.4.8
cuda-python                   12.2.0
cycler                        0.12.1
Cython                        3.0.8
datasets                      2.16.1
dbus-python                   1.2.18
diffusers                     0.15.0
dill                          0.3.7
distlib                       0.3.8
distro                        1.7.0
dnspython                     2.6.1
docstring_parser              0.16
einops                        0.7.0
email_validator               2.1.1
evaluate                      0.4.1
exceptiongroup                1.2.0
execnet                       2.0.2
fastapi                       0.111.0
fastapi-cli                   0.0.3
ffmpy                         0.3.2
filelock                      3.13.1
fire                          0.5.0
fonttools                     4.51.0
frozenlist                    1.4.1
fsspec                        2023.10.0
gevent                        23.9.1
geventhttpclient              2.0.2
gradio                        4.31.1
gradio_client                 0.16.3
graphviz                      0.20.1
greenlet                      3.0.3
grpcio                        1.60.1
h11                           0.14.0
httpcore                      1.0.5
httplib2                      0.20.2
httptools                     0.6.1
httpx                         0.27.0
huggingface-hub               0.20.3
humanfriendly                 10.0
identify                      2.5.33
idna                          3.6
importlib-metadata            4.6.4
importlib_resources           6.4.0
iniconfig                     2.0.0
jeepney                       0.7.1
jieba                         0.42.1
Jinja2                        3.1.3
joblib                        1.3.2
jsonschema                    4.22.0
jsonschema-specifications     2023.12.1
keyring                       23.5.0
kiwisolver                    1.4.5
lark                          1.1.9
launchpadlib                  1.10.16
lazr.restfulclient            0.14.4
lazr.uri                      1.0.6
llmtuner                      0.7.1.dev0    
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
matplotlib                    3.8.4
mdurl                         0.1.2
more-itertools                8.10.0
mpi4py                        3.1.5
mpmath                        1.3.0
multidict                     6.0.5
multiprocess                  0.70.15
mypy                          1.8.0
mypy-extensions               1.0.0
networkx                      3.2.1
nltk                          3.8.1
nodeenv                       1.8.0
numpy                         1.26.1
nvidia-cublas-cu12            12.1.3.1
nvidia-cuda-cupti-cu12        12.1.105
nvidia-cuda-nvrtc-cu12        12.1.105
nvidia-cuda-runtime-cu12      12.1.105
nvidia-cudnn-cu12             8.9.2.26
nvidia-cufft-cu12             11.0.2.54
nvidia-curand-cu12            10.3.2.106
nvidia-cusolver-cu12          11.4.5.107
nvidia-cusparse-cu12          12.1.0.106
nvidia-nccl-cu12              2.18.1
nvidia-nvjitlink-cu12         12.3.101
nvidia-nvtx-cu12              12.1.105
oauthlib                      3.2.0
onnx                          1.15.0
optimum                       1.16.2
orjson                        3.10.3
packaging                     23.2
pandas                        2.2.0
parameterized                 0.9.0
peft                          0.10.0
pillow                        10.2.0
pip                           24.0
platformdirs                  4.2.0
pluggy                        1.4.0
polygraphy                    0.48.1
pre-commit                    3.6.0
protobuf                      4.25.2
psutil                        5.9.8
py                            1.11.0
pyarrow                       15.0.0
pyarrow-hotfix                0.6
pybind11-stubgen              2.4.2
pydantic                      2.7.1
pydantic_core                 2.18.2
pydub                         0.25.1
Pygments                      2.18.0
PyGObject                     3.42.1
PyJWT                         2.3.0
pynvml                        11.5.0
pyparsing                     2.4.7
pyproject_hooks               1.0.0
pytest                        8.0.0
pytest-cov                    4.1.0
pytest-forked                 1.6.0
pytest-xdist                  3.5.0
python-apt                    2.4.0+ubuntu2
python-dateutil               2.8.2
python-dotenv                 1.0.1
python-multipart              0.0.9
python-rapidjson              1.14
pytz                          2024.1
PyYAML                        6.0.1
referencing                   0.35.1
regex                         2023.12.25
requests                      2.31.0
responses                     0.18.0
rich                          13.7.1
rouge-chinese                 1.0.3
rouge-score                   0.1.2
rpds-py                       0.18.1
ruff                          0.4.4
safetensors                   0.4.2
scipy                         1.13.0
SecretStorage                 3.3.1
semantic-version              2.10.0
sentencepiece                 0.1.99
setuptools                    68.2.2
shellingham                   1.5.4
shtab                         1.7.1
six                           1.16.0
sniffio                       1.3.1
sse-starlette                 2.1.0
starlette                     0.37.2
sympy                         1.12
tabulate                      0.9.0
tensorrt                      9.2.0.post12.dev5
tensorrt-llm                  0.7.1
termcolor                     2.4.0
tiktoken                      0.7.0
tokenizers                    0.19.1
tomli                         2.0.1
tomlkit                       0.12.0
toolz                         0.12.1
torch                         2.1.0
tqdm                          4.66.1
transformers                  4.40.2
transformers-stream-generator 0.0.5
triton                        2.1.0
tritonclient                  2.42.0
trl                           0.8.6
typer                         0.12.3
typing_extensions             4.8.0
tyro                          0.8.4
tzdata                        2023.4
ujson                         5.10.0
urllib3                       2.2.0
uvicorn                       0.29.0
uvloop                        0.19.0
virtualenv                    20.25.0
wadllib                       1.3.6
watchfiles                    0.21.0
websockets                    11.0.3
wheel                         0.41.2
xxhash                        3.4.1
yarl                          1.9.4
zipp                          1.0.0
zope.event                    5.0
zope.interface                6.1

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

github-actions bot closed this as not planned on Jun 29, 2024
@wenjie-yuan

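> LoRA instruction fine-tuning with DeepSpeed set to ZeRO-2: GPU utilization stays at roughly 30%~40%. output_router_logits=True is already set in AutoConfig. Non-MoE models train normally.
>
> Environment: image
>
> Besides the low utilization, I previously hit another problem: training Qwen1.5-MoE-A2.7B-Chat hung after a little over 80 steps, GPU utilization suddenly jumped to 99%, and it stayed in that state. The environment was identical except that output_router_logits=True was not set. After setting output_router_logits=True, training ran normally.

Hello, how many GPUs and how much GPU memory did you use to get this running?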

@5663015
Author

5663015 commented Aug 8, 2024

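> > LoRA instruction fine-tuning with DeepSpeed set to ZeRO-2: GPU utilization stays at roughly 30%~40%. output_router_logits=True is already set in AutoConfig. Non-MoE models train normally.
> > Environment: image
> > Besides the low utilization, I previously hit another problem: training Qwen1.5-MoE-A2.7B-Chat hung after a little over 80 steps, GPU utilization suddenly jumped to 99%, and it stayed in that state. The environment was identical except that output_router_logits=True was not set. After setting output_router_logits=True, training ran normally.
>
> Hello, how many GPUs and how much GPU memory did you use to get this running?

I ran it on a single GPU with 80 GB of memory.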

@ToruKiyono

ToruKiyono commented Aug 27, 2024

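Regarding this, I found one scenario that produces this behavior.
During multi-GPU inference, if random numbers are used when processing the logits, some GPUs may end up with logits that differ from the others. Some GPUs may even produce the end-of-sequence token early and finish inference on the current batch while the other GPUs are still working on it, which causes the sudden hang.

Setting the same random seed everywhere fixed it.
torch.cuda.manual_seed_all(42)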
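
The one-line fix above can be folded into a small helper that seeds the random number generators a generation loop is likely to touch. This is only a sketch under assumptions: the name `seed_everything` and the choice to also seed Python's `random` module and PyTorch's CPU generator are not part of the original comment. Each process would call it with the same seed before starting inference.

```python
import random


def seed_everything(seed: int = 42) -> None:
    """Seed Python and (if installed) PyTorch RNGs with the same value."""
    random.seed(seed)
    try:
        import torch
    except ImportError:
        return  # torch not installed; only the stdlib RNG was seeded
    torch.manual_seed(seed)           # CPU generator
    torch.cuda.manual_seed_all(seed)  # all GPUs; silently ignored without CUDA


# Re-seeding with the same value reproduces the same draw.
seed_everything(42)
first = random.random()
seed_everything(42)
second = random.random()
print(first == second)
```

Seeding only makes the random draws repeatable across processes. Whether it removes a hang depends on the hang actually being caused by divergent sampling, as described in the comment above.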

@cdxzyc

cdxzyc commented Oct 10, 2024

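> LoRA instruction fine-tuning with DeepSpeed set to ZeRO-2: GPU utilization stays at roughly 30%~40%. output_router_logits=True is already set in AutoConfig. Non-MoE models train normally.
>
> Environment: image
>
> Besides the low utilization, I previously hit another problem: training Qwen1.5-MoE-A2.7B-Chat hung after a little over 80 steps, GPU utilization suddenly jumped to 99%, and it stayed in that state. The environment was identical except that output_router_logits=True was not set. After setting output_router_logits=True, training ran normally.

Has this been resolved?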

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

github-actions bot locked this as resolved and limited conversation to collaborators on Feb 23, 2025