Qwen1.5-MoE-A2.7B-Chat: very low GPU utilization during fine-tuning #275

5663015 · 2024-04-09T04:06:01Z

LoRA instruction fine-tuning with DeepSpeed set to ZeRO-2: GPU utilization stays at roughly 30% to 40%. I have already set output_router_logits=True in AutoConfig. Non-MoE models train normally.

Environment:

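Besides the low utilization, I previously hit another problem: Qwen1.5-MoE-A2.7B-Chat hung at around step 80, GPU utilization suddenly jumped to 99%, and it stayed there. The environment was identical except that output_router_logits=True had not been set. After setting output_router_logits=True, training ran normally.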
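For readers reproducing the setup, here is a minimal sketch of setting output_router_logits=True through AutoConfig before loading the model. It assumes a transformers release with native Qwen2-MoE support (the environment posted later in this thread uses 4.40.2). It also assumes the weights come from the Hugging Face Hub as Qwen/Qwen1.5-MoE-A2.7B-Chat. The helper name is hypothetical, and the LoRA/DeepSpeed wiring is left out.

```python
# Hypothetical helper; the Hub id below is an assumption about where the weights come from.
MODEL_ID = "Qwen/Qwen1.5-MoE-A2.7B-Chat"


def load_moe_with_router_logits(model_id: str = MODEL_ID):
    """Load the causal LM with output_router_logits=True set on its config."""
    # Imported inside the function so defining it does not require transformers.
    from transformers import AutoConfig, AutoModelForCausalLM

    # Keyword arguments to AutoConfig.from_pretrained override the stored config values.
    config = AutoConfig.from_pretrained(model_id, output_router_logits=True)
    return AutoModelForCausalLM.from_pretrained(model_id, config=config)
```

The returned model can then be handed to whatever LoRA/DeepSpeed training loop is in use.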


yihaozuifan · 2024-04-09T07:53:19Z

When deploying I get "CUDA extension not installed", and inference is extremely slow. Does anyone know how to fix this?

MAxx8371 · 2024-04-10T06:00:13Z

Full-parameter fine-tuning, ZeRO-3, output_router_logits=True set. Training suddenly hangs partway through, with GPU utilization abruptly pinned at 100%.

5663015 · 2024-04-10T08:16:42Z

> When deploying I get "CUDA extension not installed", and inference is extremely slow. Does anyone know how to fix this?

Possibly your environment doesn't match your CUDA version, or you don't have enough GPU memory.
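Both suggestions can be checked quickly. The sketch below assumes the model is served through PyTorch. It reports the CUDA version PyTorch was built against, which you can compare with the local toolkit from `nvcc --version`, and it reports free GPU memory. The helper name is hypothetical. The sketch does not inspect whichever package printed "CUDA extension not installed", because the thread does not say which package that was.

```python
def cuda_env_report() -> dict:
    """Collect PyTorch/CUDA versions and free GPU memory (hypothetical helper)."""
    report = {"torch": None, "torch_cuda": None, "cuda_available": False, "gpus": []}
    try:
        import torch
    except ImportError:
        return report  # PyTorch not installed: nothing more to check

    report["torch"] = str(torch.__version__)
    # CUDA version this PyTorch build was compiled against (None for CPU-only builds);
    # compare it with the toolkit reported by `nvcc --version`.
    report["torch_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        for i in range(torch.cuda.device_count()):
            free_b, total_b = torch.cuda.mem_get_info(i)  # both values in bytes
            report["gpus"].append({
                "name": torch.cuda.get_device_name(i),
                "free_gib": round(free_b / 2**30, 1),
                "total_gib": round(total_b / 2**30, 1),
            })
    return report


if __name__ == "__main__":
    print(cuda_env_report())
```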

5663015 · 2024-04-10T08:18:38Z

> Full-parameter fine-tuning, ZeRO-3, output_router_logits=True set. Training suddenly hangs partway through, with GPU utilization abruptly pinned at 100%.

It feels like this MoE release still has problems; the other MoE models I tried work fine.

zhanghaobucunzai · 2024-04-10T09:05:06Z

Could someone share a fine-tuning dataset as a jsonl file?

cooper12121 · 2024-04-17T13:30:35Z

> Full-parameter fine-tuning, ZeRO-3, output_router_logits=True set. Training suddenly hangs partway through, with GPU utilization abruptly pinned at 100%.

Has anyone solved this?

zhangyu68 · 2024-05-15T02:11:46Z

I ran into a similar problem with LoRA SFT.
With the same configuration, qwen-14b-chat reaches 90%+ GPU utilization,
while the MoE model only gets around 40%.
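One way to collect utilization figures like these consistently across runs is to sample NVML directly. This sketch assumes the pynvml package is installed (it appears in the environment list in this comment). It also assumes the GPU of interest is at index 0. The helper names are hypothetical, and one sample per second gives only a rough picture compared with a profiler.

```python
import statistics
import time


def sample_gpu_utilization(device_index: int = 0, samples: int = 10, interval_s: float = 1.0) -> list:
    """Read NVML's GPU utilization percentage for one device, `samples` times."""
    import pynvml  # imported lazily; assumes the pynvml package is installed

    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        readings = []
        for _ in range(samples):
            readings.append(pynvml.nvmlDeviceGetUtilizationRates(handle).gpu)
            time.sleep(interval_s)
        return readings
    finally:
        pynvml.nvmlShutdown()


def summarize_utilization(readings) -> dict:
    """Mean, min and max of a list of utilization percentages."""
    return {"mean": statistics.mean(readings), "min": min(readings), "max": max(readings)}
```

For example, `summarize_utilization(sample_gpu_utilization(samples=30))` run during the MoE job and again during the dense-model job gives two directly comparable summaries.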

I'm using the llama-factory training framework. Environment info:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Package                       Version           Editable project location
----------------------------- ----------------- --------------------------------------------------------
absl-py                       2.1.0
accelerate                    0.30.1
aiofiles                      23.2.1
aiohttp                       3.9.3
aiosignal                     1.3.1
altair                        5.3.0
annotated-types               0.6.0
anyio                         4.3.0
async-timeout                 4.0.3
attrs                         23.2.0
blinker                       1.4
Brotli                        1.1.0
build                         1.0.3
certifi                       2024.2.2
cfgv                          3.4.0
charset-normalizer            3.3.2
click                         8.1.7
colored                       2.2.4
coloredlogs                   15.0.1
contourpy                     1.2.1
coverage                      7.4.1
cryptography                  3.4.8
cuda-python                   12.2.0
cycler                        0.12.1
Cython                        3.0.8
datasets                      2.16.1
dbus-python                   1.2.18
diffusers                     0.15.0
dill                          0.3.7
distlib                       0.3.8
distro                        1.7.0
dnspython                     2.6.1
docstring_parser              0.16
einops                        0.7.0
email_validator               2.1.1
evaluate                      0.4.1
exceptiongroup                1.2.0
execnet                       2.0.2
fastapi                       0.111.0
fastapi-cli                   0.0.3
ffmpy                         0.3.2
filelock                      3.13.1
fire                          0.5.0
fonttools                     4.51.0
frozenlist                    1.4.1
fsspec                        2023.10.0
gevent                        23.9.1
geventhttpclient              2.0.2
gradio                        4.31.1
gradio_client                 0.16.3
graphviz                      0.20.1
greenlet                      3.0.3
grpcio                        1.60.1
h11                           0.14.0
httpcore                      1.0.5
httplib2                      0.20.2
httptools                     0.6.1
httpx                         0.27.0
huggingface-hub               0.20.3
humanfriendly                 10.0
identify                      2.5.33
idna                          3.6
importlib-metadata            4.6.4
importlib_resources           6.4.0
iniconfig                     2.0.0
jeepney                       0.7.1
jieba                         0.42.1
Jinja2                        3.1.3
joblib                        1.3.2
jsonschema                    4.22.0
jsonschema-specifications     2023.12.1
keyring                       23.5.0
kiwisolver                    1.4.5
lark                          1.1.9
launchpadlib                  1.10.16
lazr.restfulclient            0.14.4
lazr.uri                      1.0.6
llmtuner                      0.7.1.dev0    
markdown-it-py                3.0.0
MarkupSafe                    2.1.5
matplotlib                    3.8.4
mdurl                         0.1.2
more-itertools                8.10.0
mpi4py                        3.1.5
mpmath                        1.3.0
multidict                     6.0.5
multiprocess                  0.70.15
mypy                          1.8.0
mypy-extensions               1.0.0
networkx                      3.2.1
nltk                          3.8.1
nodeenv                       1.8.0
numpy                         1.26.1
nvidia-cublas-cu12            12.1.3.1
nvidia-cuda-cupti-cu12        12.1.105
nvidia-cuda-nvrtc-cu12        12.1.105
nvidia-cuda-runtime-cu12      12.1.105
nvidia-cudnn-cu12             8.9.2.26
nvidia-cufft-cu12             11.0.2.54
nvidia-curand-cu12            10.3.2.106
nvidia-cusolver-cu12          11.4.5.107
nvidia-cusparse-cu12          12.1.0.106
nvidia-nccl-cu12              2.18.1
nvidia-nvjitlink-cu12         12.3.101
nvidia-nvtx-cu12              12.1.105
oauthlib                      3.2.0
onnx                          1.15.0
optimum                       1.16.2
orjson                        3.10.3
packaging                     23.2
pandas                        2.2.0
parameterized                 0.9.0
peft                          0.10.0
pillow                        10.2.0
pip                           24.0
platformdirs                  4.2.0
pluggy                        1.4.0
polygraphy                    0.48.1
pre-commit                    3.6.0
protobuf                      4.25.2
psutil                        5.9.8
py                            1.11.0
pyarrow                       15.0.0
pyarrow-hotfix                0.6
pybind11-stubgen              2.4.2
pydantic                      2.7.1
pydantic_core                 2.18.2
pydub                         0.25.1
Pygments                      2.18.0
PyGObject                     3.42.1
PyJWT                         2.3.0
pynvml                        11.5.0
pyparsing                     2.4.7
pyproject_hooks               1.0.0
pytest                        8.0.0
pytest-cov                    4.1.0
pytest-forked                 1.6.0
pytest-xdist                  3.5.0
python-apt                    2.4.0+ubuntu2
python-dateutil               2.8.2
python-dotenv                 1.0.1
python-multipart              0.0.9
python-rapidjson              1.14
pytz                          2024.1
PyYAML                        6.0.1
referencing                   0.35.1
regex                         2023.12.25
requests                      2.31.0
responses                     0.18.0
rich                          13.7.1
rouge-chinese                 1.0.3
rouge-score                   0.1.2
rpds-py                       0.18.1
ruff                          0.4.4
safetensors                   0.4.2
scipy                         1.13.0
SecretStorage                 3.3.1
semantic-version              2.10.0
sentencepiece                 0.1.99
setuptools                    68.2.2
shellingham                   1.5.4
shtab                         1.7.1
six                           1.16.0
sniffio                       1.3.1
sse-starlette                 2.1.0
starlette                     0.37.2
sympy                         1.12
tabulate                      0.9.0
tensorrt                      9.2.0.post12.dev5
tensorrt-llm                  0.7.1
termcolor                     2.4.0
tiktoken                      0.7.0
tokenizers                    0.19.1
tomli                         2.0.1
tomlkit                       0.12.0
toolz                         0.12.1
torch                         2.1.0
tqdm                          4.66.1
transformers                  4.40.2
transformers-stream-generator 0.0.5
triton                        2.1.0
tritonclient                  2.42.0
trl                           0.8.6
typer                         0.12.3
typing_extensions             4.8.0
tyro                          0.8.4
tzdata                        2023.4
ujson                         5.10.0
urllib3                       2.2.0
uvicorn                       0.29.0
uvloop                        0.19.0
virtualenv                    20.25.0
wadllib                       1.3.6
watchfiles                    0.21.0
websockets                    11.0.3
wheel                         0.41.2
xxhash                        3.4.1
yarl                          1.9.4
zipp                          1.0.0
zope.event                    5.0
zope.interface                6.1

github-actions · 2024-06-21T08:01:20Z

This issue has been automatically marked as inactive due to lack of recent activity. Should you believe it remains unresolved and warrants attention, kindly leave a comment on this thread.

wenjie-yuan · 2024-08-07T16:23:28Z

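> LoRA instruction fine-tuning with DeepSpeed set to ZeRO-2: GPU utilization stays at roughly 30% to 40%. I have already set output_router_logits=True in AutoConfig. Non-MoE models train normally.
>
> Environment:
>
> Besides the low utilization, I previously hit another problem: Qwen1.5-MoE-A2.7B-Chat hung at around step 80, GPU utilization suddenly jumped to 99%, and it stayed there. The environment was identical except that output_router_logits=True had not been set. After setting output_router_logits=True, training ran normally.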

Hi, how many GPUs and how much GPU memory did you use to run it?

5663015 · 2024-08-08T09:23:55Z

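> > LoRA instruction fine-tuning with DeepSpeed set to ZeRO-2: GPU utilization stays at roughly 30% to 40%. I have already set output_router_logits=True in AutoConfig. Non-MoE models train normally.
> > Environment:
> > Besides the low utilization, I previously hit another problem: Qwen1.5-MoE-A2.7B-Chat hung at around step 80, GPU utilization suddenly jumped to 99%, and it stayed there. The environment was identical except that output_router_logits=True had not been set. After setting output_router_logits=True, training ran normally.
>
> Hi, how many GPUs and how much GPU memory did you use to run it?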

On a single GPU with 80 GB of memory.

ToruKiyono · 2024-08-27T08:53:29Z

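I found one situation that produces this behavior.
During multi-GPU inference, if random sampling is applied to the logits, some GPUs can end up with different results from the others. Some may even emit the end-of-sequence token early and finish the batch while the other GPUs are still generating it, so the whole run suddenly appears to hang.

Setting the same random seed everywhere fixed it: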
torch.cuda.manual_seed_all(42)
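The failure mode described above can be illustrated with a toy, single-process simulation in plain Python, with no GPUs and no torch. Each simulated "rank" samples tokens from the same next-token distribution until it draws a hypothetical EOS id. With a different seed per rank, the ranks can stop after different numbers of steps. In a real multi-GPU job, the rank that finishes first stops joining collectives, and the other ranks block. With one shared seed, and assuming every rank sees identical logits, every rank makes the same draws and stops at the same step. That shared stopping point is the effect the seed above is meant to produce. The token ids and probabilities are made up.

```python
import random

EOS_ID = 0  # hypothetical end-of-sequence token id
PROBS = [0.05, 0.35, 0.30, 0.30]  # made-up next-token distribution, identical on every rank


def decode_steps(rng: random.Random, probs=PROBS, max_new_tokens: int = 64) -> int:
    """Sample one token per step until EOS_ID is drawn or the limit is hit."""
    token_ids = list(range(len(probs)))
    for step in range(1, max_new_tokens + 1):
        token = rng.choices(token_ids, weights=probs, k=1)[0]
        if token == EOS_ID:
            return step
    return max_new_tokens


def steps_per_rank(seeds, probs=PROBS):
    """Number of decoding steps for each simulated rank, one RNG per rank."""
    return [decode_steps(random.Random(seed), probs) for seed in seeds]


if __name__ == "__main__":
    print("shared seed:  ", steps_per_rank([42, 42, 42, 42]))  # identical on every rank
    print("per-rank seed:", steps_per_rank([0, 1, 2, 3]))      # may differ between ranks
```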

cdxzyc · 2024-10-10T03:49:03Z

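> LoRA instruction fine-tuning with DeepSpeed set to ZeRO-2: GPU utilization stays at roughly 30% to 40%. I have already set output_router_logits=True in AutoConfig. Non-MoE models train normally.
>
> Environment:
>
> Besides the low utilization, I previously hit another problem: Qwen1.5-MoE-A2.7B-Chat hung at around step 80, GPU utilization suddenly jumped to 99%, and it stayed there. The environment was identical except that output_router_logits=True had not been set. After setting output_router_logits=True, training ran normally.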

Has this been resolved?

github-actions · 2025-02-23T08:02:09Z

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

Jacky-hate mentioned this issue on Apr 16, 2024 in "Enable compatibility of sparse MOE training under DeepSpeed ZeRO3 #312" (Closed)

github-actions bot added the inactive label Jun 21, 2024

github-actions bot closed this as not planned on Jun 29, 2024

github-actions bot locked as resolved and limited conversation to collaborators Feb 23, 2025
