[Bug] Deepseek-R1-Distill-32B evaluation error #1914

Open · 2 tasks done
t822876884 opened this issue Mar 5, 2025 · 5 comments
t822876884 commented Mar 5, 2025

Prerequisite

Type

I have modified the code (config is not considered code), or I'm working on my own tasks/models/datasets.

Environment

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0',
'GPU 0': 'NVIDIA H20',
'MMEngine': '0.10.6',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 11.8, V11.8.89',
'OpenCV': '4.11.0',
'PyTorch': '2.5.1+cu124',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2024.2-Product Build 20240605 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.5.3 (Git Hash '
'66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.4\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
'-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=old-style-cast '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]',
'TorchVision': '0.20.1+cu124',
'lmdeploy': '0.7.1',
'numpy_random_seed': 2147483648,
'opencompass': '0.4.1+',
'sys.platform': 'linux',
'transformers': '4.49.0'}

Reproduces the problem - code/configuration sample

eval_deepseek.py

from mmengine.config import read_base

with read_base():
    # from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
    from opencompass.configs.datasets.aime2024.aime2024_llmverify_repeat8_gen_e8fcee import aime2024_datasets  # 8 runs

    from opencompass.configs.models.deepseek.vllm_deepseek_r1_distill_qwen_32b_int4 import \
        models as vllm_deepseek_r1_distill_qwen_32b_int4
    # from opencompass.configs.models.deepseek.vllm_deepseek_r1_distill_qwen_32b import \
    #     models as vllm_deepseek_r1_distill_qwen_32b

    # from opencompass.configs.models.deepseek.vllm_deepseek_32b import \
    #     models as vllm_deepseek_32b

datasets = aime2024_datasets

vllm_deepseek_r1_distill_qwen_32b_int4.py

from opencompass.models import OpenAISDK

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', api_role='SYSTEM')],
)

models = [
    dict(
        abbr='deepseek-r1-32b-int4',
        type=OpenAISDK,
        key='EMPTY',  # API key
        openai_api_base='http://localhost:6006/v1',  # serving endpoint
        path='Deepseek_32_int4',  # model name used when calling the service
        tokenizer_path='Deepseek_32_int4',  # tokenizer name or path for requests; if None, the default gpt-4 tokenizer is used
        rpm_verbose=True,  # log the request rate
        meta_template=api_meta_template,  # request meta template
        query_per_second=50,  # request rate limit
        max_out_len=32768,  # maximum output length
        max_seq_len=32768,  # maximum input length
        temperature=0.01,  # sampling temperature
        # top_p=0.95,  # top-p (nucleus) sampling
        batch_size=8,  # batch size
        retry=3,  # number of retries
    )
]

Reproduces the problem - command or script

nohup python run.py /root/autodl-tmp/opencompass/examples/eval_deepseek.py --hf-num-gpus 2 --max-num-worker 2 --debug > DeepSeek-R1-Distill-Qwen-32B-bnb-4bit.log 2>&1 &

From the log, the evaluation took a very long time, from 2025/03/04 22:05:57 to 2025/03/05 13:34:44.

Reproduces the problem - error message

nohup: ignoring input
03/04 22:05:57 - OpenCompass - INFO - Current exp folder: outputs/default/20250304_220557
03/04 22:05:57 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
03/04 22:05:57 - OpenCompass - INFO - Partitioned into 2 tasks.
03/04 22:05:59 - OpenCompass - INFO - Task [deepseek-r1-32b-int4/aime2024-run0_0,deepseek-r1-32b-int4/aime2024-run1_0,deepseek-r1-32b-int4/aime2024-run2_0,deepseek-r1-32b-int4/aime2024-run3_0,deepseek-r1-32b-int4/aime2024-run4_0,deepseek-r1-32b-int4/aime2024-run5_0,deepseek-r1-32b-int4/aime2024-run6_0,deepseek-r1-32b-int4/aime2024-run7_0]
03/04 22:05:59 - OpenCompass - INFO - Try to load the data from /root/.cache/opencompass/./data/aime.jsonl
03/04 22:05:59 - OpenCompass - INFO - Start inferencing [deepseek-r1-32b-int4/aime2024-run0_0]
03/04 22:05:59 - OpenCompass - WARNING - 'Could not automatically map Deepseek_32_int4 to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.', tiktoken encoding cannot load Deepseek_32_int4
03/04 22:06:00 - OpenCompass - WARNING - Can not get tokenizer automatically, will use default tokenizer gpt-4 for length calculation.
[2025-03-04 22:06:04,900] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2025-03-04 22:06:04,901] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
....
....
....

0%| | 0/2 [00:00<?, ?it/s]

Inferencing: 0%| | 0/8 [00:00<?, ?it/s]
03/05 12:29:35 - OpenCompass - INFO - Current RPM 1.
03/05 12:29:35 - OpenCompass - INFO - Current RPM 2.
03/05 12:29:35 - OpenCompass - INFO - Current RPM 3.
03/05 12:29:36 - OpenCompass - INFO - Current RPM 4.
03/05 12:29:36 - OpenCompass - INFO - Current RPM 5.
03/05 12:29:36 - OpenCompass - INFO - Current RPM 6.
03/05 12:29:36 - OpenCompass - INFO - Current RPM 7.
03/05 12:29:40 - OpenCompass - INFO - Current RPM 8.

Inferencing: 12%|█▎ | 1/8 [02:37<18:23, 157.68s/it]

Inferencing: 50%|█████ | 4/8 [32:35<34:22, 515.69s/it]
Inferencing: 100%|██████████| 8/8 [32:35<00:00, 244.42s/it]

50%|█████ | 1/2 [32:35<32:35, 1955.40s/it]

Inferencing: 0%| | 0/6 [00:00<?, ?it/s]
03/05 13:02:10 - OpenCompass - INFO - Current RPM 1.
03/05 13:02:10 - OpenCompass - INFO - Current RPM 2.
03/05 13:02:10 - OpenCompass - INFO - Current RPM 3.
03/05 13:02:10 - OpenCompass - INFO - Current RPM 4.
03/05 13:02:10 - OpenCompass - INFO - Current RPM 5.
03/05 13:02:11 - OpenCompass - INFO - Current RPM 6.

Inferencing: 17%|█▋ | 1/6 [02:51<14:16, 171.22s/it]

Inferencing: 33%|███▎ | 2/6 [32:34<1:14:37, 1119.50s/it]
Inferencing: 100%|██████████| 6/6 [32:34<00:00, 325.75s/it]

100%|██████████| 2/2 [1:05:09<00:00, 1954.89s/it]
100%|██████████| 2/2 [1:05:09<00:00, 1954.96s/it]
03/05 13:34:43 - OpenCompass - INFO - Partitioned into 8 tasks.
03/05 13:34:44 - OpenCompass - INFO - Try to load the data from /root/.cache/opencompass/./data/aime.jsonl
03/05 13:34:44 - OpenCompass - INFO - Set self.output_path to outputs/default/20250304_220557/results/deepseek-r1-32b-int4/aime2024-run0.json for current task
Traceback (most recent call last):
  File "/root/autodl-tmp/opencompass/run.py", line 4, in <module>
    main()
  File "/root/autodl-tmp/opencompass/opencompass/cli/main.py", line 349, in main
    runner(tasks)
  File "/root/autodl-tmp/opencompass/opencompass/runners/base.py", line 38, in __call__
    status = self.launch(tasks)
  File "/root/autodl-tmp/opencompass/opencompass/runners/local.py", line 136, in launch
    task.run()
  File "/root/autodl-tmp/opencompass/opencompass/tasks/openicl_eval.py", line 86, in run
    self._score()
  File "/root/autodl-tmp/opencompass/opencompass/tasks/openicl_eval.py", line 245, in _score
    result = icl_evaluator.evaluate(k, n, copy.deepcopy(test_set),
  File "/root/autodl-tmp/opencompass/opencompass/openicl/icl_evaluator/icl_base_evaluator.py", line 99, in evaluate
    results = self.score(
  File "/root/autodl-tmp/opencompass/opencompass/evaluator/generic_llm_evaluator.py", line 83, in score
    self.build_inferencer()
  File "/root/autodl-tmp/opencompass/opencompass/evaluator/generic_llm_evaluator.py", line 66, in build_inferencer
    model = build_model_from_cfg(model_cfg=self.judge_cfg)
  File "/root/autodl-tmp/opencompass/opencompass/utils/build.py", line 24, in build_model_from_cfg
    return MODELS.build(model_cfg)
  File "/root/autodl-tmp/.conda/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/autodl-tmp/.conda/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 74, in build_from_cfg
    raise KeyError(
KeyError: 'cfg or default_args must contain the key "type", but got {}\nNone'

Other information

No response

@t822876884 (Author)

Also, I started the quantized 32B model through xinference.

@MaiziXiao (Collaborator)

The configuration you use requires an LLM as the judge to verify the results.
Follow examples/eval_deepseek_r1.py and also check https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/llm_judge.html
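For reference, a minimal sketch of what supplying a judge might look like, based on the traceback above (GenericLLMEvaluator builds its judge model from judge_cfg, and the empty judge_cfg is what raises the KeyError). The judge model name, endpoint, and generation parameters below are placeholders; follow examples/eval_deepseek_r1.py and the linked guide for the actual wiring:

from opencompass.models import OpenAISDK

# Hypothetical judge served behind an OpenAI-compatible endpoint.
judge_cfg = dict(
    type=OpenAISDK,
    abbr='my-judge-model',                       # placeholder
    path='Qwen2.5-72B-Instruct',                 # placeholder: model acting as judge
    key='EMPTY',
    openai_api_base='http://localhost:8000/v1',  # placeholder judge endpoint
    query_per_second=16,
    batch_size=8,
    temperature=0.001,
    max_out_len=8192,
    max_seq_len=32768,
)

# Attach the judge to every dataset's evaluator so that build_inferencer()
# receives a model config with a 'type' key instead of an empty dict.
for d in datasets:
    d['eval_cfg']['evaluator']['judge_cfg'] = judge_cfg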

@t822876884 (Author)

The configuration you use requires an LLM as the judge to verify the results. Follow examples/eval_deepseek_r1.py and also check https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/llm_judge.html

Okay, I will try again. By the way, how can I speed up the evaluation process? A single eval takes too long.

@MaiziXiao (Collaborator)

Inferencing: 17%|█▋ | 1/6 [02:51<14:16, 171.22s/it]
Inferencing: 33%|███▎ | 2/6 [32:34<1:14:37, 1119.50s/it]
Inferencing: 100%|██████████| 6/6 [32:34<00:00, 325.75s/it]

From the log you provided, it looks to me like each of your questions takes a very long time (i.e. more than 10 minutes). Try requesting your vLLM server independently to see if it also takes that long.
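For example, a rough latency check against the endpoint from the config above (a sketch: the URL and model name are taken from this issue's config, the prompt is arbitrary, and the requests library is assumed to be available):

import time
import requests

payload = {
    'model': 'Deepseek_32_int4',
    'messages': [{'role': 'user', 'content': 'What is 1 + 1?'}],
    'max_tokens': 512,
    'temperature': 0.01,
}

start = time.time()
resp = requests.post('http://localhost:6006/v1/chat/completions',
                     json=payload, timeout=1800)
print(f'status={resp.status_code}, latency={time.time() - start:.1f}s')
print(resp.json()['choices'][0]['message']['content'][:200])

Keep in mind that the eval config sets max_out_len=32768, so a fair comparison should also allow a long generation; a reasoning model can legitimately spend several minutes on one AIME question when it emits tens of thousands of tokens.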

@t822876884 (Author)

Inferencing: 17%|█▋ | 1/6 [02:51<14:16, 171.22s/it]
Inferencing: 33%|███▎ | 2/6 [32:34<1:14:37, 1119.50s/it]
Inferencing: 100%|██████████| 6/6 [32:34<00:00, 325.75s/it]

From the log you provided, it looks to me like each of your questions takes a very long time (i.e. more than 10 minutes). Try requesting your vLLM server independently to see if it also takes that long.

It's fast to request the vLLM server independently.
