[Bug] Deepseek-R1-Distill-32B evaluation error #1914

Open · 2 tasks done
t822876884 opened this issue Mar 5, 2025 · 5 comments
t822876884 commented Mar 5, 2025

Prerequisite

Type

I have modified the code (config is not considered code), or I'm working on my own tasks/models/datasets.

Environment

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0',
'GPU 0': 'NVIDIA H20',
'MMEngine': '0.10.6',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 11.8, V11.8.89',
'OpenCV': '4.11.0',
'PyTorch': '2.5.1+cu124',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2024.2-Product Build 20240605 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.5.3 (Git Hash '
'66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.4\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
'-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=old-style-cast '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]',
'TorchVision': '0.20.1+cu124',
'lmdeploy': '0.7.1',
'numpy_random_seed': 2147483648,
'opencompass': '0.4.1+',
'sys.platform': 'linux',
'transformers': '4.49.0'}

Reproduces the problem - code/configuration sample

eval_deepseek.py

from mmengine.config import read_base

with read_base():
    # from opencompass.configs.datasets.ceval.ceval_gen import ceval_datasets
    from opencompass.configs.datasets.aime2024.aime2024_llmverify_repeat8_gen_e8fcee import aime2024_datasets  # 8 runs

    from opencompass.configs.models.deepseek.vllm_deepseek_r1_distill_qwen_32b_int4 import \
        models as vllm_deepseek_r1_distill_qwen_32b_int4
    # from opencompass.configs.models.deepseek.vllm_deepseek_r1_distill_qwen_32b import \
    #     models as vllm_deepseek_r1_distill_qwen_32b

    # from opencompass.configs.models.deepseek.vllm_deepseek_32b import \
    #     models as vllm_deepseek_32b

datasets = aime2024_datasets

vllm_deepseek_r1_distill_qwen_32b_int4.py

from opencompass.models import OpenAISDK

api_meta_template = dict(
    round=[
        dict(role='HUMAN', api_role='HUMAN'),
        dict(role='BOT', api_role='BOT', generate=True),
    ],
    reserved_roles=[dict(role='SYSTEM', api_role='SYSTEM')],
)

models = [
    dict(
        abbr='deepseek-r1-32b-int4',
        type=OpenAISDK,
        key='EMPTY',  # API key
        openai_api_base='http://localhost:6006/v1',  # serving endpoint
        path='Deepseek_32_int4',  # model name used when calling the service
        tokenizer_path='Deepseek_32_int4',  # tokenizer name or path for requests; if None, the default gpt-4 tokenizer is used
        rpm_verbose=True,  # log the request rate
        meta_template=api_meta_template,  # request meta template
        query_per_second=50,  # request rate limit
        max_out_len=32768,  # maximum output length
        max_seq_len=32768,  # maximum input length
        temperature=0.01,  # sampling temperature
        # top_p=0.95,  # top-p (nucleus) sampling
        batch_size=8,  # batch size
        retry=3,  # number of retries
    )
]

Reproduces the problem - command or script

nohup python run.py /root/autodl-tmp/opencompass/examples/eval_deepseek.py --hf-num-gpus 2 --max-num-worker 2 --debug > DeepSeek-R1-Distill-Qwen-32B-bnb-4bit.log 2>&1 &

From the log, the evaluation took a very long time, from 2025/03/04 22:05:57 to 2025/03/05 13:34:44.

Reproduces the problem - error message

nohup: ignoring input
03/04 22:05:57 - OpenCompass - INFO - Current exp folder: outputs/default/20250304_220557
03/04 22:05:57 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
03/04 22:05:57 - OpenCompass - INFO - Partitioned into 2 tasks.
03/04 22:05:59 - OpenCompass - INFO - Task [deepseek-r1-32b-int4/aime2024-run0_0,deepseek-r1-32b-int4/aime2024-run1_0,deepseek-r1-32b-int4/aime2024-run2_0,deepseek-r1-32b-int4/aime2024-run3_0,deepseek-r1-32b-int4/aime2024-run4_0,deepseek-r1-32b-int4/aime2024-run5_0,deepseek-r1-32b-int4/aime2024-run6_0,deepseek-r1-32b-int4/aime2024-run7_0]
03/04 22:05:59 - OpenCompass - INFO - Try to load the data from /root/.cache/opencompass/./data/aime.jsonl
03/04 22:05:59 - OpenCompass - INFO - Start inferencing [deepseek-r1-32b-int4/aime2024-run0_0]
03/04 22:05:59 - OpenCompass - WARNING - 'Could not automatically map Deepseek_32_int4 to a tokeniser. Please use tiktoken.get_encoding to explicitly get the tokeniser you expect.', tiktoken encoding cannot load Deepseek_32_int4
03/04 22:06:00 - OpenCompass - WARNING - Can not get tokenizer automatically, will use default tokenizer gpt-4 for length calculation.
[2025-03-04 22:06:04,900] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2025-03-04 22:06:04,901] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
....
....
....

0%| | 0/2 [00:00<?, ?it/s]

Inferencing: 0%| | 0/8 [00:00<?, ?it/s]
03/05 12:29:35 - OpenCompass - INFO - Current RPM 1.
03/05 12:29:35 - OpenCompass - INFO - Current RPM 2.
03/05 12:29:35 - OpenCompass - INFO - Current RPM 3.
03/05 12:29:36 - OpenCompass - INFO - Current RPM 4.
03/05 12:29:36 - OpenCompass - INFO - Current RPM 5.
03/05 12:29:36 - OpenCompass - INFO - Current RPM 6.
03/05 12:29:36 - OpenCompass - INFO - Current RPM 7.
03/05 12:29:40 - OpenCompass - INFO - Current RPM 8.

Inferencing: 12%|█▎ | 1/8 [02:37<18:23, 157.68s/it]

Inferencing: 50%|█████ | 4/8 [32:35<34:22, 515.69s/it]
Inferencing: 100%|██████████| 8/8 [32:35<00:00, 244.42s/it]

50%|█████ | 1/2 [32:35<32:35, 1955.40s/it]

Inferencing: 0%| | 0/6 [00:00<?, ?it/s]
03/05 13:02:10 - OpenCompass - INFO - Current RPM 1.
03/05 13:02:10 - OpenCompass - INFO - Current RPM 2.
03/05 13:02:10 - OpenCompass - INFO - Current RPM 3.
03/05 13:02:10 - OpenCompass - INFO - Current RPM 4.
03/05 13:02:10 - OpenCompass - INFO - Current RPM 5.
03/05 13:02:11 - OpenCompass - INFO - Current RPM 6.

Inferencing: 17%|█▋ | 1/6 [02:51<14:16, 171.22s/it]

Inferencing: 33%|███▎ | 2/6 [32:34<1:14:37, 1119.50s/it]
Inferencing: 100%|██████████| 6/6 [32:34<00:00, 325.75s/it]

100%|██████████| 2/2 [1:05:09<00:00, 1954.89s/it]
100%|██████████| 2/2 [1:05:09<00:00, 1954.96s/it]
03/05 13:34:43 - OpenCompass - INFO - Partitioned into 8 tasks.
03/05 13:34:44 - OpenCompass - INFO - Try to load the data from /root/.cache/opencompass/./data/aime.jsonl
03/05 13:34:44 - OpenCompass - INFO - Set self.output_path to outputs/default/20250304_220557/results/deepseek-r1-32b-int4/aime2024-run0.json for current task
Traceback (most recent call last):
  File "/root/autodl-tmp/opencompass/run.py", line 4, in <module>
    main()
  File "/root/autodl-tmp/opencompass/opencompass/cli/main.py", line 349, in main
    runner(tasks)
  File "/root/autodl-tmp/opencompass/opencompass/runners/base.py", line 38, in __call__
    status = self.launch(tasks)
  File "/root/autodl-tmp/opencompass/opencompass/runners/local.py", line 136, in launch
    task.run()
  File "/root/autodl-tmp/opencompass/opencompass/tasks/openicl_eval.py", line 86, in run
    self._score()
  File "/root/autodl-tmp/opencompass/opencompass/tasks/openicl_eval.py", line 245, in _score
    result = icl_evaluator.evaluate(k, n, copy.deepcopy(test_set),
  File "/root/autodl-tmp/opencompass/opencompass/openicl/icl_evaluator/icl_base_evaluator.py", line 99, in evaluate
    results = self.score(
  File "/root/autodl-tmp/opencompass/opencompass/evaluator/generic_llm_evaluator.py", line 83, in score
    self.build_inferencer()
  File "/root/autodl-tmp/opencompass/opencompass/evaluator/generic_llm_evaluator.py", line 66, in build_inferencer
    model = build_model_from_cfg(model_cfg=self.judge_cfg)
  File "/root/autodl-tmp/opencompass/opencompass/utils/build.py", line 24, in build_model_from_cfg
    return MODELS.build(model_cfg)
  File "/root/autodl-tmp/.conda/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/registry.py", line 570, in build
    return self.build_func(cfg, *args, **kwargs, registry=self)
  File "/root/autodl-tmp/.conda/envs/opencompass/lib/python3.10/site-packages/mmengine/registry/build_functions.py", line 74, in build_from_cfg
    raise KeyError(
KeyError: 'cfg or default_args must contain the key "type", but got {}\nNone'

Other information

No response

@t822876884 (Author)

Also, I started the quantized 32B model through xinference.

@MaiziXiao (Collaborator)

The configuration you use requires an LLM as the judge to verify the results.
Follow examples/eval_deepseek_r1.py and also check https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/llm_judge.html
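For reference, a minimal sketch of what supplying a judge might look like, based on the traceback above (GenericLLMEvaluator builds its judge model from judge_cfg, and the empty judge_cfg is what raises the KeyError). The judge model name, endpoint, and generation parameters below are placeholders; follow examples/eval_deepseek_r1.py and the linked guide for the actual wiring:

from opencompass.models import OpenAISDK

# Hypothetical judge served behind an OpenAI-compatible endpoint.
judge_cfg = dict(
    type=OpenAISDK,
    abbr='my-judge-model',                       # placeholder
    path='Qwen2.5-72B-Instruct',                 # placeholder: model acting as judge
    key='EMPTY',
    openai_api_base='http://localhost:8000/v1',  # placeholder judge endpoint
    query_per_second=16,
    batch_size=8,
    temperature=0.001,
    max_out_len=8192,
    max_seq_len=32768,
)

# Attach the judge to every dataset's evaluator so that build_inferencer()
# receives a model config with a 'type' key instead of an empty dict.
for d in datasets:
    d['eval_cfg']['evaluator']['judge_cfg'] = judge_cfg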

@t822876884 (Author)

The configuration you use requires an LLM as the judge to verify the results. Follow examples/eval_deepseek_r1.py and also check https://opencompass.readthedocs.io/zh-cn/latest/advanced_guides/llm_judge.html

Okay, I will try again. By the way, how can I speed up the evaluation process? A single eval takes too long.

@MaiziXiao (Collaborator)

Inferencing: 17%|█▋ | 1/6 [02:51<14:16, 171.22s/it]
Inferencing: 33%|███▎ | 2/6 [32:34<1:14:37, 1119.50s/it]
Inferencing: 100%|██████████| 6/6 [32:34<00:00, 325.75s/it]

From the log you provided, it looks to me like each of your questions takes a very long time (i.e. more than 10 minutes). Try requesting your vLLM server independently to see if it also takes that long.
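For example, a rough latency check against the endpoint from the config above (a sketch: the URL and model name are taken from this issue's config, the prompt is arbitrary, and the requests library is assumed to be available):

import time
import requests

payload = {
    'model': 'Deepseek_32_int4',
    'messages': [{'role': 'user', 'content': 'What is 1 + 1?'}],
    'max_tokens': 512,
    'temperature': 0.01,
}

start = time.time()
resp = requests.post('http://localhost:6006/v1/chat/completions',
                     json=payload, timeout=1800)
print(f'status={resp.status_code}, latency={time.time() - start:.1f}s')
print(resp.json()['choices'][0]['message']['content'][:200])

Keep in mind that the eval config sets max_out_len=32768, so a fair comparison should also allow a long generation; a reasoning model can legitimately spend several minutes on one AIME question when it emits tens of thousands of tokens.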

@t822876884 (Author)

Inferencing: 17%|█▋ | 1/6 [02:51<14:16, 171.22s/it]
Inferencing: 33%|███▎ | 2/6 [32:34<1:14:37, 1119.50s/it]
Inferencing: 100%|██████████| 6/6 [32:34<00:00, 325.75s/it]

From the log you provided, it looks to me like each of your questions takes a very long time (i.e. more than 10 minutes). Try requesting your vLLM server independently to see if it also takes that long.

It's fast to request the vLLM server independently.
