[Bug] Low score when evaluating the DeepSeek R1 32B model on the AIME2024 dataset #1878
Comments
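The DeepSeek R1 paper sets max_out_len=32768; 2048 is not enough.
After changing it to 32768 the accuracy is still only 3.33. I am using deepseek-distill-Qwen2-7B: aime2024 2b9dc2 accuracy gen 3.33
I ran into this problem too. Changing only the model's max_out_len is not enough, because the dataset config also caps it at 2048, so both have to be changed. The outputs directory contains a .py file with the resolved parameters, where you can see the dataset's max_out_len. I then changed the value in the dataset-related source, and that fixed it.
Which dataset-related source do you mean? I looked at the dataset's aime2024.py and could not find any code that limits the output length. Thanks!
configs/datasets/aime2024/. Edit the file for whichever version of the dataset you are using and change max_out_len.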
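The dataset-side limit described above can also be raised programmatically instead of editing the file by hand. This is only a minimal sketch: it assumes each dataset config is a plain dict that stores its limit at `infer_cfg['inferencer']['max_out_len']`, and the helper name `set_max_out_len` is hypothetical, not part of OpenCompass.

```python
import copy


def set_max_out_len(datasets, max_out_len=32768):
    """Return a copy of the dataset configs with the generation limit raised.

    Assumption: each config keeps its limit at
    cfg['infer_cfg']['inferencer']['max_out_len'].
    """
    patched = copy.deepcopy(datasets)
    for cfg in patched:
        inferencer = cfg.setdefault('infer_cfg', {}).setdefault('inferencer', {})
        inferencer['max_out_len'] = max_out_len
    return patched


# Stand-in for a dataset config; real configs carry more fields.
aime2024_datasets = [
    dict(abbr='aime2024', infer_cfg=dict(inferencer=dict(max_out_len=2048))),
]

patched = set_max_out_len(aime2024_datasets)
print(patched[0]['infer_cfg']['inferencer']['max_out_len'])
```

The model-side max_out_len still has to be raised separately, since the thread reports that both limits apply.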
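Did changing max out len solve it?
Solved. The results are very accurate; the run took a little over 9 hours. Many thanks to @nku-ligl!
What score did you get? Does it match the official numbers?
About the same as the official result. I think it was 30.3%; the official number is slightly lower.
We have provided an example on how to re-implement the AIME for DeepSeek-R1-32B. Please check: https://github.com/open-compass/opencompass/blob/main/docs/en/user_guides/deepseek_r1.md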
Prerequisite
Type
I'm evaluating with the officially supported tasks/models/datasets.
Environment
{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)',
'GPU 0,1,2,3': 'Tesla V100S-PCIE-32GB',
'MMEngine': '0.10.6',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 11.7, V11.7.64',
'OpenCV': '4.11.0',
'PyTorch': '2.5.1+cu124',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2024.2-Product Build 20240605 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.5.3 (Git Hash '
'66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.4\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
'-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=old-style-cast '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]',
'TorchVision': '0.20.1+cu124',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.4.0+862bf78',
'sys.platform': 'linux',
'transformers': '4.48.1'}
Reproduces the problem - code/configuration sample
Start the evaluation with the following command:
python run.py --datasets aime2024_0shot_nocot_gen_2b9dc2 --hf-type chat --hf-path /root/ai/deepseek32b/DeepSeek-R1-Distill-Qwen-32B --debug --max-out-len 8096 --generation-kwargs do_sample=True top_k=50
The score is only 3.33:
{
"accuracy": 3.3333333333333335
}
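For scale, AIME 2024 consists of 30 problems (two papers of 15 each). Assuming the dataset contains all of them, the reported value corresponds to a single correct answer. A quick check, with the hypothetical helper `accuracy_percent`:

```python
def accuracy_percent(num_correct, num_total=30):
    """Accuracy as a percentage; 30 assumes the full AIME 2024 set."""
    return 100 * num_correct / num_total


one_correct = accuracy_percent(1)
print(one_correct)
```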
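Inspecting the config file in the output folder:
The configs file contains the following:
The predictions file contains the following:
The model's reasoning is cut off before it finishes, which is probably the main cause of the low score.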
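One way to confirm that generations are being cut off is to scan the predictions for outputs that never reach a final answer. The sketch below rests on assumptions not shown in this issue: that the predictions are a mapping from sample ids to records with a `prediction` string, and that a finished R1-style answer closes its `</think>` block and contains `\boxed{`. The helper name `find_truncated` is hypothetical.

```python
def find_truncated(predictions):
    """Return ids whose output looks cut off before a final answer.

    Heuristic (an assumption): an opened <think> block that is never
    closed, or a missing \\boxed{...} answer, suggests the generation
    hit the output-length limit.
    """
    truncated = []
    for sample_id, record in predictions.items():
        text = record.get('prediction', '')
        unclosed_think = '<think>' in text and '</think>' not in text
        no_final_answer = '\\boxed{' not in text
        if unclosed_think or no_final_answer:
            truncated.append(sample_id)
    return truncated


# Tiny in-memory stand-in for a predictions file.
sample = {
    '0': {'prediction': '<think>Let x = 3 ...</think> The answer is \\boxed{204}.'},
    '1': {'prediction': '<think>First, consider the case where'},
}
truncated_ids = find_truncated(sample)
print(truncated_ids)
```

A real predictions file could be loaded with `json.load` and passed to the same function. If most ids come back as truncated, the output limit is the likely culprit rather than the model.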
Reproduces the problem - command or script
See above.
Reproduces the problem - error message
See above.
Other information
No response