[Bug] DeepSeek R1 32B model scores low on the AIME2024 dataset #1878

carllisicau · 2025-02-18T10:42:48Z

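Prerequisites

I have searched the issues and discussions but did not get the help I expected.
The bug has not been fixed in the latest version.

Issue type

I am evaluating with an officially supported task/model/dataset.

Environment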

{'CUDA available': True,
'CUDA_HOME': '/usr/local/cuda',
'GCC': 'gcc (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44)',
'GPU 0,1,2,3': 'Tesla V100S-PCIE-32GB',
'MMEngine': '0.10.6',
'MUSA available': False,
'NVCC': 'Cuda compilation tools, release 11.7, V11.7.64',
'OpenCV': '4.11.0',
'PyTorch': '2.5.1+cu124',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 9.3\n'
' - C++ Version: 201703\n'
' - Intel(R) oneAPI Math Kernel Library Version '
'2024.2-Product Build 20240605 for Intel(R) 64 '
'architecture applications\n'
' - Intel(R) MKL-DNN v3.5.3 (Git Hash '
'66f0cb9eb66affd2da3bf5f8d897376f04aae6af)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: AVX512\n'
' - CUDA Runtime 12.4\n'
' - NVCC architecture flags: '
'-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90\n'
' - CuDNN 90.1\n'
' - Magma 2.6.1\n'
' - Build settings: BLAS_INFO=mkl, '
'BUILD_TYPE=Release, CUDA_VERSION=12.4, '
'CUDNN_VERSION=9.1.0, '
'CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOROCTRACER -DLIBKINETO_NOXPUPTI=ON '
'-DUSE_FBGEMM -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wsuggest-override '
'-Wno-psabi -Wno-error=old-style-cast '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Wno-stringop-overflow, LAPACK_INFO=mkl, '
'PERF_WITH_AVX=1, PERF_WITH_AVX2=1, '
'TORCH_VERSION=2.5.1, USE_CUDA=ON, USE_CUDNN=ON, '
'USE_CUSPARSELT=1, USE_EXCEPTION_PTR=1, '
'USE_GFLAGS=OFF, USE_GLOG=OFF, USE_GLOO=ON, '
'USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, '
'USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, '
'USE_ROCM=OFF, USE_ROCM_KERNEL_ASSERT=OFF, \n',
'Python': '3.10.16 (main, Dec 11 2024, 16:24:50) [GCC 11.2.0]',
'TorchVision': '0.20.1+cu124',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.4.0+862bf78',
'sys.platform': 'linux',
'transformers': '4.48.1'}

Reproduction - code/config sample

I started the evaluation with this command:
python run.py --datasets aime2024_0shot_nocot_gen_2b9dc2 --hf-type chat --hf-path /root/ai/deepseek32b/DeepSeek-R1-Distill-Qwen-32B --debug --max-out-len 8096 --generation-kwargs do_sample=True top_k=50
The score is only 3.33:
{
"accuracy": 3.3333333333333335
}
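As a rough sanity check, and assuming the AIME2024 set used here has 30 problems (the predictions shown further down are indexed up to "29"), a score of this size would correspond to a single correct answer. The helper name below is hypothetical and only illustrates the arithmetic.

```python
def accuracy_percent(num_correct: int, num_total: int) -> float:
    """Return accuracy as a percentage of correctly answered problems."""
    return 100 * num_correct / num_total


# Assumption: the dataset contains 30 problems in total.
score = accuracy_percent(num_correct=1, num_total=30)
print(round(score, 2))
```

Under that assumption, one correct answer out of 30 rounds to 3.33.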
I then checked the config file in the output folder.
The configs file contains the following:

datasets=[
dict(abbr='aime2024',
eval_cfg=dict(
evaluator=dict(
type='opencompass.datasets.MATHEvaluator',
version='v2'),
pred_postprocessor=dict(
type='opencompass.datasets.math_postprocess_v2')),
infer_cfg=dict(
inferencer=dict(
max_out_len=2048,
type='opencompass.openicl.icl_inferencer.GenInferencer'),
prompt_template=dict(
template=dict(
round=[
dict(prompt='{question}\nRemember to put your final answer within \boxed{}.',
role='HUMAN'),
]),
type='opencompass.openicl.icl_prompt_template.PromptTemplate'),
retriever=dict(
type='opencompass.openicl.icl_retriever.ZeroRetriever')),
path='opencompass/aime2024',
reader_cfg=dict(
input_columns=[
'question',
],
output_column='answer'),
type='opencompass.datasets.Aime2024Dataset'),
]
models=[
dict(abbr='DeepSeek-R1-Distill-Qwen-32B_hf',
batch_size=8,
generation_kwargs=dict(
do_sample=True,
top_k=50,
top_p=0.95),
max_out_len=8096,
max_seq_len=None,
model_kwargs=dict(
),
pad_token_id=None,
path='/root/ai/deepseek32b/DeepSeek-R1-Distill-Qwen-32B',
peft_kwargs=dict(
),
peft_path=None,
run_cfg=dict(
num_gpus=1),
stop_words=[
],
tokenizer_kwargs=dict(
),
tokenizer_path=None,
type='opencompass.models.huggingface_above_v4_33.HuggingFacewithChatTemplate'),
]

The predictions file contains the following:

"28": {

    "origin_prompt": [
        {
            "role": "HUMAN",
            "prompt": "\nLet $b \\geq 2$ be an integer. Call a positive integer $n$ $b\\textit{-eautiful}$ if it has exactly two digits when expressed in base $b$, and these two digits sum to $\\sqrt{n}$. For example, $81$ is $13$-eautiful because $81=\\underline{6}$$\\underline{3}_{13}$ and $6+3=\\sqrt{81}$. Find the least integer $b\\geq 2$ for which there are more than ten $b$-eautiful integers.\n\nRemember to put your final answer within \\boxed{}.\n"
        }
    ],
    "prediction": "<think>\nAlright, so I have this problem about b-eautiful numbers. Let me try to understand it step by step.\n\nFirst, the problem says that a positive integer n is b-eautiful if two conditions are met:\n\n1. When n is expressed in base b, it has exactly two digits.\n2. The sum of these two digits equals the square root of n.\n\nAn example is given: 81 is 13-eautiful because when written in base 13, it's 63 (which are the digits 6 and 3), and 6 + 3 = 9, which is the square root of 81.\n\nThe task is to find the smallest integer b (where b is at least 2) such that there are more than ten b-eautiful numbers. So, I need to find the least b where the count of such numbers exceeds ten.\n\nLet me break down the problem.\n\nFirst, let's recall that any two-digit number in base b can be written as n = a*b + c, where a and c are digits in base b. Since it's a two-digit number, a can't be zero. So, a is between 1 and b-1, and c is between 0 and b-1.\n\nGiven that n is b-eautiful, the sum of the digits a + c must equal sqrt(n). So, we have:\n\na + c = sqrt(n)\n\nBut n is also equal to a*b + c. So, substituting n, we get:\n\na + c = sqrt(a*b + c)\n\nLet me write that equation:\n\na + c = sqrt(a*b + c)\n\nHmm, maybe I can square both sides to eliminate the square root:\n\n(a + c)^2 = a*b + c\n\nExpanding the left side:\n\na^2 + 2ac + c^2 = a*b + c\n\nLet me rearrange this equation to group like terms:\n\na^2 + 2ac + c^2 - a*b - c = 0\n\nSimplify terms:\n\na^2 + (2c - b)a + c^2 - c = 0\n\nThis is a quadratic equation in terms of a. Alternatively, we can think of it as a quadratic in c, but maybe it's better to see if we can find a relationship between a and c.\n\nAlternatively, maybe we can express this as:\n\na^2 + (2c - b)a + (c^2 - c) = 0\n\nBut this seems a bit messy. 
Maybe another approach.\n\nLet me consider that n = a*b + c and sqrt(n) = a + c.\n\nSo, n = (a + c)^2.\n\nTherefore, a*b + c = (a + c)^2.\n\nSo, the equation is:\n\na*b + c = a^2 + 2ac + c^2\n\nLet me rearrange:\n\na*b = a^2 + 2ac + c^2 - c\n\nSo,\n\na*b = a^2 + 2ac + c^2 - c\n\nHmm, maybe factor the right-hand side:\n\na*b = a^2 + c(2a + c - 1)\n\nNot sure if that helps.\n\nAlternatively, maybe express b in terms of a and c:\n\nFrom a*b + c = (a + c)^2,\n\nWe can write:\n\nb = [(a + c)^2 - c]/a\n\nSimplify numerator:\n\n(a + c)^2 - c = a^2 + 2ac + c^2 - c\n\nTherefore,\n\nb = (a^2 + 2ac + c^2 - c)/a = a + 2c + (c^2 - c)/a\n\nSince b must be an integer, (c^2 - c) must be divisible by a. So, a divides (c^2 - c). So, a | c(c - 1).\n\nSince a and c are digits in base b, c is between 0 and b - 1, and a is between 1 and b - 1.\n\nSo, for given a and c, b is determined as:\n\nb = a + 2c + (c^2 - c)/a\n\nGiven that b must be an integer, (c^2 - c) must be divisible by a.\n\nAlternatively, since a divides c(c - 1), and a is at least 1, which it is, but since a is a digit in base b, a is at most b - 1.\n\nBut perhaps for each base b, we can find the number of pairs (a, c) such that a + c = sqrt(a*b + c), with a and c satisfying 1 ≤ a ≤ b - 1 and 0 ≤ c ≤ b - 1.\n\nBut maybe instead of thinking about it per base, it's better to find, for each a and c, the corresponding b, and then count how many b's have more than ten such numbers.\n\nWait, but the problem is to find the least b such that there are more than ten b-eautiful numbers.\n\nSo, for each b, starting from 2 upwards, count the number of n that are b-eautiful, and find the smallest b where this count exceeds 10.\n\nSo, perhaps it's better to approach it by, for each b, find the number of pairs (a, c) such that a + c = sqrt(a*b + c).\n\nBut how?\n\nAlternatively, since n = a*b + c, and sqrt(n) = a + c, then n must be a perfect square. 
So, n is a perfect square.\n\nTherefore, n = k^2, where k is a positive integer.\n\nGiven that n is a two-digit number in base b, so n is between b (which is 10 in base b, i.e., 1*b + 0 = b) and b^2 - 1 (which is (b-1)*b + (b-1) = b^2 - 1).\n\nTherefore, k^2 is between b and b^2 - 1.\n\nSo, k must satisfy sqrt(b) ≤ k ≤ sqrt(b^2 - 1) ≈ b - 1/b, so k is between 2 and b - 1, since b ≥ 2.\n\nBut more precisely, k must satisfy b ≤ k^2 ≤ b^2 - 1.\n\nSo, sqrt(b) ≤ k ≤ sqrt(b^2 - 1).\n\nSince sqrt(b^2 - 1) is slightly less than b, so k must be less than or equal to b - 1.\n\nBut also, k must be at least 2, since b is at least 2, so sqrt(b) is at least sqrt(2) ≈ 1.414, so k must be at least 2.\n\nWait, for b=2, sqrt(2) is about 1.414, so k must be at least 2, but 2^2 = 4. But in base 2, the two-digit numbers go from 2 (10_2) to 3 (11_2). So, n can be 2 or 3. So, 4 is not a two-digit number in base 2. So, for b=2, the maximum n is 3, which is less than 4, so k=2 is too big. So, maybe for b=2, there are no b-eautiful numbers.\n\nBut let's check:\n\nFor b=2, n must be two digits in base 2, so n can be 2 or 3.\n\nFor n=2: digits are 10_2, so a=1, c=0. Sum is 1 + 0 = 1, which is not sqrt(2).\n\nFor n=3: digits are 11_2, sum is 1 + 1 = 2, which is sqrt(3)? No, sqrt(3) ≈ 1.732, so not equal. So, indeed, b=2 has no b-eautiful numbers.\n\nSimilarly, for b=3:\n\nTwo-digit numbers in base 3 are from 3 (10_3) to 8 (22_3).\n\nSo, n can be 3,4,5,6,7,8.\n\nCompute sqrt(n):\n\nn=3: sqrt≈1.732, sum digits: a + c.\n\nn=3: 10_3, a=1, c=0, sum=1 ≠ 1.732.\n\nn=4: 11_3, sum=2. sqrt(4)=2. So, 2=2, so 4 is 3-eautiful.\n\nn=5: 12_3, sum=1+2=3. sqrt(5)≈2.236. Not equal.\n\nn=6: 20_3, sum=2+0=2. sqrt(6)≈2.449. Not equal.\n\nn=7: 21_3, sum=2+1=3. sqrt(7)≈2.645. Not equal.\n\nn=8: 22_3, sum=4. sqrt(8)≈2.828. 
Not equal.\n\nSo, only n=4 is 3-eautiful.\n\nThus, for b=3, there is 1 b-eautiful number.\n\nSimilarly, for b=4:\n\nTwo-digit numbers in base 4: 4 (10_4) to 15 (33_4).\n\nCompute for each n from 4 to 15:\n\nn=4: 10_4, sum=1. sqrt(",
    "gold": "211"
},
"29": {
    "origin_prompt": [
        {
            "role": "HUMAN",
            "prompt": "\nFind the number of rectangles that can be formed inside a fixed regular dodecagon ($12$-gon) where each side of the rectangle lies on either a side or a diagonal of the dodecagon. The diagram below shows three of those rectangles. [asy] unitsize(0.6 inch); for(int i=0; i<360; i+=30) { dot(dir(i), 4+black); draw(dir(i)--dir(i+30)); } draw(dir(120)--dir(330)); filldraw(dir(210)--dir(240)--dir(30)--dir(60)--cycle, mediumgray, linewidth(1.5)); draw((0,0.366)--(0.366,0), linewidth(1.5)); [/asy]\n\nRemember to put your final answer within \\boxed{}.\n"
        }
    ],
    "prediction": "<think>\nOkay, so I have this problem where I need to find the number of rectangles that can be formed inside a regular dodecagon, which is a 12-sided polygon. Each side of the rectangle has to lie on either a side or a diagonal of the dodecagon. The Asymptote diagram shows three examples of such rectangles, so I know they can be of different sizes and orientations.\n\nFirst, I should recall that a regular dodecagon has all sides equal and all internal angles equal. Each internal angle in a regular dodecagon is 150 degrees because the formula for internal angles is \\((n-2) \\times 180^\\circ / n\\), which for \\(n=12\\) gives \\((12-2) \\times 180 / 12 = 150^\\circ\\).\n\nNow, the problem is about finding rectangles. So, I need to figure out how rectangles can be formed by connecting sides or diagonals of the dodecagon. Since a rectangle has four sides with opposite sides equal and all angles 90 degrees, the key is to find sets of four points (vertices of the dodecagon) that form such a shape.\n\nLet me think about how the sides and diagonals of the dodecagon can form rectangles. In a regular polygon, the sides are the edges connecting consecutive vertices, and diagonals are the lines connecting non-consecutive vertices. In a dodecagon, each vertex is connected to 11 others, so there are a lot of diagonals.\n\nBut for a rectangle, we need four vertices such that the sides are either sides or diagonals of the dodecagon. So, maybe the sides of the rectangle correspond to sides or diagonals of the dodecagon.\n\nI remember that in regular polygons, certain diagonals can form rectangles or other shapes. For example, in a regular octagon, you can form rectangles by connecting every other vertex.\n\nSimilarly, in a regular dodecagon, perhaps connecting every third vertex or some other interval can form rectangles. 
Let me think about how the sides and diagonals can form right angles.\n\nWait, in a regular polygon, the central angles between consecutive vertices are equal. For a dodecagon, each central angle is \\(360^\\circ / 12 = 30^\\circ\\). So, the angle between two adjacent vertices from the center is 30 degrees.\n\nTo form a rectangle, the sides must meet at 90-degree angles. So, the vectors corresponding to the sides of the rectangle must be perpendicular. Since the polygon is regular, the vectors can be represented in terms of the central angles.\n\nSo, if I can find two vectors that are perpendicular and lie on the sides or diagonals of the dodecagon, then I can form a rectangle by combining them.\n\nLet me denote the vertices of the dodecagon as \\(V_0, V_1, V_2, \\ldots, V_{11}\\) going around the polygon. Each vertex \\(V_k\\) can be represented in the complex plane as \\(e^{i \\theta_k}\\) where \\(\\theta_k = 30^\\circ \\times k\\).\n\nIf I can find four points \\(V_a, V_b, V_c, V_d\\) such that the vectors \\(V_b - V_a\\) and \\(V_d - V_a\\) are perpendicular, and similarly for other sides, then they form a rectangle.\n\nBut maybe there's a simpler way. Since the dodecagon is regular and symmetric, maybe I can count the number of rectangles based on the number of pairs of parallel sides.\n\nWait, rectangles have opposite sides equal and parallel. So, in the dodecagon, if I can find two pairs of parallel chords (sides or diagonals) that are perpendicular to each other, they can form a rectangle.\n\nSo, perhaps I should figure out how many pairs of parallel chords exist in the dodecagon and then see how many of these pairs are perpendicular.\n\nBut first, how many pairs of parallel sides or diagonals are there in a regular dodecagon?\n\nIn a regular polygon with \\(n\\) sides, the number of pairs of parallel sides is \\(n/2\\) if \\(n\\) is even. 
Wait, for a dodecagon, which has 12 sides, the number of pairs of parallel sides is 6, since each side has one parallel side opposite to it.\n\nBut in addition to sides, there are diagonals that can also be parallel. So, the number of pairs of parallel diagonals is more complicated.\n\nWait, in a regular polygon, the number of directions for parallel chords depends on the number of sides. For a dodecagon, each chord can be defined by the number of vertices it skips. For example, a side skips 0 vertices, a diagonal that skips 1 vertex is another type, skips 2, skips 3, etc., up to skipping 5 vertices (since beyond that, it's the same as skipping fewer in the other direction).\n\nSo, for a regular dodecagon, chords can skip \\(k = 0, 1, 2, 3, 4, 5\\) vertices. Each \\(k\\) gives a set of parallel chords.\n\nTherefore, each direction corresponds to a step size \\(k\\), and each step size \\(k\\) from 1 to 5 (since \\(k=0\\) is the sides themselves) gives a set of parallel diagonals. So, for each \\(k = 1\\) to \\(5\\), there are 12 diagonals each, but they are grouped into parallel sets.\n\nWait, actually, for each \\(k\\), the number of distinct directions is 6 because of symmetry. Hmm, maybe I need to think differently.\n\nWait, perhaps each step size \\(k\\) and \\(12 - k\\) gives the same direction but in opposite orientation. So, for \\(k = 1\\) and \\(k = 11\\), they are the same direction but opposite; similarly for \\(k=2\\) and \\(k=10\\), etc. So, for step sizes, we can consider \\(k = 1\\) to \\(6\\), but beyond \\(k=6\\), it's the same as smaller \\(k\\) in the opposite direction.\n\nBut in our case, since we have 12 sides, each step size \\(k\\) from 1 to 5 gives a unique direction, and \\(k=6\\) is the diameter, which is its own opposite.\n\nWait, actually, in a regular dodecagon, the diameters (which connect opposite vertices) are the only chords that are their own opposites. 
So, for each \\(k\\) from 1 to 5, there are two directions (clockwise and counterclockwise), but in terms of parallelism, they are the same direction.\n\nWait, maybe it's better to think in terms of slopes. Each chord with step size \\(k\\) will have a certain slope, and chords with step size \\(k\\) and \\(12 - k\\) will have slopes that are negatives of each other, hence not parallel. So, actually, for each \\(k\\) from 1 to 5, there is a unique set of parallel chords.\n\nTherefore, in total, there are 6 different directions for chords: step sizes \\(k = 0\\) (sides), \\(1\\), \\(2\\), \\(3\\), \\(4\\), \\(5\\), and \\(6\\) (diameters). But step size \\(6\\) is just the diameter, which is a single direction.\n\nWait, no, step size \\(k\\) and \\(12 - k\\) are different directions because they go in opposite directions around the polygon. So, for each \\(k = 1\\) to \\(5\\), we have two directions, but they are not parallel. So, actually, each step size \\(k\\) corresponds to a unique direction. So, for step sizes \\(k = 0\\) (sides), \\(1\\), \\(2\\), \\(3\\), \\(4\\), \\(5\\), and \\(6\\), each gives a unique direction.\n\nWait, but for step size \\(k=6\\), it's the diameter, so it's only one direction because going \\(6\\) steps in either direction from a vertex gets you to the same opposite vertex.\n\nSo, in total, there are 7 different directions for chords: sides (k=0), diameters (k=6), and for \\(k=1\\) to \\(5\\), each gives two directions, but they are not parallel. Wait, no, actually, for each \\(k\\), the chords are parallel if they have the same step size. So, chords with the same \\(k\\) are parallel, regardless of starting point.\n\nTherefore, for each \\(k\\) from 0 to 6, the chords with step size \\(k\\) are all parallel to each other. So, in a dodecagon, how many unique directions do we have? For each \\(k = 0, 1, 2, 3, 4, 5, 6\\), we have a unique direction.\n\nBut wait, for \\(k = 1\\) and \\(k = 11\\), are they the same? 
Because stepping 1 forward or 11 backward is the same direction.\n\nWait, actually, in a regular polygon, stepping \\(k\\) forward is equivalent to stepping \\(n - k\\) backward. So, for direction purposes, stepping \\(k\\) or \\(n - k\\) gives the same direction. So, in a 12-gon, stepping 1 and stepping 11 give the same direction but in opposite orientations.\n\nBut when considering parallelism,",
    "gold": "315"
}

The reasoning is cut off before it finishes, which is probably the main reason for the low score.
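One way to confirm this across the whole predictions file, rather than by reading individual entries, is a small heuristic check. The sketch below assumes that a complete response closes its reasoning with a `</think>` tag and contains a `\boxed{` answer; only the opening `<think>` tag and the `\boxed{}` instruction are visible in the output above, so the closing tag is an assumption. The function names are hypothetical.

```python
import json


def is_truncated(prediction: str) -> bool:
    """Heuristic: the reasoning block never closed, or no \\boxed{ answer appears."""
    opened = "<think>" in prediction
    closed = "</think>" in prediction  # assumption: complete outputs close the tag
    has_answer = "\\boxed{" in prediction
    return (opened and not closed) or not has_answer


def count_truncated(predictions: dict) -> int:
    """Count entries whose 'prediction' field looks cut off."""
    return sum(is_truncated(item["prediction"]) for item in predictions.values())


# Tiny inline sample shaped like the predictions file above (contents invented).
sample = json.loads(r'''
{
  "0": {"prediction": "<think>\nLet me compute step by step", "gold": "211"},
  "1": {"prediction": "<think>\nDone.\n</think>\nThe answer is \\boxed{315}.", "gold": "315"}
}
''')

print(count_truncated(sample), "of", len(sample), "predictions look truncated")
```

In practice the `sample` dict would be replaced by `json.load` on the real predictions file from the run's output folder.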

Reproduction - command or script

See above

Reproduction - error message

See above

Other information

No response


Sibyl233 · 2025-02-22T09:58:39Z

The DeepSeek R1 paper sets max_out_len=32768; 2048 is not enough.

msz12345 · 2025-02-23T13:33:03Z

> The DeepSeek R1 paper sets max_out_len=32768; 2048 is not enough.

After changing it to 32768, the accuracy is still only 3.33. I am using deepseek-distill-Qwen2-7B.
(smoothquantpre) [maoshizhuo@ISPC-GPU2-CS opencompass]$ CUDA_VISIBLE_DEVICES=1 python run.py --datasets aime2024_0shot_nocot_gen_2b9dc2 --hf-type chat --hf-path /home/maoshizhuo/2025/deepseek-Qwen-7B --debug --max-out-len 32768 --generation-kwargs do_sample=True top_k=50
02/23 21:24:40 - OpenCompass - INFO - Loading aime2024_0shot_nocot_gen_2b9dc2: /home/maoshizhuo/2025/GPassK/opencompass/opencompass/configs/./datasets/aime2024/aime2024_0shot_nocot_gen_2b9dc2.py
02/23 21:24:40 - OpenCompass - INFO - Loading example: /home/maoshizhuo/2025/GPassK/opencompass/opencompass/configs/./summarizers/example.py
02/23 21:24:40 - OpenCompass - INFO - Current exp folder: outputs/default/20250223_212440
02/23 21:24:40 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
02/23 21:24:40 - OpenCompass - INFO - Partitioned into 1 tasks.
02/23 21:24:41 - OpenCompass - INFO - Task [deepseek-Qwen-7B_hf/aime2024]
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:10<00:00, 5.06s/it]
02/23 21:24:56 - OpenCompass - INFO - using stop words: ['<｜end▁of▁sentence｜>']
02/23 21:24:56 - OpenCompass - INFO - Try to load the data from /home/maoshizhuo/.cache/opencompass/./data/aime.jsonl
02/23 21:24:56 - OpenCompass - INFO - Start inferencing [deepseek-Qwen-7B_hf/aime2024]
[2025-02-23 21:24:56,836] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2025-02-23 21:24:56,837] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/4 [00:00<?, ?it/s]02/23 21:24:56 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:24:56 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria..MultiTokenEOSCriteria object at 0x7f7184512980>], 'max_new_tokens': 2048, 'pad_token_id': 151643}
25%|█████████████████████████████████ | 1/4 [01:44<05:14, 104.73s/it]02/23 21:26:41 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:26:41 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria..MultiTokenEOSCriteria object at 0x7f71845a8490>], 'max_new_tokens': 2048, 'pad_token_id': 151643}
50%|██████████████████████████████████████████████████████████████████ | 2/4 [03:24<03:23, 101.67s/it]02/23 21:28:21 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:28:21 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria..MultiTokenEOSCriteria object at 0x7f71845a9840>], 'max_new_tokens': 2048, 'pad_token_id': 151643}
75%|███████████████████████████████████████████████████████████████████████████████████████████████████ | 3/4 [05:04<01:40, 100.80s/it]02/23 21:30:00 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:30:00 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria..MultiTokenEOSCriteria object at 0x7f71845a9a20>], 'max_new_tokens': 2048, 'pad_token_id': 151643}
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4/4 [06:35<00:00, 98.96s/it]
02/23 21:31:32 - OpenCompass - INFO - Partitioned into 1 tasks.
02/23 21:31:33 - OpenCompass - INFO - Try to load the data from /home/maoshizhuo/.cache/opencompass/./data/aime.jsonl
02/23 21:31:33 - OpenCompass - INFO - Task [deepseek-Qwen-7B_hf/aime2024]: {'accuracy': 3.3333333333333335}
| dataset | version | metric | mode | deepseek-Qwen-7B_hf |
| --- | --- | --- | --- | --- |
| aime2024 | 2b9dc2 | accuracy | gen | 3.33 |
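Note that the "Generation Args of Huggingface" lines in this log still report `'max_new_tokens': 2048` even though the command passed `--max-out-len 32768`. A minimal sketch for spotting this kind of mismatch in a saved log follows; the function name is hypothetical, and the excerpt is abbreviated from the log above (the `stopping_criteria` entry is left out).

```python
import re

# Matches the max_new_tokens entry in the logged generation-args dict.
MAX_NEW_TOKENS = re.compile(r"'max_new_tokens':\s*(\d+)")


def effective_max_new_tokens(log_text: str) -> set[int]:
    """Collect every distinct max_new_tokens value that appears in the log."""
    return {int(value) for value in MAX_NEW_TOKENS.findall(log_text)}


# Abbreviated excerpt of the log above.
log_excerpt = """
02/23 21:24:56 - OpenCompass - INFO - Generation Args of Huggingface:
02/23 21:24:56 - OpenCompass - INFO - {'do_sample': True, 'top_k': 50, 'max_new_tokens': 2048, 'pad_token_id': 151643}
"""

requested = 32768  # value passed via --max-out-len
found = effective_max_new_tokens(log_excerpt)
if found != {requested}:
    print(f"requested {requested}, but the log shows {sorted(found)}")
```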

nku-ligl · 2025-02-25T02:32:57Z

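I ran into this problem too. Changing only the model's max_out_len is not enough, because the dataset config caps it at 2048; you have to change both. The outputs folder contains a .py file with the resolved parameters, where you can see the dataset's max_out_len. I fixed it by editing the dataset config source.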
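To make the mismatch easier to spot, the two limits can be compared directly. This is a minimal sketch over plain dicts shaped like the dumped config in the original report (`infer_cfg.inferencer.max_out_len` on the dataset, `max_out_len` on the model); the function name is hypothetical and no OpenCompass API is used.

```python
def find_length_mismatches(datasets, models):
    """Return (dataset abbr, dataset limit, model abbr, model limit) where the limits differ."""
    mismatches = []
    for ds in datasets:
        ds_limit = ds.get("infer_cfg", {}).get("inferencer", {}).get("max_out_len")
        for model in models:
            model_limit = model.get("max_out_len")
            if ds_limit is not None and ds_limit != model_limit:
                mismatches.append((ds["abbr"], ds_limit, model["abbr"], model_limit))
    return mismatches


# Values copied from the dumped config in the original report.
datasets = [dict(abbr="aime2024", infer_cfg=dict(inferencer=dict(max_out_len=2048)))]
models = [dict(abbr="DeepSeek-R1-Distill-Qwen-32B_hf", max_out_len=8096)]

for row in find_length_mismatches(datasets, models):
    print(row)
```

For the reported config this flags the dataset limit of 2048 against the model limit of 8096.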

msz12345 · 2025-02-25T02:57:07Z

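> I ran into this problem too. Changing only the model's max_out_len is not enough, because the dataset config caps it at 2048; you have to change both. The outputs folder contains a .py file with the resolved parameters, where you can see the dataset's max_out_len. I fixed it by editing the dataset config source.

Which source file do you mean by the dataset-related source? I looked through the dataset's aime2024.py and could not find any code that limits the output length. Thanks!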

nku-ligl · 2025-02-25T03:35:32Z

It is under configs/datasets/aime2024/. Edit the config file for whichever dataset version you are using, and change max_out_len there.
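The change amounts to replacing the inferencer's `max_out_len` in the dataset config. The sketch below shows the same edit applied programmatically to a dict shaped like the dumped config; the helper name is hypothetical, and 32768 is the value quoted from the DeepSeek R1 paper earlier in this thread. Whether such an override in a user config behaves the same as editing the file is an assumption; the fix confirmed in this thread is editing the file itself.

```python
import copy


def with_max_out_len(datasets, max_out_len):
    """Return a copy of the dataset configs with the inferencer limit replaced."""
    updated = copy.deepcopy(datasets)
    for ds in updated:
        ds["infer_cfg"]["inferencer"]["max_out_len"] = max_out_len
    return updated


# Dataset entry reduced to the fields relevant here (taken from the dumped config).
datasets = [
    dict(
        abbr="aime2024",
        infer_cfg=dict(
            inferencer=dict(
                max_out_len=2048,
                type="opencompass.openicl.icl_inferencer.GenInferencer",
            )
        ),
    )
]

patched = with_max_out_len(datasets, 32768)
print(patched[0]["infer_cfg"]["inferencer"]["max_out_len"])
```

The deep copy keeps the original list untouched, so the old and new limits can be compared side by side.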

wccccp · 2025-02-27T13:18:46Z

Did changing max_out_len solve it?

> It is under configs/datasets/aime2024/. Edit the config file for whichever dataset version you are using, and change max_out_len there.

msz12345 · 2025-02-27T13:21:35Z

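> Did changing max_out_len solve it?
>
> > It is under configs/datasets/aime2024/. Edit the config file for whichever dataset version you are using, and change max_out_len there.

Yes, it is solved. The results are very accurate, though the run took more than 9 hours. Many thanks to @nku-ligl!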

hh0o0hh · 2025-02-28T06:31:23Z

What score did you get? Does it match the official numbers?

msz12345 · 2025-03-01T03:18:05Z

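> What score did you get? Does it match the official numbers?

About the same as the official result. I think it was 30.3%; the official number is slightly lower.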

tonysy · 2025-03-04T09:06:16Z

We have provided an example on how to re-implement the AIME for DeepSeek-R1-32B. Please check: https://github.com/open-compass/opencompass/blob/main/docs/en/user_guides/deepseek_r1.md
