
[Bug] Only Debug Mode can perform eval tasks correctly. #1859

Open
2 tasks done
GenerallyCovetous opened this issue Feb 8, 2025 · 3 comments

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
'GCC': 'gcc (GCC) 7.3.0',
'MMEngine': '0.10.6',
'MUSA available': False,
'OpenCV': '4.11.0',
'PyTorch': '2.1.0',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 10.2\n'
' - C++ Version: 201703\n'
' - Intel(R) MKL-DNN v3.1.1 (Git Hash '
'64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: NO AVX\n'
' - Build settings: BLAS_INFO=open, '
'BUILD_TYPE=Release, '
'CXX_COMPILER=/opt/rh/devtoolset-10/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=old-style-cast '
'-Wno-invalid-partial-specialization '
'-Wno-unused-private-field '
'-Wno-aligned-allocation-unavailable '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=open, '
'TORCH_DISABLE_GPU_ASSERTS=ON, '
'TORCH_VERSION=2.1.0, USE_CUDA=OFF, '
'USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=ON, '
'USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, '
'USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.10.16 (main, Dec 11 2024, 16:18:56) [GCC 11.2.0]',
'TorchVision': '0.16.0',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.9+',
'sys.platform': 'linux',
'transformers': '4.48.0'}

Reproduces the problem - code/configuration sample

The eval task was launched with the following command:

python run.py --models hf_llama3_1_8b --datasets base_Custom --work-dir outputs/Llama3_1-8B-DP/ --summarizer base_Custom --max-num-workers 8

The infer task runs correctly to the end, but the eval task reports an error. After adding the --debug flag, evaluation works again. Note that base_Custom is a set of custom datasets for evaluating base models; when I add the --debug parameter, the whole pipeline works.
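A minimal sketch (plain Python, not OpenCompass internals) of why a crash may only be diagnosable in a debug-style mode: when each task is launched as a child process, a failure in the child is visible to the parent only as an exit status, whereas running the same code in the current process raises the error with a full traceback:

```python
import subprocess
import sys

# Hypothetical illustration: a runner that spawns tasks as subprocesses
# sees only the child's exit status, while an in-process (debug-style)
# mode surfaces the underlying exception directly.

def run_as_subprocess(code: str) -> int:
    """Launch `code` in a child interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode

def run_in_process(code: str) -> None:
    """Execute `code` in this process; failures surface as exceptions."""
    exec(code)

failing_task = "raise RuntimeError('eval task failed')"

print(run_as_subprocess(failing_task))  # nonzero exit status; cause hidden from the parent
try:
    run_in_process(failing_task)
except RuntimeError as exc:
    print(exc)  # full error visible: eval task failed
```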

Reproduces the problem - command or script

python run.py --models hf_llama3_1_8b --datasets base_Custom --work-dir outputs/Llama3_1-8B-DP/ --summarizer base_Custom --max-num-workers 8

Reproduces the problem - error message

Here are some examples of errors reported:
02/07 22:29:09 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/local.py - _launch - 250 - task OpenICLEval[llama-3_1-8b-hf/sanitized_mbpp] fail, see
outputs/Llama3_1-8B-DP/20250207_162403/logs/eval/llama-3_1-8b-hf/sanitized_mbpp.out
02/07 22:29:09 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/local.py - _launch - 250 - task OpenICLEval[llama-3_1-8b-hf/race-high] fail, see
outputs/Llama3_1-8B-DP/20250207_162403/logs/eval/llama-3_1-8b-hf/race-high.out
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/local.py - _launch - 250 - task OpenICLEval[llama-3_1-8b-hf/GPQA_diamond] fail, see
outputs/Llama3_1-8B-DP/20250207_162403/logs/eval/llama-3_1-8b-hf/GPQA_diamond.out
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-chinese] failed with code -11
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-english] failed with code -11
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-geography] failed with code -11
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-history] failed with code -11
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-biology] failed with code -11
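For reference, an exit code of -11 from a Python subprocess conventionally means the child was killed by signal 11 (SIGSEGV, a segmentation fault), so the eval workers appear to be crashing at the C level rather than raising a Python exception. A small sketch of decoding such a code (POSIX only):

```python
import signal
import subprocess
import sys

# Spawn a child that deliberately kills itself with SIGSEGV, then decode
# the negative return code back into a signal name. On POSIX, subprocess
# reports a child terminated by signal N as returncode -N.
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
)
print(proc.returncode)                        # -11
print(signal.Signals(-proc.returncode).name)  # SIGSEGV
```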

Other information

No response

@Redias

Redias commented Feb 11, 2025

It's a feature :)

@GenerallyCovetous
Author

> It's a feature :)

You mean this is normal? Then I can't just run python run.py for both the infer and eval tasks?

@Redias

Redias commented Feb 11, 2025

> It's a feature :)
>
> You mean this is normal? Then I can't just run python run.py for both the infer and eval tasks?

I remember reading about this in the documentation somewhere. Just add --debug to all the tasks.
