
[Bug] Only Debug Mode can perform eval tasks correctly. #1859

Open
2 tasks done
GenerallyCovetous opened this issue Feb 8, 2025 · 3 comments

Prerequisite

Type

I'm evaluating with the officially supported tasks/models/datasets.

Environment

{'CUDA available': True,
'GCC': 'gcc (GCC) 7.3.0',
'MMEngine': '0.10.6',
'MUSA available': False,
'OpenCV': '4.11.0',
'PyTorch': '2.1.0',
'PyTorch compiling details': 'PyTorch built with:\n'
' - GCC 10.2\n'
' - C++ Version: 201703\n'
' - Intel(R) MKL-DNN v3.1.1 (Git Hash '
'64f6bcbcbab628e96f33a62c3e975f8535a7bde4)\n'
' - OpenMP 201511 (a.k.a. OpenMP 4.5)\n'
' - LAPACK is enabled (usually provided by '
'MKL)\n'
' - NNPACK is enabled\n'
' - CPU capability usage: NO AVX\n'
' - Build settings: BLAS_INFO=open, '
'BUILD_TYPE=Release, '
'CXX_COMPILER=/opt/rh/devtoolset-10/root/usr/bin/c++, '
'CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 '
'-fabi-version=11 -fvisibility-inlines-hidden '
'-DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO '
'-DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER '
'-DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK '
'-DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE '
'-O2 -fPIC -Wall -Wextra -Werror=return-type '
'-Werror=non-virtual-dtor -Werror=bool-operation '
'-Wnarrowing -Wno-missing-field-initializers '
'-Wno-type-limits -Wno-array-bounds '
'-Wno-unknown-pragmas -Wno-unused-parameter '
'-Wno-unused-function -Wno-unused-result '
'-Wno-strict-overflow -Wno-strict-aliasing '
'-Wno-stringop-overflow -Wno-psabi '
'-Wno-error=pedantic -Wno-error=old-style-cast '
'-Wno-invalid-partial-specialization '
'-Wno-unused-private-field '
'-Wno-aligned-allocation-unavailable '
'-Wno-missing-braces -fdiagnostics-color=always '
'-faligned-new -Wno-unused-but-set-variable '
'-Wno-maybe-uninitialized -fno-math-errno '
'-fno-trapping-math -Werror=format '
'-Werror=cast-function-type '
'-Wno-stringop-overflow, LAPACK_INFO=open, '
'TORCH_DISABLE_GPU_ASSERTS=ON, '
'TORCH_VERSION=2.1.0, USE_CUDA=OFF, '
'USE_CUDNN=OFF, USE_EIGEN_FOR_BLAS=ON, '
'USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, '
'USE_GLOG=OFF, USE_MKL=OFF, USE_MKLDNN=ON, '
'USE_MPI=OFF, USE_NCCL=OFF, USE_NNPACK=ON, '
'USE_OPENMP=ON, USE_ROCM=OFF, \n',
'Python': '3.10.16 (main, Dec 11 2024, 16:18:56) [GCC 11.2.0]',
'TorchVision': '0.16.0',
'lmdeploy': "not installed:No module named 'lmdeploy'",
'numpy_random_seed': 2147483648,
'opencompass': '0.3.9+',
'sys.platform': 'linux',
'transformers': '4.48.0'}

Reproduces the problem - code/configuration sample

The eval task was launched with the following command:

python run.py --models hf_llama3_1_8b --datasets base_Custom --work-dir outputs/Llama3_1-8B-DP/ --summarizer base_Custom --max-num-workers 8

The infer task runs correctly to the end, but the eval task reports an error. After adding the --debug flag, evaluation works again. Note that base_Custom is a set of custom datasets for evaluating base models; when I add the --debug parameter, the whole pipeline works.
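A minimal sketch (plain Python, not OpenCompass internals) of why a crash may only be diagnosable in a debug-style mode: when each task is launched as a child process, a failure in the child is visible to the parent only as an exit status, whereas running the same code in the current process raises the error with a full traceback:

```python
import subprocess
import sys

# Hypothetical illustration: a runner that spawns tasks as subprocesses
# sees only the child's exit status, while an in-process (debug-style)
# mode surfaces the underlying exception directly.

def run_as_subprocess(code: str) -> int:
    """Launch `code` in a child interpreter and return its exit status."""
    return subprocess.run([sys.executable, "-c", code]).returncode

def run_in_process(code: str) -> None:
    """Execute `code` in this process; failures surface as exceptions."""
    exec(code)

failing_task = "raise RuntimeError('eval task failed')"

print(run_as_subprocess(failing_task))  # nonzero exit status; cause hidden from the parent
try:
    run_in_process(failing_task)
except RuntimeError as exc:
    print(exc)  # full error visible: eval task failed
```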

Reproduces the problem - command or script

python run.py --models hf_llama3_1_8b --datasets base_Custom --work-dir outputs/Llama3_1-8B-DP/ --summarizer base_Custom --max-num-workers 8

Reproduces the problem - error message

Here are some examples of errors reported:
02/07 22:29:09 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/local.py - _launch - 250 - task OpenICLEval[llama-3_1-8b-hf/sanitized_mbpp] fail, see
outputs/Llama3_1-8B-DP/20250207_162403/logs/eval/llama-3_1-8b-hf/sanitized_mbpp.out
02/07 22:29:09 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/local.py - _launch - 250 - task OpenICLEval[llama-3_1-8b-hf/race-high] fail, see
outputs/Llama3_1-8B-DP/20250207_162403/logs/eval/llama-3_1-8b-hf/race-high.out
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/local.py - _launch - 250 - task OpenICLEval[llama-3_1-8b-hf/GPQA_diamond] fail, see
outputs/Llama3_1-8B-DP/20250207_162403/logs/eval/llama-3_1-8b-hf/GPQA_diamond.out
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-chinese] failed with code -11
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-english] failed with code -11
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-geography] failed with code -11
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-history] failed with code -11
02/07 22:29:13 - OpenCompass - ERROR - /opencompass-main/opencompass/runners/base.py - summarize - 64 - OpenICLEval[llama-3_1-8b-hf/agieval-gaokao-biology] failed with code -11
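For reference, an exit code of -11 from a Python subprocess conventionally means the child was killed by signal 11 (SIGSEGV, a segmentation fault), so the eval workers appear to be crashing at the C level rather than raising a Python exception. A small sketch of decoding such a code (POSIX only):

```python
import signal
import subprocess
import sys

# Spawn a child that deliberately kills itself with SIGSEGV, then decode
# the negative return code back into a signal name. On POSIX, subprocess
# reports a child terminated by signal N as returncode -N.
proc = subprocess.run(
    [sys.executable, "-c", "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"]
)
print(proc.returncode)                        # -11
print(signal.Signals(-proc.returncode).name)  # SIGSEGV
```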

Other information

No response

@Redias

Redias commented Feb 11, 2025

It's a feature :)

@GenerallyCovetous
Author

> It's a feature :)

You mean this is normal? Then I can't just run python run.py for both the infer and eval tasks?

@Redias

Redias commented Feb 11, 2025

> It's a feature :)
>
> You mean this is normal? Then I can't just run python run.py for both the infer and eval tasks?

I remember reading about this in the documentation somewhere. Just add --debug to all the tasks.
