Issues: open-compass/opencompass
Issues list
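[Bug] When running the humaneval test with deepseek-v2-lite-chat, the score is only 1.22, whether tested through the official site or a local deployment; other models are all normal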
#1926
opened Mar 10, 2025 by
shuowoshishui
2 tasks done
[Feature] Cannot find the Hugging Face source code to reload a model
#1921
opened Mar 7, 2025 by
msz12345
1 task
[Bug] Please help. Unable to evaluate the 4-bit quantized version of Deepseek-R1-Qwen-32B
#1918
opened Mar 5, 2025 by
t822876884
2 tasks done
[Bug] No error during evaluation runtime, but the result was empty and the output said "Floating point exception (core dumped)"
#1903
opened Mar 3, 2025 by
azuercici
2 tasks done
[Bug] MATH-500 dataset evaluation takes too much time
#1895
opened Feb 26, 2025 by
msz12345
2 tasks done
[Feature] Can we support terminating dlc and volc tasks when the oc evaluation task is terminated?
#1884
opened Feb 20, 2025 by
zhulinJulia24
1 task
[Bug] human-eval assertion failed while testing partial data
#1880
opened Feb 19, 2025 by
dengbinbox
2 tasks done
[Feature] I need to run the MMLU benchmark on a locally deployed Qwen-110B model. How should I do this?
#1877
opened Feb 18, 2025 by
hi112233445566
1 task
[Feature] Are there currently plans to support the datasets that appear in R1, such as Codeforces, SWE Verified, and Aider-Polyglot?
#1875
opened Feb 17, 2025 by
linbeyoung
1 task
[Bug] Medbench dataset only provides test data, not the entire dataset
#1874
opened Feb 16, 2025 by
ryan0980
2 tasks done
[Bug] Only Debug Mode can perform eval tasks correctly.
#1859
opened Feb 8, 2025 by
GenerallyCovetous
2 tasks done
[Bug] When evaluating the DeepSeek-R1-Distill-Qwen-1.5B model on the livecodebench dataset, why is lcb_test_output 0?
#1856
opened Feb 7, 2025 by
guoguo1314
2 tasks done
[Bug] MBPP score significantly lower than official results
#1855
opened Feb 7, 2025 by
GenerallyCovetous
2 tasks done