open-compass / opencompass Public



Issues: open-compass/opencompass


268 Open 370 Closed


Issues list

[Bug] HumanEval testing with deepseek-v2-lite-chat scores only 1.22, whether run against the official website or a local deployment; other models are normal

#1926 opened Mar 10, 2025 by shuowoshishui

2 tasks done

[Feature] Cannot find the Hugging Face source code for loading a model

#1921 opened Mar 7, 2025 by msz12345

1 task

[Bug] Please help. Unable to evaluate the Deepseek-R1-Qwen-32B 4-bit quantized version

#1918 opened Mar 5, 2025 by t822876884

2 tasks done

[Feature] Offering help with Unit Tests

#1915 opened Mar 5, 2025 by erlapso

[Bug] Deepseek-R1-Distill-32B evaluation raises an error

#1914 opened Mar 5, 2025 by t822876884

2 tasks done

[Bug] xunfei api evaluation raises an error

#1913 opened Mar 5, 2025 by wtlwang2019

2 tasks done

[Feature] Are there plans to support python3.11?

#1911 opened Mar 4, 2025 by linbeyoung

1 task

[Bug] No error during evaluation, but the result was empty and the output said "Floating point exception (core dumped)"

#1903 opened Mar 3, 2025 by azuercici

2 tasks done

[Bug] MATH-500 dataset evaluation takes too much time

#1895 opened Feb 26, 2025 by msz12345

2 tasks done

[Bug] Using max-num-worker causes the SSH connection to drop

#1887 opened Feb 23, 2025 by timturing

2 tasks done

[Feature] Can we support terminating dlc and volc tasks when the oc evaluation task is terminated?

#1884 opened Feb 20, 2025 by zhulinJulia24

1 task

[Bug] human-eval assertion failed while testing partial data

#1880 opened Feb 19, 2025 by dengbinbox

2 tasks done

[Bug] DeepSeek R1 32B model scores low on the AIME2024 dataset evaluation

#1878 opened Feb 18, 2025 by carllisicau

2 tasks done

[Feature] I need to run the MMLU benchmark on a locally deployed Qwen-110B model; how should I do this?

#1877 opened Feb 18, 2025 by hi112233445566

1 task

[Feature] Are there currently plans to support the datasets that appear in R1, such as Codeforces, SWE Verified, and Aider-Polyglot?

#1875 opened Feb 17, 2025 by linbeyoung

1 task

[Bug] Medbench dataset only provides test data, not the entire dataset

#1874 opened Feb 16, 2025 by ryan0980

2 tasks done

[Bug] The relative path in tools/list_configs.py

#1873 opened Feb 15, 2025 by Sibyl233

2 tasks done

[Feature] expose max_task_size in run.py, for quick debug

#1869 opened Feb 13, 2025 by yxdyc

1 task

[Feature] Could there be a feature to compute scores separately once output results are available?

#1866 opened Feb 12, 2025 by Dagoli

1 task

[Feature] How do I load a local model for evaluation?

#1865 opened Feb 11, 2025 by Castrol68

1 task

[Bug] Module import fails when using prompt attack

#1860 opened Feb 8, 2025 by pomliuxj

2 tasks done

[Bug] Only Debug Mode can perform eval tasks correctly.

#1859 opened Feb 8, 2025 by GenerallyCovetous

2 tasks done

[Bug] Chinese SimpleQA dataset is not working

#1858 opened Feb 7, 2025 by hailsham

2 tasks done

[Bug] Why is lcb_test_output 0 when evaluating the DeepSeek-R1-Distill-Qwen-1.5B model on the livecodebench dataset?

#1856 opened Feb 7, 2025 by guoguo1314

2 tasks done

[Bug] MBPP score significantly lower than official results

#1855 opened Feb 7, 2025 by GenerallyCovetous

2 tasks done
