Update Vinoground to make evaluation consistent with paper #354

HanSolo9682 · 2024-10-25T21:15:01Z

As mentioned in issue #350, the video scores and group scores for many models are inconsistent with the original paper. This is because the lmms-eval code we initially provided did not use the shuffled questions that we used in the paper. We have updated the code to reflect that, and we can now reproduce the results on llava-ov-qwen2-7b.
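For readers unfamiliar with the metrics, here is a minimal sketch of how the three scores relate. It assumes the scoring described in the Vinoground paper: each group contributes two text questions and two video questions, the text score counts groups with both text questions correct, the video score counts groups with both video questions correct, and the group score counts groups with all four correct. The function name `vinoground_scores` and the `(group_id, kind, correct)` record layout are hypothetical and chosen for illustration; this is not the lmms-eval implementation.

```python
from collections import defaultdict


def vinoground_scores(results):
    """Compute text, video, and group scores as fractions of groups.

    `results` is an iterable of (group_id, kind, correct) tuples, where
    `kind` is "text" or "video". This layout is an assumption made for
    the sketch, not the format used by lmms-eval.
    """
    # Collect per-group correctness flags, separated by question kind.
    groups = defaultdict(lambda: {"text": [], "video": []})
    for group_id, kind, correct in results:
        groups[group_id][kind].append(bool(correct))

    n = len(groups)
    if n == 0:
        return {"text": 0.0, "video": 0.0, "group": 0.0}

    # A group only counts for a score if every question of that kind is right.
    text_ok = {g: bool(v["text"]) and all(v["text"]) for g, v in groups.items()}
    video_ok = {g: bool(v["video"]) and all(v["video"]) for g, v in groups.items()}

    return {
        "text": sum(text_ok.values()) / n,
        "video": sum(video_ok.values()) / n,
        # The group score requires both the text and the video condition.
        "group": sum(text_ok[g] and video_ok[g] for g in groups) / n,
    }


example = [
    (0, "text", True), (0, "text", True), (0, "video", True), (0, "video", False),
    (1, "text", True), (1, "text", True), (1, "video", True), (1, "video", True),
]
scores = vinoground_scores(example)
```

Under these assumptions the group score can never exceed the video score, so any change in how the video questions are posed shifts both numbers together, which matches the two scores reported as inconsistent in the issue.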

kcz358 · 2024-10-26T12:57:31Z

Thank you for fixing the bugs in the issue! Merging this PR.

…MMs-Lab#354)

* add vinoground
* make evaluation consistent to paper

Co-authored-by: jzhang2427 <jzhang2427@wisc.edu>

jzhang2427 and others added 3 commits October 16, 2024 01:48

add vinoground

2d466f7

Merge branch 'EvolvingLMMs-Lab:main' into main

dc319d7

make evaluation consistent to paper

8f141ce

HanSolo9682 mentioned this pull request Oct 25, 2024

Inconsistent evaluation result on Vinoground #350

Closed

kcz358 approved these changes Oct 26, 2024


kcz358 merged commit f255e5b into EvolvingLMMs-Lab:main Oct 26, 2024
1 check passed
