
There is a problem with the dataset downloaded from Hugging Face. #9

Open
whu125 opened this issue Nov 10, 2024 · 2 comments


@whu125

whu125 commented Nov 10, 2024

Thank you for your excellent work!

I downloaded the dataset from Hugging Face, and when I loaded test-00000-of-00001.parquet, I found some abnormal data. Seven entries have identical option0 and option1 values, and their id also has an extra prefix. For example:

video_id @healthfood-7352910847997496619
id @healthfood-7352910847997496619_0
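To reproduce the count, a check along these lines should flag the affected rows. This is a minimal sketch, assuming each row is a dict with `id`, `video_id`, `option0` and `option1` keys (for example, rows obtained with `pandas.read_parquet(path).to_dict("records")`); the helper name `find_duplicate_options` and the toy rows are hypothetical.

```python
from typing import Any, Iterable


def find_duplicate_options(records: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return the records whose option0 and option1 values are identical.

    Assumption: each record has "id", "video_id", "option0" and "option1" keys,
    e.g. one row of the parquet file converted via
    pandas.read_parquet(path).to_dict("records").
    """
    flagged = []
    for rec in records:
        # A missing option is read as None and is never counted as a duplicate.
        opt0 = rec.get("option0")
        opt1 = rec.get("option1")
        if opt0 is not None and opt0 == opt1:
            flagged.append(rec)
    return flagged


if __name__ == "__main__":
    # Hypothetical toy rows for illustration only; not taken from the dataset.
    rows = [
        {"id": "a_0", "video_id": "a", "option0": "cat", "option1": "dog"},
        {"id": "b_0", "video_id": "b", "option0": "same", "option1": "same"},
    ]
    for rec in find_duplicate_options(rows):
        # Print id and video_id so the id format can be inspected by eye.
        print(rec["id"], rec["video_id"])
```

Printing `id` next to `video_id` makes it easy to compare the two fields for the prefix issue without assuming what the correct id format should be.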

whu125 changed the title from "Where is the correct answer" to "There is a problem with the dataset downloaded from Hugging Face." on Nov 13, 2024
@teowu
Contributor

teowu commented Nov 18, 2024

This is strange. I will look into it. Meanwhile, please use lvb_val.json and lvb_test_wo_gt.json for evaluation.

@whu125
Author

whu125 commented Nov 18, 2024

It seems that lvb_test_wo_gt.json has the same problem.
