Where can I find the COMIEVAL dataset mentioned in the paper #2

Open
zwangsyc opened this issue Nov 28, 2024 · 3 comments

Comments

@zwangsyc

Could you also share the COMIEVAL dataset? I am looking into data contamination as well, and I find this project very interesting and inspiring. However, I cannot find the COMIEVAL dataset in this repo. Could you please tell me how to get access to it? :)

@YihongDong
Owner

Thanks for your interest in our work! You can find them on Google Drive at the following links:
https://drive.google.com/file/d/1FTakU4HIXz00rg8GQQjM7jRw2hBWMsHu/view
https://drive.google.com/file/d/1uBsnVoVCqT8VXA-4REClR6FS8AOab2X9/view
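For scripted downloads, here is a minimal sketch of turning the two share links above into direct-download URLs. The helper names `drive_file_id` and `direct_download_url` are hypothetical, and the `uc?export=download&id=...` form is an assumption based on the commonly used Google Drive pattern; large files may still ask for a confirmation step in the browser.

```python
import re

# The two share links posted in this thread.
SHARE_LINKS = [
    "https://drive.google.com/file/d/1FTakU4HIXz00rg8GQQjM7jRw2hBWMsHu/view",
    "https://drive.google.com/file/d/1uBsnVoVCqT8VXA-4REClR6FS8AOab2X9/view",
]


def drive_file_id(share_link: str) -> str:
    """Extract the file ID from a .../file/d/<id>/view share link."""
    match = re.search(r"/file/d/([A-Za-z0-9_-]+)", share_link)
    if match is None:
        raise ValueError(f"not a Drive file link: {share_link}")
    return match.group(1)


def direct_download_url(share_link: str) -> str:
    """Build a direct-download URL (assumed pattern, not guaranteed by Drive)."""
    return f"https://drive.google.com/uc?export=download&id={drive_file_id(share_link)}"


if __name__ == "__main__":
    for link in SHARE_LINKS:
        print(direct_download_url(link))
```

The printed URLs can be passed to any HTTP download tool; the thread does not say which archive contains which files, so both may be needed.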

@zwangsyc
Author

zwangsyc commented Dec 2, 2024

Hi Yihong, thanks for sharing. I am running the TED experiment, and the main function contains the following code:

def evaluate_pass_at_k(Occurrence=0, path='', duplicates=True, top_percent_exclusion=None, dataset_name='humaneval'):

    if dataset_name == 'humaneval':
        humaneval_data = load_from_disk('datasets/humaneval')['test']  # load_dataset("openai_humaneval")['test']

    INPUT_PATH = os.path.join(path, f'Occurrence_{Occurrence}.jsonl')
    assert os.path.exists(INPUT_PATH)

May I ask where the 'Occurrence_{Occurrence}.jsonl' files can be found?

@YihongDong
Owner

They are included in the Google Drive files linked above as well.
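For anyone hitting the same bare `assert`, here is a minimal sketch of checking which `Occurrence_{Occurrence}.jsonl` files are present before running the evaluation. It assumes the Drive download has been extracted to a local directory and that the files sit directly under the `path` argument. The helpers `find_occurrence_files` and `resolve_input_path` are hypothetical and not part of the repo.

```python
import os
import re


def find_occurrence_files(path: str) -> dict:
    """Map each Occurrence value to its .jsonl file found directly under path."""
    pattern = re.compile(r"^Occurrence_(\d+)\.jsonl$")
    found = {}
    for name in os.listdir(path):
        match = pattern.match(name)
        if match:
            found[int(match.group(1))] = os.path.join(path, name)
    return found


def resolve_input_path(path: str, occurrence: int = 0) -> str:
    """Build the path the quoted code asserts on, with a more helpful error."""
    input_path = os.path.join(path, f"Occurrence_{occurrence}.jsonl")
    if not os.path.exists(input_path):
        # List the Occurrence values that do exist, if the directory is there.
        available = sorted(find_occurrence_files(path)) if os.path.isdir(path) else []
        raise FileNotFoundError(
            f"{input_path} not found; Occurrence values available in {path!r}: {available}"
        )
    return input_path
```

Calling `resolve_input_path(path, Occurrence)` with the same arguments passed to `evaluate_pass_at_k` reports which Occurrence values are available instead of failing on the assertion.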
