Remove extra period in task BigCodeBench/16 #38
Conversation
Added a couple of questions.

Thank you, @hvaara, for the PR. Replied.

I still have to test generation and eval. If that looks good, I'm ready to merge.
Force-pushed from 9c53f8d to 4799575
Tested with the following; please verify I did this correctly.

# export OPENAI_API_KEY=REDACTED  # Set OpenAI API key and uncomment line
git diff fix_v0110.py

diff --git a/tools/fix_v0110.py b/tools/fix_v0110.py
index a6aadac..e8004ef 100644
--- a/tools/fix_v0110.py
+++ b/tools/fix_v0110.py
@@ -34,32 +34,20 @@ if __name__ == "__main__":
hard_ds_dict = load_dataset(BIGCODEBENCH_HARD_HF)
ds = ds_dict[BIGCODEBENCH_VERSION]
hard_ds = hard_ds_dict[BIGCODEBENCH_VERSION]
- function_id = [16, 37]
+ function_id = [16]
new_ds = ds.map(map_ds)
new_ds.to_json("BigCodeBench.jsonl")
ds_dict[BIGCODEBENCH_NEW_VERSION] = new_ds
- ds_dict.push_to_hub(BIGCODEBENCH_HF)
+ #ds_dict.push_to_hub(BIGCODEBENCH_HF)
new_hard_ds = hard_ds.map(map_ds)
new_hard_ds.to_json("BigCodeBench-Hard.jsonl")
hard_ds_dict[BIGCODEBENCH_NEW_VERSION] = new_hard_ds
- hard_ds_dict.push_to_hub(BIGCODEBENCH_HARD_HF)
-
+ #hard_ds_dict.push_to_hub(BIGCODEBENCH_HARD_HF)
+
for i in function_id:
old_sample = ds.select([i])
new_sample = new_ds.select([i])
old_sample.to_json("old.jsonl")
new_sample.to_json("new.jsonl")
- api.upload_file(
- path_or_fileobj="old.jsonl",
- path_in_repo=f"{i}/old.jsonl",
- repo_id=BIGCODEBENCH_UPDATE,
- # repo_type="dataset"
- )
- api.upload_file(
- path_or_fileobj="new.jsonl",
- path_in_repo=f"{i}/new.jsonl",
- repo_id=BIGCODEBENCH_UPDATE,
- # repo_type="dataset"
- )

Generate the task definition for the altered task.
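The `map_ds` helper itself is not shown in the diff (it lives in tools/fix_v0110.py). As a purely hypothetical sketch, assuming the function names and dataset field names, a fix of this kind might strip the duplicated period from the affected task like so:

```python
# Hypothetical sketch only: the real map_ds in tools/fix_v0110.py is not
# shown in the diff above, so these names and fields are assumptions.
AFFECTED_TASKS = {"BigCodeBench/16"}

def fix_extra_period(text: str) -> str:
    # Collapse an accidental doubled period, e.g. "after backup.." -> "after backup."
    # (Illustrative only; a real fix would need to avoid touching legitimate "..".)
    while ".." in text:
        text = text.replace("..", ".")
    return text

def map_ds(example: dict) -> dict:
    # Only patch the entries for the affected task IDs.
    if example.get("task_id") in AFFECTED_TASKS:
        example["test"] = fix_extra_period(example["test"])
        example["canonical_solution"] = fix_extra_period(example["canonical_solution"])
    return example
```

Passed to `ds.map(...)` as in the diff, this would rewrite only the targeted rows and leave every other task untouched.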
Generate output from OpenAI:

BIGCODEBENCH_OVERRIDE_PATH=$(pwd)/new.jsonl OPENAI_API_KEY=${OPENAI_API_KEY:?} bigcodebench.generate --model gpt-4o-mini --split complete --subset full --resume --backend openai

Sanitize the output:

bigcodebench.sanitize --samples samples.jsonl --calibrate

Run the eval (please note: this is potentially unsafe, as it will run the code generated by the LLM on your machine; use Docker or a throwaway VM to mitigate this risk):

BIGCODEBENCH_OVERRIDE_PATH=$(pwd)/new.jsonl bigcodebench.evaluate --split complete --samples samples-sanitized-calibrated.jsonl
Verify the output from the eval:

cat samples-sanitized-calibrated_hard_eval_results.json

{
"date": "2024-09-05 12:40",
"eval": {
"BigCodeBench/16": [
{
"task_id": "BigCodeBench/16",
"solution": "import os\nimport glob\nimport subprocess\ndef task_func(directory, backup_dir='/path/to/backup'):\n\n pass\nimport os\nimport glob\nimport subprocess\ndef task_func(directory, backup_dir='/path/to/backup'):\n \"\"\"\n Backup all '.log' files in a specified directory to a tar.gz file and delete the original files after backup.\n The backup file is named 'logs_backup.tar.gz' and placed in the specified backup directory.\n \n Parameters:\n - directory (str): The directory that contains the log files to be backed up.\n - backup_dir (str, optional): The directory where the backup file will be saved.\n Default is '/path/to/backup'.\n \n Returns:\n - str: The path to the backup file if logs are found, otherwise returns a message 'No logs found to backup'.\n \n Raises:\n - FileNotFoundError: If the specified directory does not exist.\n \"\"\"\n \n # Check if the specified directory exists\n if not os.path.exists(directory):\n raise FileNotFoundError(f\"The specified directory '{directory}' does not exist.\")\n \n # Find all .log files in the specified directory\n log_files = glob.glob(os.path.join(directory, '*.log'))\n \n # Check if there are any log files to backup\n if not log_files:\n return 'No logs found to backup'\n \n # Create the backup directory if it does not exist\n os.makedirs(backup_dir, exist_ok=True)\n \n # Define the backup file path\n backup_file = os.path.join(backup_dir, 'logs_backup.tar.gz')\n \n # Create a tar.gz file containing all the log files\n with subprocess.Popen(['tar', '-czf', backup_file] + log_files, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc:\n stdout, stderr = proc.communicate()\n if proc.returncode != 0:\n raise Exception(f\"Error during backup: {stderr.decode().strip()}\")\n \n # Delete the original log files after successful backup\n for log_file in log_files:\n os.remove(log_file)\n \n return backup_file",
"status": "pass",
"details": {}
}
]
}
}

If this LGTY, I think we're ready to merge, and I can take a look at the other issues as well.
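For results files of the shape shown above, the pass status for the patched task can also be checked programmatically. A minimal sketch, assuming only the JSON layout visible in the output above (the helper name is hypothetical):

```python
import json

def task_passed(results: dict, task_id: str = "BigCodeBench/16") -> bool:
    """Return True if every recorded sample for task_id has status "pass"."""
    samples = results.get("eval", {}).get(task_id, [])
    return bool(samples) and all(s["status"] == "pass" for s in samples)

# Usage, with the results file name from the session above:
# with open("samples-sanitized-calibrated_hard_eval_results.json") as f:
#     print(task_passed(json.load(f)))
```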
This removes the extra period from the test and canonical_solution entries in the task BigCodeBench/16.

TODO: v0.2.0 for the Python library?

Fixes #30

cc @terryyz