Remove extra period in task BigCodeBench/16 #38

hvaara · 2024-09-01T22:11:05Z

This removes the extra period from the test and canonical_solution entries in the task BigCodeBench/16.
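As a rough illustration of the kind of change involved, the sketch below strips a trailing period from one message in the `test` and `canonical_solution` fields of a single task record. It assumes the stray period follows the message `No logs found to backup` (the wording quoted in the task docstring further down this thread). The helper name `strip_extra_period` and the sample record are hypothetical, and the actual script in this PR may be structured differently.

```python
TARGET_TASK = "BigCodeBench/16"
FIELDS = ("test", "canonical_solution")

# Assumption: the extra period trails this message.
OLD = "No logs found to backup."
NEW = "No logs found to backup"


def strip_extra_period(sample: dict) -> dict:
    """Return a copy of `sample` with the stray period removed for the target task."""
    if sample.get("task_id") != TARGET_TASK:
        # Other tasks are returned unchanged.
        return sample
    fixed = dict(sample)
    for field in FIELDS:
        if isinstance(fixed.get(field), str):
            fixed[field] = fixed[field].replace(OLD, NEW)
    return fixed


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    record = {
        "task_id": "BigCodeBench/16",
        "canonical_solution": "return 'No logs found to backup.'",
        "test": "self.assertEqual(result, 'No logs found to backup.')",
    }
    print(strip_extra_period(record)["canonical_solution"])
```

A function of this shape takes one record and returns one record, so it could in principle be passed to a per-sample `map` call like the `ds.map(map_ds)` seen later in the diff.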

TODO

Get initial feedback.
Are we still targeting v0.2.0 for the Python library?
Which version of the dataset on hf.co are we targeting?
Test generation and eval with current solution before submitting.

Fixes #30

cc @terryyz

hvaara

Added a couple of questions.

tools/fix_020.py

terryyz · 2024-09-02T10:27:37Z

Thank you, @hvaara, for the PR. Replied.

hvaara · 2024-09-03T21:26:26Z

Still have to test generation and eval. If that looks good, I'm ready to merge.

hvaara · 2024-09-05T12:49:15Z

Tested with the following; please verify that I did this correctly.

# export OPENAI_API_KEY=REDACTED  # Set OpenAI API key and uncomment line

git diff fix_v0110.py

diff --git a/tools/fix_v0110.py b/tools/fix_v0110.py
index a6aadac..e8004ef 100644
--- a/tools/fix_v0110.py
+++ b/tools/fix_v0110.py
@@ -34,32 +34,20 @@ if __name__ == "__main__":
     hard_ds_dict = load_dataset(BIGCODEBENCH_HARD_HF)
     ds = ds_dict[BIGCODEBENCH_VERSION]
     hard_ds = hard_ds_dict[BIGCODEBENCH_VERSION]
-    function_id = [16, 37]
+    function_id = [16]

     new_ds = ds.map(map_ds)
     new_ds.to_json("BigCodeBench.jsonl")
     ds_dict[BIGCODEBENCH_NEW_VERSION] = new_ds
-    ds_dict.push_to_hub(BIGCODEBENCH_HF)
+    #ds_dict.push_to_hub(BIGCODEBENCH_HF)

     new_hard_ds = hard_ds.map(map_ds)
     new_hard_ds.to_json("BigCodeBench-Hard.jsonl")
     hard_ds_dict[BIGCODEBENCH_NEW_VERSION] = new_hard_ds
-    hard_ds_dict.push_to_hub(BIGCODEBENCH_HARD_HF)
-
+    #hard_ds_dict.push_to_hub(BIGCODEBENCH_HARD_HF)
+
     for i in function_id:
         old_sample = ds.select([i])
         new_sample = new_ds.select([i])
         old_sample.to_json("old.jsonl")
         new_sample.to_json("new.jsonl")
-        api.upload_file(
-            path_or_fileobj="old.jsonl",
-            path_in_repo=f"{i}/old.jsonl",
-            repo_id=BIGCODEBENCH_UPDATE,
-            # repo_type="dataset"
-        )
-        api.upload_file(
-            path_or_fileobj="new.jsonl",
-            path_in_repo=f"{i}/new.jsonl",
-            repo_id=BIGCODEBENCH_UPDATE,
-            # repo_type="dataset"
-        )
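The diff above disables the Hub uploads by commenting them out for a local test run. An alternative that avoids editing the script each time is a dry-run guard. This is only a sketch: the `DRY_RUN` environment variable and the `push_unless_dry_run` helper are hypothetical and not part of the repository, and the stand-in class below exists only so the example runs without network access.

```python
import os


def push_unless_dry_run(ds_dict, repo_id: str, dry_run: bool) -> bool:
    """Call ds_dict.push_to_hub(repo_id) unless dry_run is set; return True if pushed."""
    if dry_run:
        print(f"[dry run] skipping push to {repo_id}")
        return False
    ds_dict.push_to_hub(repo_id)
    return True


class FakeDatasetDict:
    """Stand-in object that records pushes instead of uploading anything."""

    def __init__(self):
        self.pushed = []

    def push_to_hub(self, repo_id):
        self.pushed.append(repo_id)


if __name__ == "__main__":
    # Hypothetical env var: anything other than "0" means "do not upload".
    dry_run = os.environ.get("DRY_RUN", "1") != "0"
    push_unless_dry_run(FakeDatasetDict(), "example/repo", dry_run)
```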

Generate task definition for altered task

python fix_v0110.py
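Before spending API calls on generation, it may be worth confirming that `old.jsonl` and `new.jsonl` differ only in the expected fields. A minimal sketch, assuming each file holds one JSON object per line; `load_jsonl` and `changed_fields` are hypothetical helper names.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


def changed_fields(old: dict, new: dict) -> list:
    """Return the sorted names of fields whose values differ between two records."""
    keys = set(old) | set(new)
    return sorted(k for k in keys if old.get(k) != new.get(k))


if __name__ == "__main__":
    old_path, new_path = Path("old.jsonl"), Path("new.jsonl")
    if old_path.exists() and new_path.exists():
        # Records are paired by position, which assumes both files list the same tasks in order.
        for old, new in zip(load_jsonl(old_path), load_jsonl(new_path)):
            print(old.get("task_id"), changed_fields(old, new))
    else:
        print("old.jsonl / new.jsonl not found; run the fix script first")
```

For this PR one would expect only `test` and `canonical_solution` to be reported for BigCodeBench/16.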

Generate output from OpenAI

BIGCODEBENCH_OVERRIDE_PATH=$(pwd)/new.jsonl OPENAI_API_KEY=${OPENAI_API_KEY:?} bigcodebench.generate --model gpt-4o-mini --split complete --subset full --resume --backend openai

Sanitize the output

bigcodebench.sanitize --samples samples.jsonl --calibrate

Run the eval. Note that this is potentially unsafe because it runs LLM-generated code on your machine; use Docker or a throwaway VM to mitigate the risk.

BIGCODEBENCH_OVERRIDE_PATH=$(pwd)/new.jsonl bigcodebench.evaluate --split complete --samples samples-sanitized-calibrated.jsonl

Load from ground-truth from /home/hvaara/.cache/bigcodebench/b2db11e110a0b35b31132e278af5d7cc.pkl
Reading samples...
1it [00:00, 21.58it/s]
100%|███████████████████████████████████████████████████████████████████████| 1/1 [00:02<00:00,  2.81s/it]
BigCodeBench-Complete-calibrated (Hard)
Groundtruth pass rate: 1.000
pass@1: 1.000

Verify output from eval

cat samples-sanitized-calibrated_hard_eval_results.json

{
  "date": "2024-09-05 12:40",
  "eval": {
    "BigCodeBench/16": [
      {
        "task_id": "BigCodeBench/16",
        "solution": "import os\nimport glob\nimport subprocess\ndef task_func(directory, backup_dir='/path/to/backup'):\n\n    pass\nimport os\nimport glob\nimport subprocess\ndef task_func(directory, backup_dir='/path/to/backup'):\n    \"\"\"\n    Backup all '.log' files in a specified directory to a tar.gz file and delete the original files after backup.\n    The backup file is named 'logs_backup.tar.gz' and placed in the specified backup directory.\n    \n    Parameters:\n    - directory (str): The directory that contains the log files to be backed up.\n    - backup_dir (str, optional): The directory where the backup file will be saved.\n                                  Default is '/path/to/backup'.\n    \n    Returns:\n    - str: The path to the backup file if logs are found, otherwise returns a message 'No logs found to backup'.\n    \n    Raises:\n    - FileNotFoundError: If the specified directory does not exist.\n    \"\"\"\n    \n    # Check if the specified directory exists\n    if not os.path.exists(directory):\n        raise FileNotFoundError(f\"The specified directory '{directory}' does not exist.\")\n    \n    # Find all .log files in the specified directory\n    log_files = glob.glob(os.path.join(directory, '*.log'))\n    \n    # Check if there are any log files to backup\n    if not log_files:\n        return 'No logs found to backup'\n    \n    # Create the backup directory if it does not exist\n    os.makedirs(backup_dir, exist_ok=True)\n    \n    # Define the backup file path\n    backup_file = os.path.join(backup_dir, 'logs_backup.tar.gz')\n    \n    # Create a tar.gz file containing all the log files\n    with subprocess.Popen(['tar', '-czf', backup_file] + log_files, stdout=subprocess.PIPE, stderr=subprocess.PIPE) as proc:\n        stdout, stderr = proc.communicate()\n        if proc.returncode != 0:\n            raise Exception(f\"Error during backup: {stderr.decode().strip()}\")\n    \n    # Delete the original log files after successful backup\n    for log_file in log_files:\n        os.remove(log_file)\n    \n    return backup_file",
        "status": "pass",
        "details": {}
      }
    ]
  }
}
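The manual check of the results file could also be scripted. The sketch below assumes the structure shown above (a top-level `eval` object mapping each task ID to a list of entries with a `status` field); `failing_tasks` is a hypothetical helper, and the inline JSON is a trimmed stand-in for the real file.

```python
import json


def failing_tasks(results: dict) -> list:
    """Return task IDs that have at least one entry whose status is not "pass"."""
    failed = []
    for task_id, entries in results.get("eval", {}).items():
        if any(entry.get("status") != "pass" for entry in entries):
            failed.append(task_id)
    return failed


if __name__ == "__main__":
    # Trimmed stand-in mirroring the structure of the results file.
    sample = json.loads(
        '{"eval": {"BigCodeBench/16": '
        '[{"task_id": "BigCodeBench/16", "status": "pass", "details": {}}]}}'
    )
    print(failing_tasks(sample))
```

To run it against the real output, replace the stand-in with `json.load` on `samples-sanitized-calibrated_hard_eval_results.json`.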

If this LGTY, I think we're ready to merge, and I can take a look at the other issues as well.

Remove extra period in task BigCodeBench/16

db8200e

hvaara mentioned this pull request Sep 3, 2024

[docs] Add working end-to-end example for a local model #40

Open

Target v0.1.10 for task BigCodeBench/16

4799575

hvaara force-pushed the bcb16-period-fix branch from 9c53f8d to 4799575 Compare September 4, 2024 00:11

hvaara requested a review from terryyz September 5, 2024 12:50

terryyz merged commit b0676a7 into bigcode-project:main Sep 10, 2024

hvaara deleted the bcb16-period-fix branch September 10, 2024 20:06
