Skip to content

Commit

Permalink
Update instruction for new version of eval runtime-api (#4128)
Browse files Browse the repository at this point in the history
  • Loading branch information
xingyaoww authored Sep 30, 2024
1 parent 71adfee commit 1109637
Show file tree
Hide file tree
Showing 4 changed files with 8 additions and 5 deletions.
5 changes: 3 additions & 2 deletions evaluation/swe_bench/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -69,7 +69,7 @@ This is in limited beta. Contact Xingyao over slack if you want to try this out!

```bash
# ./evaluation/swe_bench/scripts/run_infer.sh [model_config] [git-version] [agent] [eval_limit] [max_iter] [num_workers] [dataset] [dataset_split]
ALLHANDS_API_KEY="YOUR-API-KEY" RUNTIME=remote EVAL_DOCKER_IMAGE_PREFIX="us-docker.pkg.dev/evaluation-428620/swe-bench-images" \
ALLHANDS_API_KEY="YOUR-API-KEY" RUNTIME=remote SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" EVAL_DOCKER_IMAGE_PREFIX="us-central1-docker.pkg.dev/evaluation-092424/swe-bench-images" \
./evaluation/swe_bench/scripts/run_infer.sh llm.eval HEAD CodeActAgent 300 30 16 "princeton-nlp/SWE-bench_Lite" test
# This example runs evaluation on CodeActAgent for 300 instances on "princeton-nlp/SWE-bench_Lite"'s test set, with max 30 iteration per instances, with 16 number of workers running in parallel
```
Expand Down Expand Up @@ -163,7 +163,8 @@ This is in limited beta. Contact Xingyao over slack if you want to try this out!
```bash
# ./evaluation/swe_bench/scripts/eval_infer_remote.sh [output.jsonl filepath] [num_workers]
ALLHANDS_API_KEY="YOUR-API-KEY" RUNTIME=remote EVAL_DOCKER_IMAGE_PREFIX="us-docker.pkg.dev/evaluation-428620/swe-bench-images" evaluation/swe_bench/scripts/eval_infer_remote.sh evaluation/outputs/swe_bench_lite/CodeActAgent/Llama-3.1-70B-Instruct-Turbo_maxiter_30_N_v1.9-no-hint/output.jsonl 16 "princeton-nlp/SWE-bench_Lite" "test"
ALLHANDS_API_KEY="YOUR-API-KEY" RUNTIME=remote SANDBOX_REMOTE_RUNTIME_API_URL="https://runtime.eval.all-hands.dev" EVAL_DOCKER_IMAGE_PREFIX="us-central1-docker.pkg.dev/evaluation-092424/swe-bench-images" \
evaluation/swe_bench/scripts/eval_infer_remote.sh evaluation/evaluation_outputs/outputs/swe_bench_lite/CodeActAgent/Llama-3.1-70B-Instruct-Turbo_maxiter_30_N_v1.9-no-hint/output.jsonl 16 "princeton-nlp/SWE-bench_Lite" "test"
# This example evaluate patches generated by CodeActAgent on Llama-3.1-70B-Instruct-Turbo on "princeton-nlp/SWE-bench_Lite"'s test set, with 16 number of workers running in parallel
```
Expand Down
1 change: 1 addition & 0 deletions evaluation/swe_bench/eval_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -81,6 +81,7 @@ def get_config(instance: pd.Series) -> AppConfig:
# large enough timeout, since some testcases take very long to run
timeout=1800,
api_key=os.environ.get('ALLHANDS_API_KEY', None),
remote_runtime_api_url=os.environ.get('SANDBOX_REMOTE_RUNTIME_API_URL'),
),
# do not mount workspace
workspace_base=None,
Expand Down
1 change: 1 addition & 0 deletions evaluation/swe_bench/run_infer.py
Original file line number Diff line number Diff line change
Expand Up @@ -131,6 +131,7 @@ def get_config(
# large enough timeout, since some testcases take very long to run
timeout=300,
api_key=os.environ.get('ALLHANDS_API_KEY', None),
remote_runtime_api_url=os.environ.get('SANDBOX_REMOTE_RUNTIME_API_URL'),
),
# do not mount workspace
workspace_base=None,
Expand Down
6 changes: 3 additions & 3 deletions evaluation/swe_bench/scripts/cleanup_remote_runtime.sh
100644 → 100755
Original file line number Diff line number Diff line change
Expand Up @@ -2,10 +2,10 @@


# API base URL
BASE_URL="https://api.all-hands.dev/v0"
BASE_URL="https://runtime.eval.all-hands.dev"

# Get the list of runtimes
response=$(curl --silent --location --request GET "${BASE_URL}/runtime/list" \
response=$(curl --silent --location --request GET "${BASE_URL}/list" \
--header "X-API-Key: ${ALLHANDS_API_KEY}")

n_runtimes=$(echo $response | jq -r '.total')
Expand All @@ -16,7 +16,7 @@ runtime_ids=$(echo $response | jq -r '.runtimes | .[].runtime_id')
counter=1
for runtime_id in $runtime_ids; do
echo "Stopping runtime ${counter}/${n_runtimes}: ${runtime_id}"
curl --silent --location --request POST "${BASE_URL}/runtime/stop" \
curl --silent --location --request POST "${BASE_URL}/stop" \
--header "X-API-Key: ${ALLHANDS_API_KEY}" \
--header "Content-Type: application/json" \
--data-raw "{\"runtime_id\": \"${runtime_id}\"}"
Expand Down

0 comments on commit 1109637

Please sign in to comment.