The results do not match those in the paper. Below are the results I obtained, followed by the results provided in the official repository.
(agentless) root@cpu02-2050-SWE-bench:~/Agentless# python -m swebench.harness.run_evaluation --dataset_name princeton-nlp/SWE-bench_Verified --predictions_path all_preds.jsonl --max_workers 500 --run_id agentless_0126
<frozen runpy>:128: RuntimeWarning: 'swebench.harness.run_evaluation' found in sys.modules after import of package 'swebench.harness', but prior to execution of 'swebench.harness.run_evaluation'; this may result in unpredictable behaviour
Running 496 unevaluated instances...
Base image sweb.base.x86_64:latest already exists, skipping build.
Base images built successfully.
No environment images need to be built.
Found 496 existing instance images. Will reuse them.
Running 496 instances...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 496/496 [04:52<00:00, 1.70it/s]
All instances run.
Cleaning cached images...
Removed 0 images.
Total instances: 500
Instances submitted: 500
Instances completed: 125
Instances incomplete: 0
Instances resolved: 41
Instances unresolved: 84
Instances with empty patches: 4
Instances with errors: 371
Unstopped containers: 0
Unremoved images: 500
Report written to agentless.agentless_0126.json
(agentless) root@cpu02-2050-SWE-bench:~/Agentless# python -m swebench.harness.run_evaluation --dataset_name princeton-nlp/SWE-bench_Verified --predictions_path agentless_swebench_verified/all_preds.jsonl --max_workers 500 --run_id agentless_golden_0126
<frozen runpy>:128: RuntimeWarning: 'swebench.harness.run_evaluation' found in sys.modules after import of package 'swebench.harness', but prior to execution of 'swebench.harness.run_evaluation'; this may result in unpredictable behaviour
Running 496 unevaluated instances...
Base image sweb.base.x86_64:latest already exists, skipping build.
Base images built successfully.
No environment images need to be built.
Found 496 existing instance images. Will reuse them.
Running 496 instances...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 496/496 [04:52<00:00, 1.69it/s]
All instances run.
Cleaning cached images...
Removed 0 images.
Total instances: 500
Instances submitted: 500
Instances completed: 126
Instances incomplete: 0
Instances resolved: 48
Instances unresolved: 78
Instances with empty patches: 4
Instances with errors: 370
Unstopped containers: 0
Unremoved images: 500
Report written to agentless.agentless_golden_0126.json
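As a sanity check on the summary counts, here is a minimal sketch that parses the printed lines and confirms the categories add up. `parse_summary` is a hypothetical helper written for this post, not part of the SWE-bench harness, and the text it reads is copied from the first run above.

```python
import re


def parse_summary(log_text: str) -> dict[str, int]:
    """Collect 'Label: number' lines from the pasted harness output.

    Hypothetical helper for illustration only; it is not a SWE-bench API.
    """
    counts = {}
    for line in log_text.splitlines():
        match = re.fullmatch(r"\s*([A-Za-z ]+):\s*(\d+)\s*", line)
        if match:
            counts[match.group(1).strip()] = int(match.group(2))
    return counts


# Summary lines copied from the agentless_0126 run.
my_run = parse_summary("""
Total instances: 500
Instances submitted: 500
Instances completed: 125
Instances resolved: 41
Instances unresolved: 84
Instances with empty patches: 4
Instances with errors: 371
""")

# Completed, empty-patch, and errored instances together cover every submission.
accounted = (
    my_run["Instances completed"]
    + my_run["Instances with empty patches"]
    + my_run["Instances with errors"]
)
print(accounted)  # 500

# Share of all instances that ended in an error rather than a test verdict.
error_share = my_run["Instances with errors"] / my_run["Total instances"]
print(f"{error_share:.1%}")  # 74.2%
```

In both runs roughly three quarters of the instances ended in an error, so most predictions never received a resolved or unresolved verdict.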
The paper reports 194 resolved instances (38.80%). How was this number calculated?
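For comparison, a small sketch of the arithmetic, assuming the percentage is simply resolved instances divided by the 500 instances in SWE-bench_Verified. That assumption matches the paper's number (194/500 = 38.80%), but whether the paper computed it this way is not confirmed here; the labels and the `resolve_rate` helper are made up for this post.

```python
TOTAL_INSTANCES = 500  # "Total instances" in both logs above

# Resolved counts taken from the two logs and from the paper's reported figure.
resolved_counts = {
    "my predictions (agentless_0126)": 41,
    "repository predictions (agentless_golden_0126)": 48,
    "paper": 194,
}


def resolve_rate(resolved: int, total: int = TOTAL_INSTANCES) -> float:
    """Resolved instances as a percentage of all instances (assumed definition)."""
    return 100 * resolved / total


for label, resolved in resolved_counts.items():
    print(f"{label}: {resolved}/{TOTAL_INSTANCES} = {resolve_rate(resolved):.2f}%")
```

Under this definition my two runs come out at 8.20% and 9.60%, far below the 38.80% in the paper.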