update run_infer and README

All-Hands-AI · Sep 8, 2024 · 39d8d0a · 39d8d0a
1 parent c864237
commit 39d8d0a
Show file tree

Hide file tree

Showing 3 changed files with 14 additions and 11 deletions.
diff --git a/agenthub/coact_agent/executor/system_prompt.j2 b/agenthub/coact_agent/executor/system_prompt.j2
@@ -22,20 +22,20 @@ As a local executor agent, there are some additional actions that you can use to
 
 {% endset %}
 {% set BROWSING_PREFIX %}
-The assistant can browse the Internet with <execute_browse> and </execute_browse>.
+The agent can browse the Internet with <execute_browse> and </execute_browse>.
 For example, <execute_browse> Tell me the usa's president using google search </execute_browse>.
 Or <execute_browse> Tell me what is in http://example.com </execute_browse>.
 {% endset %}
 {% set PIP_INSTALL_PREFIX %}
-The assistant can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them.
+The agent can install Python packages using the %pip magic command in an IPython environment by using the following syntax: <execute_ipython> %pip install [package needed] </execute_ipython> and should always import packages and define variables before starting to use them.
 {% endset %}
 {% set SYSTEM_PREFIX = MINIMAL_SYSTEM_PREFIX + BROWSING_PREFIX + PIP_INSTALL_PREFIX %}
 {% set COMMAND_DOCS %}
-Apart from the standard Python library, the assistant can also use the following functions (already imported) in <execute_ipython> environment:
+Apart from the standard Python library, the agent can also use the following functions (already imported) in <execute_ipython> environment:
 {{ agent_skills_docs }}
 IMPORTANT:
-- `open_file` only returns the first 100 lines of the file by default! The assistant MUST use `scroll_down` repeatedly to read the full file BEFORE making edits!
-- The assistant shall adhere to THE `edit_file_by_replace`, `append_file` and `insert_content_at_line` FUNCTIONS REQUIRING PROPER INDENTATION. If the assistant would like to add the line '        print(x)', it must fully write the line out, with all leading spaces before the code!
+- `open_file` only returns the first 100 lines of the file by default! The agent MUST use `scroll_down` repeatedly to read the full file BEFORE making edits!
+- The agent shall adhere to THE `edit_file_by_replace`, `append_file` and `insert_content_at_line` FUNCTIONS REQUIRING PROPER INDENTATION. If the agent would like to add the line '        print(x)', it must fully write the line out, with all leading spaces before the code!
 - Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
 - Any code issued should be less than 50 lines to avoid context being cut off!
 - After EVERY `create_file` the method `append_file` shall be used to write the FIRST content!
@@ -44,9 +44,9 @@ IMPORTANT:
 {% endset %}
 {% set SYSTEM_SUFFIX %}
 Responses should be concise.
-The assistant should attempt fewer things at a time instead of putting too many commands OR too much code in one "execute" block.
-Include ONLY ONE <execute_ipython>, <execute_bash>, or <execute_browse> per response, unless the assistant is finished with the task or needs more input or action from the user in order to proceed.
-If the assistant is finished with the task you MUST include <finish></finish> in your response.
+The agent should attempt fewer things at a time instead of putting too many commands OR too much code in one "execute" block.
+Include ONLY ONE <execute_ipython>, <execute_bash>, or <execute_browse> per response, unless the agent is finished with the task or needs more input or action from the user in order to proceed.
+If the agent is finished with the task you MUST include <finish></finish> in your response.
 IMPORTANT: Execute code using <execute_ipython>, <execute_bash>, or <execute_browse> whenever possible.
 The agent should utilize full file paths and the `pwd` command to prevent path-related errors.
 The agent must avoid apologies and thanks in its responses.

diff --git a/evaluation/swe_bench/README.md b/evaluation/swe_bench/README.md
@@ -179,7 +179,7 @@ Then, in a separate Python environment with `streamlit` library, you can run the
 ```bash
 # Make sure you are inside the cloned `evaluation` repo
 conda activate streamlit # if you follow the optional conda env setup above
-streamlit run 0_📊_OpenHands_Benchmark.py --server.port 8501 --server.address 0.0.0.0
+streamlit run app.py --server.port 8501 --server.address 0.0.0.0
 ```
 
 Then you can access the SWE-Bench trajectory visualizer at `localhost:8501`.

diff --git a/evaluation/swe_bench/run_infer.py b/evaluation/swe_bench/run_infer.py
@@ -39,11 +39,14 @@
 AGENT_CLS_TO_FAKE_USER_RESPONSE_FN = {
     'CodeActAgent': codeact_user_response,
     'CodeActSWEAgent': codeact_user_response,
+    'CoActPlannerAgent': codeact_user_response,
 }
 
+codeact_inst_suffix = 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n'
 AGENT_CLS_TO_INST_SUFFIX = {
-    'CodeActAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n',
-    'CodeActSWEAgent': 'When you think you have fixed the issue through code changes, please run the following command: <execute_bash> exit </execute_bash>.\n',
+    'CodeActAgent': codeact_inst_suffix,
+    'CodeActSWEAgent': codeact_inst_suffix,
+    'CoActPlannerAgent': codeact_inst_suffix,
 }