Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: codeact bug [If running a command that never returns, it gets stuck #1895] #2034

Merged

Conversation

assertion
Copy link
Contributor

Tell llm, for command that may run indefinitely and not return a result, it should try to run the command in the background.

issue to fix: #1895

@assertion assertion changed the title fix: codeact bug https://github.com/OpenDevin/OpenDevin/issues/1895 fix: codeact bug If running a command that never returns, it gets stuck #1895 May 24, 2024
@assertion assertion changed the title fix: codeact bug If running a command that never returns, it gets stuck #1895 fix: codeact bug [If running a command that never returns, it gets stuck #1895] May 24, 2024
Copy link
Contributor

@neubig neubig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be great if you could also mention interactive programs as well:
#1280

@enyst
Copy link
Collaborator

enyst commented May 24, 2024

This seems a bit strange though... I thought the commands are running with this timeout by default, 2 minutes. Configurable via config with this, I believe: https://github.com/OpenDevin/OpenDevin/blob/2d52298a1dcc857ff0d9c9b8c82b3ecb1876ee9b/opendevin/core/config.py#L175

The commands are (supposed to be) killed when it expires. Do you have any idea how did it get to hours?

@SmartManoj
Copy link
Contributor

timeout works on that command too.
image

It seems the OP stuck on that task for 4 hours as the program timed out on each attempt.

@xingyaoww
Copy link
Collaborator

xingyaoww commented May 25, 2024

Maybe we could return a "hint text" for each timeouted CmdRunAction telling the LLM to be careful about interactive commands (e.g., "The command timeout, you should never attempt to run interactive commands, etc") - this might help the model to unblock?

@assertion
Copy link
Contributor Author

@SmartManoj @enyst I missed the last line of issue desc said: It has been stuck on this task for 4 hours

Currently, it will be timeout for CmdRunAction, so only maybe 2 minutes stuck, then CmdRunAction failed.

@assertion
Copy link
Contributor Author

It would be great if you could also mention interactive programs as well: #1280

"hint text" for each timeouted CmdRunAction telling the LLM to be careful about interactive commands

@neubig @xingyaoww I'll try with your suggestion next Monday.

@assertion
Copy link
Contributor Author

I find this pr: #2059 related to the interactive programs. @neubig @SmartManoj Maybe we can let this scene fixed in that pr?

@assertion
Copy link
Contributor Author

Maybe we could return a "hint text" for each timeouted CmdRunAction telling the LLM to be careful about interactive commands (e.g., "The command timeout, you should never attempt to run interactive commands, etc") - this might help the model to unblock?

Modified the prompt and add the CmdRunAction timeout related text. Feel free to give more suggestions, thanks. @xingyaoww

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
@neubig
Copy link
Contributor

neubig commented May 31, 2024

This looks fine to me, but:

  • @assertion : could you take a look at the conflicts and resolve? also, we'll probably need to bump the version.
  • @xingyaoww : we'll have to think about the timing of when to merge this.

@assertion
Copy link
Contributor Author

  • @assertion : could you take a look at the conflicts and resolve? also, we'll probably need to bump the version.

conflicts resolved. @neubig

@li-boxuan
Copy link
Collaborator

(this is not a review)

we'll have to think about the timing of when to merge this.

I spoke to Xingyao and we agree that changes to CodeActAgent, especially prompts, should be put on hold until people finish the current round of evaluation (in a few days).

@assertion
Copy link
Contributor Author

(this is not a review)

we'll have to think about the timing of when to merge this.

I spoke to Xingyao and we agree that changes to CodeActAgent, especially prompts, should be put on hold until people finish the current round of evaluation (in a few days).

ok, we can hold it.

Copy link
Contributor

@neubig neubig left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change LGTM, but I feel that we should benchmark accuracy before and after making the change. I'll try to do that in the next few days.

(blocked by #2085)

Copy link
Collaborator

@xingyaoww xingyaoww left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This overall LGTM! We had some instability issues with SWE-Bench's eval harness and I will work to improve it in the next two weeks. In the meantime, I think it might be good to merge this as is (my instinct is that it will mostly help benchmark perf than hurting -- in some cases, the agent will want to run interactive command during eval and hurt it).

Once benchmark is fixed, i will run a few comprehensive experiments on the main branch's CodeAct, and can revert these changes if it actually hurts performance.

@neubig neubig enabled auto-merge (squash) June 8, 2024 13:27
Copy link
Collaborator

@yufansong yufansong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Push some code to fix integration test. Can be merge now.

@yufansong yufansong disabled auto-merge June 8, 2024 16:30
@yufansong yufansong enabled auto-merge (squash) June 8, 2024 16:30
@yufansong yufansong merged commit b5a17ef into All-Hands-AI:main Jun 8, 2024
2 checks passed
li-boxuan added a commit that referenced this pull request Jun 9, 2024
* Refactored prompt.py to reduce token usage

* Reverted some destructive changes

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* Apply suggestions from code review

* Apply suggestions from code review

* Update agenthub/codeact_agent/prompt.py

* fix integration test

* make lint

* feat: support ToolQA benchmark (#2263)

* Add files via upload

* Update README.md

* Update run_infer.py

* Update utils.py

* make lint

* Update evaluation/toolqa/run_infer.py

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>

* feat: revert hiden special paths change in file action (#2328)

* revert change in file action

* remove useless code

* make lint

* Support gpqa benchmark evaluation (#2080)

* feat: add gpqa benchmark evaluation

* add metrics

* reset configs in final block

* make lint

---------

Co-authored-by: yufansong <yufan@risingwave-labs.com>

* fix(frontend): prevent API key from resetting after modal change (#2329)

* remove bottom chatbox fade

* Modal wider; fix lint error

* settings: attempt to not clear api key for same provider

* prevent api key from resetting after changing the model

* revert other changes and fix post test tear down error

---------

Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>

* fix: codeact bug [If running a command that never returns, it gets stuck #1895] (#2034)

* fix: codeact bug #1895

* fix: add CmdRunAction timeout hint.

* Update agenthub/codeact_agent/prompt.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

* regenerate integration test

---------

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>

* Feat: Support Gorilla APIBench  (#2081)

* removed unused files from gorilla

* Update run_infer.py, removed unused imports

* Update utils.py

* Update ast_eval_hf.py

* Update ast_eval_tf.py

* Update ast_eval_th.py

* Create README.md

* Update run_infer.py

* make lint

* Update run_infer.py

* fix lint

---------

Co-authored-by: yufansong <yufan@risingwave-labs.com>

* remote useless (#2332)

* fix integration test

* Update agenthub/codeact_agent/prompt.py

* Update agenthub/codeact_agent/prompt.py

* fix integration test

---------

Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Frank Xu <frankxu2004@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: yueqis <141804823+yueqis@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Jaskirat Singh <1.jaskiratsingh@gmail.com>
Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>
Co-authored-by: Aaron Xia <zhhuaxia@gmail.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: If running a command that never returns, it gets stuck
7 participants