fix: codeact bug [If running a command that never returns, it gets stuck #1895] #2034

assertion · 2024-05-24T09:03:40Z

Tell the LLM that, for a command that may run indefinitely and never return a result, it should run the command in the background.

issue to fix: #1895
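
For illustration only, a minimal Python sketch of what "run the command in the background" means: start the process, send its output to a log file, and return immediately instead of waiting. The helper name `run_in_background` and the log path are hypothetical; this is not code from the PR, which only changes the prompt text.

```python
import subprocess


def run_in_background(argv, log_path):
    """Start argv without waiting for it to finish; its output goes to log_path."""
    with open(log_path, "wb") as log:
        proc = subprocess.Popen(
            argv,
            stdin=subprocess.DEVNULL,   # the child cannot block waiting for input
            stdout=log,                 # the child keeps its own copy of this handle
            stderr=subprocess.STDOUT,
        )
    # Returns right away; the caller can check proc.poll() or read log_path later.
    return proc
```

In a shell, the same idea is usually written as `some_command > output.log 2>&1 &`.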

neubig

It would be great if you could also mention interactive programs as well:
#1280

enyst · 2024-05-24T11:35:50Z

This seems a bit strange though... I thought commands run with a 2-minute timeout by default, configurable via this setting, I believe: https://github.com/OpenDevin/OpenDevin/blob/2d52298a1dcc857ff0d9c9b8c82b3ecb1876ee9b/opendevin/core/config.py#L175

The commands are (supposed to be) killed when it expires. Do you have any idea how it got to hours?
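
A rough sketch of the behaviour described above, assuming a plain `subprocess` call rather than the project's actual sandbox code: the command is killed once the timeout expires and the caller gets control back. `DEFAULT_TIMEOUT_S`, `run_with_timeout`, and the `-1` exit code are hypothetical names and conventions chosen for this example; only the 2-minute figure comes from the comment.

```python
import subprocess

DEFAULT_TIMEOUT_S = 120  # the "2 minutes" mentioned above; the name is hypothetical


def run_with_timeout(argv, timeout_s=DEFAULT_TIMEOUT_S):
    """Return (exit_code, output); exit_code is -1 if the command was killed on timeout."""
    try:
        done = subprocess.run(
            argv,
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
            timeout=timeout_s,  # subprocess.run kills the child when this expires
        )
        return done.returncode, done.stdout
    except subprocess.TimeoutExpired as exc:
        partial = exc.output or ""
        if isinstance(partial, bytes):  # partial output may arrive undecoded
            partial = partial.decode(errors="replace")
        return -1, partial
```

With a wrapper like this, a single hung command costs at most one timeout period, which is consistent with the multi-hour case being many timed-out attempts in a row rather than one stuck command.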

SmartManoj · 2024-05-25T07:45:43Z

The timeout works on that command too.

It seems the OP was stuck on that task for 4 hours because the program timed out on each attempt.

xingyaoww · 2024-05-25T12:13:32Z

Maybe we could return a "hint text" for each timed-out CmdRunAction telling the LLM to be careful about interactive commands (e.g., "The command timed out; you should never attempt to run interactive commands, etc.") - this might help unblock the model?
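
One possible shape for this suggestion, as a sketch only: append a fixed hint to the output of a command that timed out, so the model sees it in the next observation. `TIMEOUT_HINT`, `format_observation`, and the hint wording are all hypothetical and are not the text that was merged.

```python
# Hypothetical wording; the real hint lives in the project's prompt/observation code.
TIMEOUT_HINT = (
    "The command timed out. Avoid interactive commands; for commands that may "
    "run indefinitely, run them in the background and redirect output to a file."
)


def format_observation(output: str, timed_out: bool) -> str:
    """Return the text shown to the model, with the hint added only on timeout."""
    if not timed_out:
        return output
    if output and not output.endswith("\n"):
        output += "\n"
    return output + TIMEOUT_HINT
```

Keeping the hint out of the normal path means successful commands produce exactly the same observation as before.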

assertion · 2024-05-26T00:45:18Z

@SmartManoj @enyst I missed that the last line of the issue description says: It has been stuck on this task for 4 hours

Currently, CmdRunAction times out, so it is only stuck for about 2 minutes before the CmdRunAction fails.

assertion · 2024-05-26T00:50:05Z

It would be great if you could also mention interactive programs as well: #1280

"hint text" for each timed-out CmdRunAction telling the LLM to be careful about interactive commands

@neubig @xingyaoww I'll try your suggestions next Monday.

assertion · 2024-05-27T11:52:33Z

I found PR #2059, which is related to interactive programs. @neubig @SmartManoj Maybe we can let that PR handle this scenario?

assertion · 2024-05-27T11:54:04Z

Maybe we could return a "hint text" for each timed-out CmdRunAction telling the LLM to be careful about interactive commands (e.g., "The command timed out; you should never attempt to run interactive commands, etc.") - this might help unblock the model?

Modified the prompt and added text about the CmdRunAction timeout. Feel free to give more suggestions, thanks. @xingyaoww

agenthub/codeact_agent/prompt.py

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

neubig · 2024-05-31T19:12:55Z

This looks fine to me, but:

@assertion : could you take a look at the conflicts and resolve? also, we'll probably need to bump the version.
@xingyaoww : we'll have to think about the timing of when to merge this.

assertion · 2024-06-03T03:13:03Z

@assertion : could you take a look at the conflicts and resolve? also, we'll probably need to bump the version.

conflicts resolved. @neubig

li-boxuan · 2024-06-03T03:15:11Z

(this is not a review)

we'll have to think about the timing of when to merge this.

I spoke to Xingyao and we agree that changes to CodeActAgent, especially prompts, should be put on hold until people finish the current round of evaluation (in a few days).

assertion · 2024-06-03T03:30:59Z

(this is not a review)

we'll have to think about the timing of when to merge this.

I spoke to Xingyao and we agree that changes to CodeActAgent, especially prompts, should be put on hold until people finish the current round of evaluation (in a few days).

ok, we can hold it.

neubig

This change LGTM, but I feel that we should benchmark accuracy before and after making the change. I'll try to do that in the next few days.

(blocked by #2085)

xingyaoww

This overall LGTM! We had some instability issues with SWE-Bench's eval harness and I will work to improve it in the next two weeks. In the meantime, I think it might be good to merge this as is (my instinct is that it will help benchmark performance more than hurt it -- in some cases, the agent wants to run an interactive command during eval, and that hurts the result).

Once the benchmark is fixed, I will run a few comprehensive experiments on the main branch's CodeAct, and can revert these changes if they actually hurt performance.

yufansong

Pushed some code to fix the integration test. This can be merged now.

* Refactored prompt.py to reduce token usage
* Reverted some destructive changes
* Update agenthub/codeact_agent/prompt.py
* Update agenthub/codeact_agent/prompt.py
* Update agenthub/codeact_agent/prompt.py
* Update agenthub/codeact_agent/prompt.py
* Update agenthub/codeact_agent/prompt.py
* Update agenthub/codeact_agent/prompt.py
* Update agenthub/codeact_agent/prompt.py
* Apply suggestions from code review
* Apply suggestions from code review
* Update agenthub/codeact_agent/prompt.py
* fix integration test
* make lint
* feat: support ToolQA benchmark (#2263)
* Add files via upload
* Update README.md
* Update run_infer.py
* Update utils.py
* make lint
* Update evaluation/toolqa/run_infer.py
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
* feat: revert hiden special paths change in file action (#2328)
* revert change in file action
* remove useless code
* make lint
* Support gpqa benchmark evaluation (#2080)
* feat: add gpqa benchmark evaluation
* add metrics
* reset configs in final block
* make lint
---------
Co-authored-by: yufansong <yufan@risingwave-labs.com>
* fix(frontend): prevent API key from resetting after modal change (#2329)
* remove bottom chatbox fade
* Modal wider; fix lint error
* settings: attempt to not clear api key for same provider
* prevent api key from resetting after changing the model
* revert other changes and fix post test tear down error
---------
Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>
* fix: codeact bug [If running a command that never returns, it gets stuck #1895] (#2034)
* fix: codeact bug #1895
* fix: add CmdRunAction timeout hint.
* Update agenthub/codeact_agent/prompt.py
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
* regenerate integration test
---------
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
* Feat: Support Gorilla APIBench (#2081)
* removed unused files from gorilla
* Update run_infer.py, removed unused imports
* Update utils.py
* Update ast_eval_hf.py
* Update ast_eval_tf.py
* Update ast_eval_th.py
* Create README.md
* Update run_infer.py
* make lint
* Update run_infer.py
* fix lint
---------
Co-authored-by: yufansong <yufan@risingwave-labs.com>
* remote useless (#2332)
* fix integration test
* Update agenthub/codeact_agent/prompt.py
* Update agenthub/codeact_agent/prompt.py
* fix integration test
---------
Co-authored-by: Xingyao Wang <xingyao6@illinois.edu>
Co-authored-by: Frank Xu <frankxu2004@gmail.com>
Co-authored-by: yufansong <yufan@risingwave-labs.com>
Co-authored-by: yueqis <141804823+yueqis@users.noreply.github.com>
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk>
Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com>
Co-authored-by: Jaskirat Singh <1.jaskiratsingh@gmail.com>
Co-authored-by: tobitege <tobitege@gmx.de>
Co-authored-by: amanape <83104063+amanape@users.noreply.github.com>
Co-authored-by: Aaron Xia <zhhuaxia@gmail.com>
Co-authored-by: Graham Neubig <neubig@gmail.com>

fix: codeact bug All-Hands-AI#1895

00d5b02

assertion changed the title ~~fix: codeact bug https://github.com/OpenDevin/OpenDevin/issues/1895~~ fix: codeact bug If running a command that never returns, it gets stuck #1895 May 24, 2024

assertion changed the title ~~fix: codeact bug If running a command that never returns, it gets stuck #1895~~ fix: codeact bug [If running a command that never returns, it gets stuck #1895] May 24, 2024

neubig reviewed May 24, 2024

neubig assigned assertion May 24, 2024

fix: add CmdRunAction timeout hint.

15478a1

enyst reviewed May 27, 2024

agenthub/codeact_agent/prompt.py (outdated, resolved)

Update agenthub/codeact_agent/prompt.py

34d16c3

Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>

SmartManoj mentioned this pull request May 31, 2024

[Bug]: If running a command that never returns, it gets stuck #1895

Closed

2 tasks

neubig assigned xingyaoww May 31, 2024

Merge branch 'main' into fix/codeact-command-indefinitely-bug

715bf82

SmartManoj mentioned this pull request Jun 3, 2024

Process interactive commands and stream output in logs #2059

Closed

neubig self-requested a review June 8, 2024 11:13

neubig assigned neubig and unassigned assertion and xingyaoww Jun 8, 2024

neubig reviewed Jun 8, 2024

xingyaoww approved these changes Jun 8, 2024

Merge branch 'main' into fix/codeact-command-indefinitely-bug

a516f1e

neubig enabled auto-merge (squash) June 8, 2024 13:27

regenerate integration test

fc6b999

yufansong approved these changes Jun 8, 2024

yufansong disabled auto-merge June 8, 2024 16:30

yufansong enabled auto-merge (squash) June 8, 2024 16:30

yufansong merged commit b5a17ef into All-Hands-AI:main Jun 8, 2024
2 checks passed

neubig mentioned this pull request Jun 22, 2024

[Bug]: OpenDevin freezes when an interactive CLI program is called #1280

Closed

2 tasks
