-
Notifications
You must be signed in to change notification settings - Fork 5.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: codeact bug [If running a command that never returns, it gets stuck #1895] #2034
fix: codeact bug [If running a command that never returns, it gets stuck #1895] #2034
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be great if you could also mention interactive programs as well:
#1280
This seems a bit strange though... I thought the commands are running with this timeout by default, 2 minutes. Configurable via config with this, I believe: https://github.com/OpenDevin/OpenDevin/blob/2d52298a1dcc857ff0d9c9b8c82b3ecb1876ee9b/opendevin/core/config.py#L175 The commands are (supposed to be) killed when it expires. Do you have any idea how did it get to hours? |
Maybe we could return a "hint text" for each timeouted |
@SmartManoj @enyst I missed the last line of issue desc said: Currently, it will be timeout for |
@neubig @xingyaoww I'll try with your suggestion next Monday. |
I find this pr: #2059 related to the interactive programs. @neubig @SmartManoj Maybe we can let this scene fixed in that pr? |
Modified the prompt and add the |
Co-authored-by: Engel Nyst <enyst@users.noreply.github.com>
This looks fine to me, but:
|
conflicts resolved. @neubig |
(this is not a review)
I spoke to Xingyao and we agree that changes to CodeActAgent, especially prompts, should be put on hold until people finish the current round of evaluation (in a few days). |
ok, we can hold it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change LGTM, but I feel that we should benchmark accuracy before and after making the change. I'll try to do that in the next few days.
(blocked by #2085)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overall LGTM! We had some instability issues with SWE-Bench's eval harness and I will work to improve it in the next two weeks. In the meantime, I think it might be good to merge this as is (my instinct is that it will mostly help benchmark perf than hurting -- in some cases, the agent will want to run interactive command during eval and hurt it).
Once benchmark is fixed, i will run a few comprehensive experiments on the main branch's CodeAct, and can revert these changes if it actually hurts performance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Push some code to fix integration test. Can be merge now.
* Refactored prompt.py to reduce token usage * Reverted some destructive changes * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * Apply suggestions from code review * Apply suggestions from code review * Update agenthub/codeact_agent/prompt.py * fix integration test * make lint * feat: support ToolQA benchmark (#2263) * Add files via upload * Update README.md * Update run_infer.py * Update utils.py * make lint * Update evaluation/toolqa/run_infer.py --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> * feat: revert hiden special paths change in file action (#2328) * revert change in file action * remove useless code * make lint * Support gpqa benchmark evaluation (#2080) * feat: add gpqa benchmark evaluation * add metrics * reset configs in final block * make lint --------- Co-authored-by: yufansong <yufan@risingwave-labs.com> * fix(frontend): prevent API key from resetting after modal change (#2329) * remove bottom chatbox fade * Modal wider; fix lint error * settings: attempt to not clear api key for same provider * prevent api key from resetting after changing the model * revert other changes and fix post test tear down error --------- Co-authored-by: amanape <83104063+amanape@users.noreply.github.com> * fix: codeact bug [If running a command that never returns, it gets stuck #1895] (#2034) * fix: codeact bug #1895 * fix: add CmdRunAction timeout hint. * Update agenthub/codeact_agent/prompt.py Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> * regenerate integration test --------- Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Graham Neubig <neubig@gmail.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> * Feat: Support Gorilla APIBench (#2081) * removed unused files from gorilla * Update run_infer.py, removed unused imports * Update utils.py * Update ast_eval_hf.py * Update ast_eval_tf.py * Update ast_eval_th.py * Create README.md * Update run_infer.py * make lint * Update run_infer.py * fix lint --------- Co-authored-by: yufansong <yufan@risingwave-labs.com> * remote useless (#2332) * fix integration test * Update agenthub/codeact_agent/prompt.py * Update agenthub/codeact_agent/prompt.py * fix integration test --------- Co-authored-by: Xingyao Wang <xingyao6@illinois.edu> Co-authored-by: Frank Xu <frankxu2004@gmail.com> Co-authored-by: yufansong <yufan@risingwave-labs.com> Co-authored-by: yueqis <141804823+yueqis@users.noreply.github.com> Co-authored-by: Engel Nyst <enyst@users.noreply.github.com> Co-authored-by: Boxuan Li <liboxuan@connect.hku.hk> Co-authored-by: Yufan Song <33971064+yufansong@users.noreply.github.com> Co-authored-by: Jaskirat Singh <1.jaskiratsingh@gmail.com> Co-authored-by: tobitege <tobitege@gmx.de> Co-authored-by: amanape <83104063+amanape@users.noreply.github.com> Co-authored-by: Aaron Xia <zhhuaxia@gmail.com> Co-authored-by: Graham Neubig <neubig@gmail.com>
Tell llm, for command that may run indefinitely and not return a result, it should try to run the command in the background.
issue to fix: #1895