
[Runtime] Mega-issue to track all issues related to bash Interactive terminal #3031

Closed
10 tasks
xingyaoww opened this issue Jul 19, 2024 · 14 comments
Assignees
Labels
bug Something isn't working enhancement New feature or request severity:medium Affecting multiple users tracked Added to internal tracking
Milestone

Comments

@xingyaoww
Collaborator

xingyaoww commented Jul 19, 2024

This is a mega-issue tracker for the interactive terminal issues people run into.

Feel free to expand this list if I missed any relevant issue!


Cause

These are typically caused by the same underlying problem: OpenDevin uses pexpect to interact with bash shells, but the current parsing logic only looks for the next PS1 prompt (e.g., something like root@hostname:/folderABC $).

It will keep looking for that pattern until it times out, causing the following things to break, as listed in the PR above:

  • Opening a new interactive program (e.g., python3), where the prompt changes to >>>
  • Opening a full-screen text editor (e.g., nano, vim), where the display could break completely (I'm not familiar with the protocol here, though)
  • Entering a conda virtual environment: conda prepends the env name (e.g., (base)) to the PS1 prompt, breaking the current pexpect parsing
  • When the agent is asked for a password (e.g., prompts matching Password:)
  • Prompts like (yes/no/[fingerprint]) requesting user confirmation
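As a minimal sketch of why PS1-only matching breaks (the regex below is an illustrative assumption, not OpenDevin's actual pattern): a matcher that only recognizes a bash-style prompt never matches the prompts these programs print, so the expect loop spins until it times out.

```python
import re

# Illustrative PS1 matcher in the spirit of expecting "user@host:/path $";
# this is an assumption, not OpenDevin's actual regex.
PS1_RE = re.compile(r"\w+@[\w.-]+:[^\n]*\$\s*$")

def found_prompt(output: str) -> bool:
    """Return True if the tail of the output looks like a bash PS1 prompt."""
    return PS1_RE.search(output) is not None

# A plain shell prompt is recognized...
assert found_prompt("root@hostname:/folderABC $ ")
# ...but none of the interactive prompts listed above ever match:
assert not found_prompt(">>> ")          # python3 REPL
assert not found_prompt("Password: ")    # password prompt
assert not found_prompt("Are you sure you want to continue connecting (yes/no/[fingerprint])? ")
```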

Fixes

We plan to resolve as many of these as we can once the arch refactor #2404 is completed. The following is a non-exhaustive list of approaches (we cannot explicitly enumerate every pattern for pexpect here):

  1. Try to cover common use cases of these prompts (e.g., the [yes/no] pattern, the conda environment pattern).
  2. Figure out a more general way (rather than hand-written rules) for agents to interact with these. For example, instead of writing every rule explicitly: if we've been waiting for more than 5s with no new output from the terminal, it is probably waiting for user input, and we should hand control over to the agent. Subsequently, we may need to allow the agent to issue special keyboard actions like Ctrl+D, Ctrl+C, etc.
  3. Add something to the prompt that forbids the agent from entering interactive programs (e.g., the interactive Python REPL, vim, nano, etc.).
  4. Detect when the agent accidentally enters such an interactive program, and force it out (we currently send Ctrl+C, which might not work for a large variety of programs like vim).
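For point 4, one possible heuristic (an assumption of mine, not OpenDevin's current check) is to watch the raw output for the xterm alternate-screen escape sequence that many fullscreen programs (vim, less, htop) emit on startup:

```python
import re

# xterm "smcup" sequences used by many fullscreen programs to switch to the
# alternate screen: \x1b[?1049h is the modern form, \x1b[?47h the legacy one.
# This is a heuristic sketch, not OpenDevin's actual detection logic.
ALT_SCREEN_RE = re.compile(rb"\x1b\[\?(?:1049|47)h")

def entered_fullscreen(raw_output: bytes) -> bool:
    """Guess whether the agent just launched a fullscreen interactive program."""
    return ALT_SCREEN_RE.search(raw_output) is not None

assert entered_fullscreen(b"\x1b[?1049h\x1b[2J...vim screen...")
assert not entered_fullscreen(b"total 12\n-rw-r--r-- 1 root root 42 out.txt\n")
```

Detecting this would let the runtime send the right exit keystrokes (e.g., `:q!` for vim) instead of a blanket Ctrl+C.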

If you want to help!

Take a look at our existing bash parsing logic for the new architecture (under development!):
https://github.com/OpenDevin/OpenDevin/blob/8bfa61f3e4beceb690562b4d105aa01dc50d58d7/opendevin/runtime/client/client.py#L62-L111

You can help to:

  1. Write test cases in https://github.com/OpenDevin/OpenDevin/blob/main/tests/unit/test_runtime.py to expose these interactive bash issues
  2. Try to fix them inside client/client.py (and/or ssh_box.py, though we plan to deprecate that soon, so supporting these on EventStreamRuntime should be sufficient!)
@xingyaoww xingyaoww added bug Something isn't working enhancement New feature or request labels Jul 19, 2024
@xingyaoww xingyaoww self-assigned this Jul 19, 2024
@dosubot dosubot bot added the severity:medium Affecting multiple users label Jul 19, 2024
@tobitege
Collaborator

First, great collection of issues to look into!

Fixes
...if we've been waiting for more than 5s...

As to that 2nd point: if the model runs any installations (like with pip), couldn't these 5 seconds become too short?

Add something in the prompt...

We may have to track down the "usual suspects" and check whether they have parameters to suppress interactive mode.
Several package installers do have such parameters. Here are the options for some of the most common ones (thanks to Sonnet):

  1. pip:

    • Use the -q or --quiet flag to suppress output.
    • pip itself doesn't typically prompt for yes/no, so suppressing output with -q is usually all that's needed.
    pip install -q package_name
  2. Poetry:

    • Poetry does not have interactive prompts during installation, so no specific flag is needed to suppress them.
  3. npm:

    • Use the --yes or -y flag to automatically answer yes to prompts (mainly relevant for npm init and npx; npm install itself rarely prompts).
    npm install package_name --yes
  4. yarn:

    • Yarn does not typically have interactive prompts during installation, so no specific flag is needed to suppress them.
  5. conda:

    • Use the -y or --yes flag to automatically answer yes to prompts.
    conda install package_name -y
  6. apt-get (for system packages on Debian-based systems):

    • Use the -y or --yes flag to automatically answer yes to prompts.
    sudo apt-get install package_name -y
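The per-tool flags above could be centralized in a small helper. The sketch below is hypothetical (the mapping is drawn only from this comment's table, and `make_noninteractive` is not part of OpenDevin):

```python
# Hypothetical mapping of install-command prefixes to the noninteractive
# flags listed above; an illustrative subset, not an official table.
NONINTERACTIVE_FLAGS = {
    "pip install": "-q",
    "conda install": "-y",
    "apt-get install": "-y",
}

def make_noninteractive(cmd: str) -> str:
    """Insert the tool's noninteractive flag if it is missing."""
    for prefix, flag in NONINTERACTIVE_FLAGS.items():
        if cmd.startswith(prefix) and flag not in cmd.split():
            return f"{prefix} {flag}" + cmd[len(prefix):]
    return cmd  # poetry, yarn, etc. need no extra flag

assert make_noninteractive("conda install numpy") == "conda install -y numpy"
assert make_noninteractive("poetry add requests") == "poetry add requests"
```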

@xingyaoww
Collaborator Author

xingyaoww commented Jul 19, 2024

if the model runs any installations (like with pip), these 5 seconds could become too short?

Yeah, that's a good point, and it's primarily why we implemented the timeout rather than this.

I guess the T-second timeout should be enforced on the stream of output: if the terminal keeps printing output during the last T seconds (e.g., you see the screen scrolling while an installation command runs), then it is probably working fine without needing interaction. But if it hangs for T seconds with no output, it probably needs someone to take a look and decide whether we need to enter anything.

I mean, we humans do the same thing! If a command hangs unreasonably long, we might just issue Ctrl+C to kill it 😅

Maybe we could (1) switch everything to streaming outputs, (2) wait for T=10 seconds, and (3) if no additional output arrives during those T seconds, conclude we are probably stuck and let the LLM decide whether to (a) interrupt the process, (b) enter something if this is actually a legitimate interactive prompt, or (c) keep waiting. IMO this might be the next level of agents: execution-time aware 😆
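That streaming idle-detection idea could be sketched roughly as follows (a hypothetical POSIX-only sketch, not OpenDevin's implementation; on "idle" the real system would hand control to the LLM rather than return):

```python
import os
import select
import subprocess

def run_with_idle_detection(cmd: str, idle_timeout: float = 10.0):
    """Stream a shell command's output; report "idle" if no new bytes
    arrive for idle_timeout seconds (sketch only; assumes POSIX pipes)."""
    proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT)
    chunks = []
    while True:
        # Wait up to idle_timeout seconds for the pipe to become readable.
        ready, _, _ = select.select([proc.stdout], [], [], idle_timeout)
        if not ready:
            # No output for idle_timeout seconds: likely waiting for input
            # or stuck; this is where control would pass to the agent.
            return b"".join(chunks), "idle"
        chunk = os.read(proc.stdout.fileno(), 4096)
        if not chunk:  # EOF: the process exited on its own
            proc.wait()
            return b"".join(chunks), "finished"
        chunks.append(chunk)

out, status = run_with_idle_detection("echo hello", idle_timeout=2.0)
# A command like `sleep 30` would instead report "idle".
```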

@tobitege
Collaborator

Streaming back the output works in my async PR, but the prompt regex would still be the issue.

@SmartManoj
Contributor

SmartManoj commented Jul 20, 2024

@xingyaoww I think this is the right issue to discuss this.

Will we move to Paramiko for windows support? #2059 (comment)

Why are we using pxssh over Paramiko?

[image: pxssh vs. Paramiko comparison] (Source)

Relevant comments:
@iFurySt #1739 (comment)

@xingyaoww
Collaborator Author

We will no longer use the SSH protocol in the new architecture; otherwise we'd need to maintain two separate connections, which could be overly complicated. Instead, we will interact with a local bash shell directly (for ease of dependency management), so SSH-related libraries may no longer be relevant.

@zenitogr

#3176 (sorry for creating a new issue, but I don't see anything here about long-running commands like npm run dev)

@zenitogr

zenitogr commented Jul 30, 2024

For anyone who wants to create a Next.js app, here is the noninteractive way:
npx create-next-app my-app-name --ts --app --eslint --import-alias "@/*" --use-npm --tailwind --no-src-dir
You basically pass all the options you want, but you must pass all of them: --no-flag for the ones you don't want and --flag for the ones you do. If you are using WSL, I would just run create-next-app in the WSL shell and call it a day.

here are all the flags: https://nextjs.org/docs/pages/api-reference/create-next-app

@zenitogr

zenitogr commented Jul 30, 2024

I think that's the number one priority, because in React land you have npm run dev running constantly while you develop the project.

Also, running tasks that need interaction is the de facto standard, even if everybody hates it.

It's just simpler for packages and core components that are complicated enough to ask for user options, preferences, and on-the-fly choices.

Yeah, totally agree! Unfortunately it's not super trivial to fix (otherwise we would have fixed it already), but I agree this is high priority.

You could use two LLMs: one for coordination in a loop of checking things, and the other for coding.

gpt-pilot kind of does that with agents: different agents cooperate in a hierarchy, with a top agent that gives commands to lower agents and coordinates what the next step and next agent are going to be.

They use a different LLM for each agent, like gpt-4 for the top agent with temperature 0.8 and gpt-4o for coding with temperature 0.

@James4Ever0

James4Ever0 commented Jul 31, 2024

#3040 (comment)

@tobitege @xingyaoww @zenitogr @SmartManoj

(PS: This is one of the major features planned on Cybergod)


Update: 8/16/24

Now I have made some progress on the terminal agent. Check out the demo video below:

[demo video: tmux_show_1]

The terminal environment can be captured as an image, with the cursor highlighted in red:

[screenshot: vim_edit_tmux_screenshot]

@rbren rbren added this to the 2024-09 milestone Aug 16, 2024
SmartManoj added a commit to SmartManoj/Kevin that referenced this issue Aug 17, 2024
@James4Ever0

James4Ever0 commented Aug 28, 2024

Now one can obtain terminal input/output statistics in Cybergod, as shown below:

[image: tmux statistics]

With terminal stats, one can build a more efficient event-driven terminal agent, for example by listening for a TerminalIdle event, just like NetworkIdle in Playwright. An interval-driven terminal agent can also be made more intelligent by adding the statistics to the prompt, with conditional prompts based on different stats.
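A toy sketch of such an output-rate statistic (the `TerminalStats` class and the TerminalIdle naming here are hypothetical illustrations, not Cybergod's actual API):

```python
# Toy sliding-window output-rate tracker, in the spirit of the
# TerminalIdle idea above; hypothetical, not Cybergod's implementation.
class TerminalStats:
    def __init__(self, window: float = 5.0):
        self.window = window
        self.events = []  # (timestamp, n_bytes) pairs

    def record_output(self, n_bytes: int, now: float) -> None:
        """Record that n_bytes of terminal output arrived at time `now`."""
        self.events.append((now, n_bytes))

    def bytes_per_sec(self, now: float) -> float:
        """Average output rate over the trailing window."""
        recent = [n for t, n in self.events if now - t <= self.window]
        return sum(recent) / self.window

    def is_idle(self, now: float) -> bool:
        """Fire a TerminalIdle-style signal when the rate drops to zero."""
        return self.bytes_per_sec(now) == 0

stats = TerminalStats(window=5.0)
stats.record_output(4096, now=100.0)
assert not stats.is_idle(now=102.0)  # output 2s ago: still active
assert stats.is_idle(now=110.0)      # nothing in the last 5s: idle
```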

[image: tmux event]

More can be learned here and here.

@mamoodi mamoodi added the tracked Added to internal tracking label Aug 28, 2024
@mamoodi
Collaborator

mamoodi commented Dec 5, 2024

We've made a lot of good progress on this but I don't think this is 100% resolved yet.

@xingyaoww
Collaborator Author

most of these should be fixed by #4881 🤔

@mamoodi
Collaborator

mamoodi commented Jan 6, 2025

@xingyaoww do you think we should close this now that the PR has merged? Or close it when we release?

@xingyaoww
Collaborator Author

Let's close this now! Closed by #4881
