Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(experimental) test_runtime_client.py to test _execute_bash() [DO NOT MERGE!] #3040

Closed
wants to merge 8 commits into from

Conversation

tobitege
Copy link
Collaborator

@tobitege tobitege commented Jul 19, 2024

What is the problem that this fixes or functionality that this introduces? Does it fix any open issues?

Enhancements to RuntimeClient's method _execute_bash. Comes with new unit test file.

Belongs to #3031


Give a summary of what the PR does, explaining any non-trivial design decisions

Improved parsing of pexpect output/prompt, with first iteration of interactive prompt detection.


There's one test commented out at the end if someone else wants to try to get it working.

@tobitege tobitege requested a review from xingyaoww July 19, 2024 21:45

# This should NOT match "PS1=\u@\h:\w [PEXPECT]$" when `env` is executed
self.__bash_expect_regex = (
r'\[PEXPECT_BEGIN\] ([a-z0-9_-]*)@([a-zA-Z0-9.-]*):(.+) \[PEXPECT_END\]'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There was a reason why i use [a-z0-9_-]*)@([a-zA-Z0-9.-]*):(.+) - because other wise, when agent runs env, it will be stopped by the PS1= env var printed by the env command, causing the parsing to be broken.

Copy link
Collaborator

@xingyaoww xingyaoww Jul 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! I think this got caught in the test cases - i added that one to make sure it is not broken :D

image

@tobitege
Copy link
Collaborator Author

tobitege commented Jul 20, 2024

Some notes:

  • running the original "runtime_build.py" via command line produced an image of 23+ GB(!) in size
  • version in this PR tries to "fix" that (down to < 4GB), but somethings still not right with the setup, I think
  • added tweaks so both Ubuntu 22.04 and 24.04 can get generated (2 packages have different setups: libgl1-mesa-glx and libasound2)

break
elif index == 1: # Matched an interactive prompt
self.shell.sendline('n') # Default to 'no' for safety
output += "(Automatically responded 'n' to prompt)\n"
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking if we can just return an CmdRunOutputObservation directly to the agent with this prefix (with the (y/n) added there of course): something like

CmdRunOutputObservation(content="XXXX (y/n)", exit_code=None)
# We can't get exit code here bc the shell is abnormal

And then we can let agent to decide by either running

<execute_bash> y </execute_bash>
OR
<execute_bash> n </execute_bash>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'd be awesome, if that would work!

@tobitege
Copy link
Collaborator Author

Ugh, I messed up a line in the dockerfile generation, doh!

@tobitege tobitege changed the title (test) test_runtime_client.py to test _execute_bash() (experimental) test_runtime_client.py to test _execute_bash() [DO NOT MERGE!] Jul 21, 2024
@tobitege tobitege marked this pull request as draft July 21, 2024 16:50
' python-docx PyPDF2 python-pptx pylatexenc openai \\\n'
' python-dotenv toml termcolor pydantic python-docx pyyaml \\\n'
' docker pexpect tenacity e2b \\\n'
' browsergym minio playwright \n'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I actually think we don't need to install these manually because we already have these in the pyproject.toml - and it will automatically installed into the right version (poetry.lock), right? I guess this is why i like having poetry inside the runtime 😄

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I had copied several over from the plugins code (jupyterlab, browsergym, playwright etc.).
I haven't checked if they're all in the pyproject file, but if so, they'd be "duplicates" without consequence.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I don't think all of the browser stuff is in the pyproject, on 2nd thought.

Copy link
Collaborator

@xingyaoww xingyaoww Jul 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, i think this is the only line that needed to be added, which we already did:
https://github.com/OpenDevin/OpenDevin/blob/a61ac5a21425c21c0d76b213f03d18c80fbb5fad/Makefile#L152-L153

@tobitege tobitege closed this Jul 21, 2024
@@ -61,55 +62,102 @@ def __init__(self, plugins_to_load: list[Plugin], work_dir: str) -> None:

def _init_bash_shell(self, work_dir: str) -> None:
self.shell = pexpect.spawn('/bin/bash', encoding='utf-8', echo=False)
self.__bash_PS1 = r'[PEXPECT_BEGIN] \u@\h:\w [PEXPECT_END]'
Copy link
Collaborator Author

@tobitege tobitege Jul 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@xingyaoww why set the prompt up so complex to begin with? 🤔
Doesn't this make detection harder than it should be?

Hmm... just to avoid PS1= ... :D

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just to avoid PS1= ... :D
It is just the agent could possibly run env to check the env var, then found itself dead in the water because of the PS1 pattern 😓

why set the prompt up so complex to begin with?

Do you mean the user@host:/workdir thing, or the [PEXPECT_BEGIN]? For the former, i think it will be useful for the agent to know whoami and where am i - so i keep those user/host/workdir info. For [PEXPECT_BEGIN] - i spend sometime and i think that's probably the easiest way to make the \u@\h:\w regex work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, user@host. But I see the advantage in extra information for the agent there, which seems the obvious way to do now that you mention it! 👍

@James4Ever0
Copy link

James4Ever0 commented Jul 31, 2024

My thoughts are be like:

  • Use a TTY parsing library like pyte, render the terminal as we perceive
  • Let human pause/stop LLM generation at any point, inject instructions and modify terminal
  • Develop such a sidecar model to learn human interventions, automate the process

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants