Add Agent Comparable to SWE-agent #570

Closed
rbren opened this issue Apr 2, 2024 · 9 comments
Labels: enhancement (New feature or request), severity:critical (Critical issues or affecting all users)

Comments

@rbren
Collaborator

rbren commented Apr 2, 2024

These folks seem to be doing really well on the SWE-bench score!

https://swe-agent.com/

@rbren rbren added enhancement New feature or request agent framework labels Apr 2, 2024
@xingyaoww
Collaborator

xingyaoww commented Apr 2, 2024

I think a good first step would be:

  • SWE-Agent requires a persistent Docker execution (1, 2). Our to-do here is to make our Docker sandbox keep all its past state, as discussed here. I am working on a version based on dockerpty, but it seems pretty tricky to get working (draft PR: Make the Docker Sandbox Stateful (e.g., keep track of all cded directories) #597; suggestions and PRs are welcome!)

  • I think the critical component of SWE-agent's success lies in its specific hand-crafted tools (available as commands in bash), like the search_dir, open, and edit tools. A very easy way for us to incorporate these would be: (1) find these tools and implement them as bash commands (this is exactly how SWE-agent implements them; check this example), (2) pack these tools into our sandbox Docker image, (3) tweak the prompt (like the one here) to let the agent know these are additional tools it can use. I think (1) and (2) are not hard to do; we probably only need to tweak our Dockerfile. (1) and (2) are good first issues, so feel free to start PRs!

  • Besides these tools, SWE-Bench has pretty sophisticated bash command parsing and block lists. Maybe we can drop that to simplify our solution at the early stage by directly allowing the agent to interact with the system via bash (similar to CodeActAgent's pipeline: the agent can execute whatever it wants, and we return an error message (like this) when it touches a command on the blocklist).

  • Hardest step: get our SWE-Bench evaluation working and have our agents generate correct patches so that we can test them (we need some logic like this to capture the patch submissions and test them properly!). @libowen2121 is actively working on the first part (getting evaluation working); you can let him know if you are interested in contributing to this part!
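The simplified blocklist idea from the third step could be sketched roughly as follows. This is only an illustration under stated assumptions: the `BLOCKLIST` contents and the `run_command` helper are hypothetical names, not SWE-agent's or OpenDevin's actual code, and the check looks only at the first word of the command (so something like `ls; vim` would slip through).

```python
import shlex
import subprocess

# Illustrative blocklist of interactive programs; not SWE-agent's actual list.
BLOCKLIST = {"vim", "vi", "nano", "emacs"}


def run_command(command: str, timeout: float = 10.0) -> str:
    """Run `command` in bash unless its first word is on the blocklist.

    Hypothetical helper: returns an error string to the agent instead of
    raising, so the agent can read the message and try something else.
    """
    try:
        tokens = shlex.split(command)
    except ValueError as exc:  # e.g. unbalanced quotes
        return f"Error: could not parse command: {exc}"
    if not tokens:
        return "Error: empty command."
    if tokens[0] in BLOCKLIST:
        return f"Error: command '{tokens[0]}' is not allowed in this sandbox."
    try:
        result = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return f"Error: command timed out after {timeout} seconds."
    # Return both streams so the agent also sees failures from allowed commands.
    return result.stdout + result.stderr
```

Returning the error as ordinary output keeps the agent loop simple: every action yields a string observation, whether the command ran or was rejected.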

Feel free to suggest anything I missed!
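For the stateful-sandbox step, one lightweight way to keep track of cded directories is to record the shell's working directory after each command and start the next command from there. The sketch below is an assumption-laden illustration, not the dockerpty-based approach mentioned above: `StatefulShell` and its marker string are hypothetical, it runs bash locally rather than inside Docker, and it persists only the working directory (environment variables and shell functions are lost between calls).

```python
import subprocess


class StatefulShell:
    """Hypothetical wrapper that remembers the working directory across
    otherwise independent bash invocations."""

    _MARKER = "__CWD__"  # sentinel used to find the reported directory

    def __init__(self, cwd: str = "/") -> None:
        self.cwd = cwd

    def run(self, command: str) -> str:
        # Append a printf that reports $PWD after the user's command ran.
        script = f"{command}\nprintf '\\n{self._MARKER}%s' \"$PWD\""
        result = subprocess.run(
            ["bash", "-c", script],
            cwd=self.cwd,
            capture_output=True,
            text=True,
        )
        output, sep, new_cwd = result.stdout.rpartition(f"\n{self._MARKER}")
        if not sep:
            # The marker is missing (e.g. the command called `exit`);
            # keep the previous directory and return everything as output.
            return result.stdout + result.stderr
        self.cwd = new_cwd
        return output + result.stderr
```

A later `shell.run("pwd")` then reports the directory that an earlier `shell.run("cd some/dir")` moved into, which is the behavior the agent needs from the sandbox.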

@guneetsk99

@rbren @xingyaoww
Did anyone pick this up?

@neubig
Contributor

neubig commented Apr 5, 2024

Hey @guneetsk99, not yet! Please feel free to start taking a look at the substeps and claim the ones that you like.

@Sparkier
Collaborator

Sparkier commented Apr 5, 2024

Correction @guneetsk99, I've started with the handcrafted tools here.

@guneetsk99

guneetsk99 commented Apr 5, 2024

@Sparkier let's connect on Slack.
Can you DM me? We can collaborate if you're OK with it.

@JayQuimby
Contributor

JayQuimby commented Apr 7, 2024

Made a Draft PR for this: #846

It's a WIP, but it has basic functionality. I am going to work on integrating the commands from #682. Thank you @Sparkier for making those.

If anyone wants to help out, feel free.

@rbren rbren added the severity:critical Critical issues or affecting all users label Apr 9, 2024
@neubig neubig changed the title Add SWE-agent Agent Add Agent Comparable to SWE-agent Apr 14, 2024
@neubig neubig added this to the April 2024 milestone Apr 14, 2024
@rbren
Collaborator Author

rbren commented Apr 17, 2024

SWE-agent is merged!

It's unclear how it compares to the main SWE-agent; we'll see if we can get some folks to eval it.

@neubig
Contributor

neubig commented Apr 29, 2024

Hey @rbren, I was looking back at our roadmap, and I don't think this was ready to be closed yet, because we haven't done all of the steps that @xingyaoww mentioned. We still need to test whether our implemented version of the agent gets competitive accuracy on SWE-bench, which we can do after we finish #795.

@neubig neubig moved this from Done to In Progress in OpenDevin Priority Roadmap Apr 29, 2024
@neubig
Contributor

neubig commented May 9, 2024

This is finished, great job @xingyaoww and @libowen2121 ! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/

@neubig neubig closed this as completed May 9, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in OpenDevin Priority Roadmap May 9, 2024