Add Agent Comparable to SWE-agent #570

rbren · 2024-04-02T11:36:23Z

These folks seem to be doing really well on the SWE-bench score!

https://swe-agent.com/

xingyaoww · 2024-04-02T19:11:50Z

I think a good first step would be:

SWE-Agent requires a persistent Docker execution (1, 2). Our todo for this would be making our Docker sandbox keep all its past state, as discussed here. I am working on a version based on dockerpty, but it seems pretty tricky to get working (draft PR: Make the Docker Sandbox Stateful (e.g., keep track of all cded directories) #597; suggestions/PRs are welcome!)
I think the critical component of SWE-agent's success lies in its hand-crafted tools (available as commands in bash), like the search_dir, open, and edit tools. A very easy way for us to incorporate these would be: (1) find these tools and implement them as bash commands (this is exactly how SWE-agent implements them; check this example), (2) pack these tools into our sandbox Docker image, (3) tweak the prompt (like the one here) to let the agent know these are additional tools it can use. I think (1) and (2) are not hard to do; we probably only need to tweak our Dockerfile. They are good first issues, so feel free to start PRs!
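To make the "hand-crafted tool" idea concrete, here is a minimal Python sketch of a search_dir-style helper. It is an illustration only, not SWE-agent's actual implementation (which, per the comment above, is written as bash commands); the function names `search_dir` and `format_matches`, the output wording, and the choice to skip hidden directories are all assumptions made for this example.

```python
import os


def search_dir(term: str, root: str = ".") -> dict[str, int]:
    """Return {relative_path: number_of_matching_lines} for files under root."""
    matches: dict[str, int] = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Assumption: skip hidden directories such as .git to keep output short.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, encoding="utf-8", errors="ignore") as fh:
                    count = sum(1 for line in fh if term in line)
            except OSError:
                continue  # unreadable file: skip it
            if count:
                matches[os.path.relpath(path, root)] = count
    return matches


def format_matches(term: str, matches: dict[str, int]) -> str:
    """Render the result as a compact summary that fits in an agent's context."""
    if not matches:
        return f'No matches found for "{term}"'
    lines = [f'Found matches for "{term}" in {len(matches)} file(s):']
    lines += [f"{path} ({count} matches)" for path, count in sorted(matches.items())]
    return "\n".join(lines)
```

Returning a per-file count rather than every matching line keeps the observation short, which is presumably part of why such tools help an agent more than raw `grep` output would.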

Besides these tools, SWE-agent has pretty sophisticated bash command parsing and blocklists. Maybe we can drop these to simplify our solution at the early stage by directly allowing the agent to interact with the system via bash (similar to CodeActAgent's pipeline: the agent can execute whatever it wants, and we return an error message (like this) when it uses a command on the blocklist).
Hardest step: get our SWE-Bench evaluation working and have our agents generate correct patches so that we can test them (we need some logic like this to capture the patch submissions and test them properly!). @libowen2121 is actively working on the first part (getting evaluation working); you can let him know if you are interested in contributing to this part!
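The simplified blocklist behaviour described above (run anything, but return an error message when a blocked command is used) could look roughly like the following Python sketch. The contents of `BLOCKED_COMMANDS`, the function name `check_command`, and the error wording are hypothetical; the real blocklist is not reproduced here. The check only inspects the first token, so it does not catch blocked programs hidden behind pipes, `&&`, or subshells.

```python
import shlex
from typing import Optional

# Hypothetical blocklist for illustration (interactive editors would hang a
# non-interactive sandbox); not the actual list used by any project.
BLOCKED_COMMANDS = {"vim", "vi", "emacs", "nano"}


def check_command(command: str) -> Optional[str]:
    """Return an error message if the command is blocked, else None."""
    try:
        tokens = shlex.split(command)
    except ValueError as exc:
        # shlex raises ValueError on malformed input such as an unclosed quote.
        return f"Could not parse command: {exc}"
    if not tokens:
        return "Empty command"
    # Compare on the program's base name so /usr/bin/vim is treated like vim.
    program = tokens[0].rsplit("/", 1)[-1]
    if program in BLOCKED_COMMANDS:
        return f"Command '{program}' is not supported in this environment."
    return None
```

A sandbox wrapper would call `check_command` first and, when it returns a message, hand that message back to the agent as the observation instead of executing the command.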

Feel free to suggest anything I missed!

guneetsk99 · 2024-04-05T09:31:53Z

@rbren @xingyaoww
Has anyone picked this up?

neubig · 2024-04-05T15:20:42Z

Hey @guneetsk99 , not yet! Please feel free to start taking a look at the substeps and claim ones that you like.

Sparkier · 2024-04-05T16:36:37Z

Correction @guneetsk99, I've started with the handcrafted tools here.

guneetsk99 · 2024-04-05T16:38:47Z

@Sparkier let's connect on Slack.
Can you DM me? We can collaborate if you're OK with it.

JayQuimby · 2024-04-07T07:42:19Z

Made a Draft PR for this: #846

It's a WIP, but it has basic functionality. I am going to work on integrating the commands from #682. Thank you @Sparkier for making those.

If anyone wants to help out feel free.

rbren · 2024-04-17T20:14:10Z

SWE-agent is merged!

Unclear how it compares to the main SWE-agent; will see if we can get some folks to eval it.

neubig · 2024-04-29T10:33:07Z

Hey @rbren, I was looking back at our roadmap, and I don't think this was ready to be closed yet, because we haven't done all of the steps that @xingyaoww mentioned. We still need to test whether our implemented version of the agent gets competitive accuracy on SWE-bench, which we can do after we finish #795

neubig · 2024-05-09T12:52:50Z

This is finished, great job @xingyaoww and @libowen2121 ! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/

rbren added the enhancement and agent framework labels Apr 2, 2024

foragerr mentioned this issue Apr 5, 2024

feat: add commands for swebench #682

Merged

rbren mentioned this issue Apr 5, 2024

Redundant project #748

Closed

JayQuimby mentioned this issue Apr 7, 2024

feat: SWE Agent Implementation #846

Merged

5 tasks

rbren added the severity:critical label Apr 9, 2024

neubig changed the title ~~Add SWE-agent Agent~~ Add Agent Comparable to SWE-agent Apr 14, 2024

neubig assigned xingyaoww and huybery Apr 14, 2024

neubig added this to OpenDevin Priority Roadmap Apr 14, 2024

neubig added this to the April 2024 milestone Apr 14, 2024

rbren closed this as completed Apr 17, 2024

github-project-automation bot moved this to Done in OpenDevin Priority Roadmap Apr 17, 2024

neubig reopened this Apr 29, 2024

neubig moved this from Done to In Progress in OpenDevin Priority Roadmap Apr 29, 2024

neubig closed this as completed May 9, 2024

github-project-automation bot moved this from In Progress to Done in OpenDevin Priority Roadmap May 9, 2024
