Create workflow on how to run OpenDevin on SWE-bench evaluation #795

Closed
code2graph opened this issue Apr 5, 2024 · 6 comments
Labels
severity:low Minor issues or affecting single user

Comments

code2graph commented Apr 5, 2024

How to run OpenDevin with the SWE-Bench evaluation?

I don't understand how to capture the patch that OpenDevin generates.
I also don't understand how to test that patch against the benchmark's test suite.

Could we have documentation for this?

@code2graph code2graph added the enhancement New feature or request label Apr 5, 2024
@rbren rbren added question severity:low Minor issues or affecting single user and removed enhancement New feature or request labels Apr 6, 2024
dorbanianas (Collaborator)

Right now, the SWE-bench setup in this repo is deprecated because it has a lot of issues. We are working on a new SWE-bench setup that will live in a separate repo.

dorbanianas (Collaborator) commented Apr 6, 2024

If there are any other questions, let me know 👌

@neubig neubig changed the title How to run OpenDevin on the SWE-Bench evaluation? Create workflow on how to run OpenDevin on SWE-bench evaluation Apr 14, 2024
neubig (Contributor) commented Apr 14, 2024

This is really important to be able to evaluate different design decisions and changes that we make to OpenDevin. Hopefully @libowen2121 's work on setting up SWE-bench evaluation, and @mlejva 's implementation within the e2b sandbox will make a dent in this.

SmartManoj (Contributor)

In SWE-bench, how was environment_setup_commit chosen, in general (or, for example, for instance_id sympy__sympy-14774)?

libowen2121 (Contributor)

Hey guys, we have a private, stabilized version of the SWE-bench evaluation pipeline, but it is currently behind the official SWE-bench repo. We will push the changes to our fork of the SWE-bench repo and make sure it stays aligned with the official repo.

neubig (Contributor) commented May 9, 2024

This is (mostly) finished. Thanks, @libowen2121 and @xingyaoww! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/

@neubig neubig closed this as completed May 9, 2024
@github-project-automation github-project-automation bot moved this from In Progress to Done in OpenDevin Priority Roadmap May 9, 2024
6 participants