Create workflow on how to run OpenDevin on SWE-bench evaluation #795
Comments
So basically, the current SWE-bench setup in the repo is deprecated because it has a lot of issues. We are trying to make a new SWE-bench setup that would live in a separate repo.
If there are any other questions, let me know 👌
This is really important for evaluating the different design decisions and changes that we make to OpenDevin. Hopefully @libowen2121's work on setting up SWE-bench evaluation, and @mlejva's implementation within the e2b sandbox, will make a dent in this.
In SWE-bench, generally (or for
Hey guys, we have a private, stabilized version of the SWE-bench evaluation pipeline, but it is now behind the official SWE-bench repo. We will push the changes to the forked SWE-bench repo and make sure it aligns with the official repo.
This is (mostly) finished, thanks @libowen2121 and @xingyaoww! https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/
How do I run OpenDevin with the SWE-bench evaluation?
I don't understand how to capture the patch.
Also, how do I test the patch against the benchmark test suite?
Can we have documentation for this?
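To make the "capture the patch" question concrete, here is a minimal sketch of one way it could be done outside of OpenDevin. It is not the OpenDevin pipeline. It assumes the agent edited a local git checkout of the task repository, takes `git diff` as the patch, and appends it to a JSONL predictions file using the `instance_id`, `model_name_or_path`, and `model_patch` fields that the SWE-bench evaluation harness reads. The function names, file names, and the instance id in the usage note are hypothetical.

```python
import json
import subprocess


def capture_patch(repo_dir: str) -> str:
    """Return the uncommitted changes to tracked files in repo_dir as a unified diff.

    Assumption: the agent worked in a plain git checkout. Note that `git diff`
    does not include untracked (newly created, never added) files.
    """
    result = subprocess.run(
        ["git", "diff"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def append_prediction(path: str, instance_id: str, model_name: str, patch: str) -> dict:
    """Append one prediction record to a JSONL file and return the record."""
    record = {
        "instance_id": instance_id,
        "model_name_or_path": model_name,
        "model_patch": patch,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

Under these assumptions, usage would look like `append_prediction("predictions.jsonl", "example__repo-123", "my-agent", capture_patch("/path/to/checkout"))`. The resulting file is what gets handed to the SWE-bench harness, which applies each patch and runs the benchmark tests for that instance. The exact harness command depends on the SWE-bench version, so check its README.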