Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Edit linting: use line mapping to rule out irrelevant errors #3649

Closed
wants to merge 12 commits into from

Conversation

li-boxuan
Copy link
Collaborator

@li-boxuan li-boxuan commented Aug 29, 2024

Short description of the problem this fixes or functionality that this introduces. This may be used for the CHANGELOG

Please read #3412 for more detail. To simply put, previous work uses the diff of linting error messages from pre-edit-lint and post-edit-lint to rule out "irrelevant" linting errors. The biggest flaw is that error messages could be generic - leading to both false positives and false negatives. This PR uses a more robust approach: line numbers.


Give a summary of what the PR does, explaining any non-trivial design decisions

Before edit, we run linting once, and record the error messages from the linter. After edit, we run linting once again, and also record the error messages. We then compare the line numbers from both pre-edit-lint and post-edit-lint. We rule out errors that are irrelevant to the edit (i.e. we remove errors that exist in both pre-edit-lint and post-edit-lint). This is non-trivial because line numbers might change. Luckily, since we only allow edits at a single place at a time, it is not too hard to figure out line mapping.


Link of any specific issues this addresses

#3412

li-boxuan added a commit that referenced this pull request Sep 2, 2024
Fix a potential issue that might lead to file corruption when edit linting is enabled

#3124 introduces a feature for editing: running linter twice before and after the change and only extract new errors introduced by the agent. This has some potential issues and I am working on #3649 to address them, but I feel like I am not gonna finish it in the next few days, and that PR has become harder and harder to review, thus this PR, which only focuses on a small improvement.

So what's the issue? When we run linters on the original file before our edits, we need to copy the original file and use a temporary file to lint, because linting may have side-effect (e.g. modifying the file in-place). I used the word "may" because:

Flake8 has no side-effect, so not a problem as of now.
We don't enforce this or document this "no side-effect" as a requirement for linter implementation, so side-effect is allowed.
Regardless, the "after-edit-linting" uses the same approach: backup the file before linting to avoid data corruption. We should keep our "before-edit-linting" consistent.

Why no new unittest that reproduces the issue? Well, as I have mentioned earlier, flake8 has no side-effect, so technically it's not a bug but a flaw. Therefore, there's no way to write a test that reproduces the issue.
@li-boxuan li-boxuan marked this pull request as ready for review September 9, 2024 03:50
@li-boxuan li-boxuan added the agent framework Strategies for prompting, agent, etc label Sep 9, 2024
@tobitege
Copy link
Collaborator

tobitege commented Sep 9, 2024

Great job on this tricky enhancement!
It looks good to me but I'll defer to @xingyaoww for approval as I suppose this will be sent to swe-bench?

@tobitege
Copy link
Collaborator

Great work! Since the tests run fine, I'd say LGTM.
Will run several Aider bench tests on this branch here and see if any related issue comes up -or- these instances come up successful now (compared to earlier tries).

@xingyaoww
Copy link
Contributor

sounds good! I've finally got some of the eval pipeline working properly! Will start an eval on this PR today

@tobitege
Copy link
Collaborator

Just fyi, I did 15 different Aider bench instances and don't see new issues from this PR. 👍

@xingyaoww
Copy link
Contributor

Weird enough, this PR actually brings a lot of degradation.. The current main is about 79 resolved, but this PR gets ~60
image

@tobitege
Copy link
Collaborator

Why do you get those "mv: cannot stat" errors? 🤔

@xingyaoww
Copy link
Contributor

@tobitege those are un-related eval scripts things 🤣

@tobitege
Copy link
Collaborator

tobitege commented Oct 8, 2024

@li-boxuan please buzz us again once this is ready

@li-boxuan li-boxuan marked this pull request as draft October 9, 2024 04:19
@li-boxuan li-boxuan closed this Oct 12, 2024
xingyaoww added a commit that referenced this pull request Oct 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent framework Strategies for prompting, agent, etc eval-this
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants