[red-knot] support invalid syntax without panics #13778
Comments
#13701 is a useful reference on what changes were needed as of a few days ago to avoid panics on the Python files present in the ruff repo. At least some of those are due to invalid syntax.
In addition to "not panic", we also want to avoid useless or confusing diagnostics. Inferring useful types in the face of syntax errors is less of a priority.
We can now run red knot on all invalid-syntax examples in ruff's repository without panicking. We can also run it on a fuzzing corpus with ~4000 invalid-syntax files (#13448). The next step could be to look into direct fuzzing (#14157), or to look for other corpuses (corpi? corpora?) with invalid syntax examples.
@sharkdp could we extend our corpus tests to run over all ruff tests to ensure we catch regressions early?
Yes, I'll do that.
One specific case that we should handle: names synthesized by the parser during recovery should not be put into the symbol table (or looked up). There's an AST method that allows testing whether a name is valid.
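A minimal sketch of that idea, using a toy model rather than ruff's real AST. The `Name` class, its `is_valid` method, and `build_symbol_table` are hypothetical stand-ins, and representing a recovered name as an empty identifier is an assumption made only for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """Toy stand-in for an AST name node (hypothetical, not ruff's type)."""

    id: str

    def is_valid(self) -> bool:
        # Assumption: during error recovery the parser synthesizes a name
        # with an empty identifier, e.g. for the missing operand in `5 +`.
        return self.id != ""


def build_symbol_table(names: list[Name]) -> set[str]:
    """Collect symbols, skipping names the parser made up during recovery."""
    return {name.id for name in names if name.is_valid()}


# Models `x = 5 +`: `x` is a real name, the missing operand is synthesized.
names = [Name("x"), Name("")]
symbols = build_symbol_table(names)
```

Filtering at the point where symbols are recorded means later lookups never see the synthesized name, so no diagnostic can be reported against it.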
@sharkdp just to note -- the unique thing about the fuzzer in #14157 is that the randomly generated source-code files it produces are guaranteed to always be valid Python syntax (at least, assuming there isn't a bug in the fuzzer). That's an extremely useful property of the fuzzer in some ways, but it does mean that the fuzzer is somewhat useless for testing how good red-knot is at handling invalid syntax. We'll have to look into other fuzzing libraries if we want to have some automated fuzzing for that as well.
I've been exploring ways to trigger panics on source code with syntax errors, but I haven't found any case that isn't already documented. The documented ones are #14672 and ruff/crates/red_knot_workspace/tests/check.rs, lines 273 to 277 in b63c2e1.
I've also run the fuzzer that produces source code with invalid syntax (#14678) multiple times, and it always finds the one linked above. The corpus is initialized with most of the Python files in the Ruff code base and all Python files in the CPython code base. The fuzzer runs on them and then starts shuffling the bytes around, producing new source code, which is then filtered to keep only the inputs that produce syntax errors according to our parser. This has an obvious limitation: syntax errors that are raised by the compiler rather than the parser are not covered. I looked at #13778 (comment) as well, but I've been unable to create source code that panics. I think that might be because of the lack of type inference for positions where that could occur, such as type parameters and function parameters, in combination with the AST key that includes the node range. So, technically, even though that behavior might be incorrect, it doesn't create a panic in red knot, at least not today. Additionally, I extracted all the examples from the syntax error test file in the CPython code base (https://github.com/python/cpython/blob/bf21e2160d1dc6869fb230b90a23ab030835395b/Lib/test/test_syntax.py), which is a doctest, ran red knot on them, and got no panics. There are certain changes that could be made, but I've been unable to create source code related to those changes that would make red knot panic. I'm going to list them here for posterity:
That said, I think we should periodically run the fuzzer and iteratively fix any errors that the model encounters. The model is still a work in progress, so there are missing pieces that might turn out to be necessary to trigger these panics, if any exist.
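The mutate-then-filter loop described in this comment can be sketched roughly as follows. This is not the fuzzer from #14678: the function names are hypothetical, it shuffles characters rather than raw bytes to avoid decoding failures, and it uses CPython's built-in `compile` as a stand-in for the ruff parser. Note that `compile` reports both parser-stage and compiler-stage problems as `SyntaxError`, so the filter here is broader than one based on the parser alone.

```python
import random
import warnings


def has_syntax_error(source: str) -> bool:
    """Return True if CPython's `compile` rejects `source` with a SyntaxError."""
    with warnings.catch_warnings():
        # Mutated input can trigger SyntaxWarning; that is not an error here.
        warnings.simplefilter("ignore")
        try:
            compile(source, "<fuzz>", "exec")
        except SyntaxError:
            return True
    return False


def mutate(source: str, rng: random.Random) -> str:
    """Shuffle the characters inside a randomly chosen slice of `source`."""
    chars = list(source)
    start = rng.randrange(len(chars))
    end = rng.randrange(start, len(chars)) + 1
    window = chars[start:end]
    rng.shuffle(window)
    chars[start:end] = window
    return "".join(chars)


def invalid_syntax_inputs(seeds: list[str], rounds: int, seed: int = 0) -> list[str]:
    """Mutate seed files and keep only the mutants that fail to compile."""
    rng = random.Random(seed)
    kept = []
    for _ in range(rounds):
        candidate = mutate(rng.choice(seeds), rng)
        if has_syntax_error(candidate):
            kept.append(candidate)
    return kept


# Two tiny seed programs stand in for the Ruff and CPython source files.
seeds = ["def f(x):\n    return x + 1\n", "for i in range(3):\n    print(i)\n"]
corpus = invalid_syntax_inputs(seeds, rounds=50)
```

Each entry in `corpus` could then be fed to the checker under test, with any panic recorded alongside the input that caused it.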
Thanks for writing this up. Do you think it would be possible and useful to have mdtests for invalid identifiers and keywords used as identifiers?
## Summary Seeing the fuzzing results from @dhruvmanila in #13778, I think we can re-enable these tests. We also had one regression that would have been caught by these tests, so there is some value in having them enabled.
I think they should be covered by the invalid parser tests:
Although I think they don't have complete coverage, especially around "keywords used as identifiers". I'll add them.
That makes sense, thanks. We could also assert that red knot doesn't generate useless diagnostics for missing identifiers. For example, Red Knot should not emit a 5 +
## Summary This is related to #13778, more specifically #13778 (comment). This PR adds various test cases where a keyword is used where an identifier is expected. The tests make sure that red knot doesn't panic, raises the syntax error, and adds the identifier to the symbol table. The final part allows editor-related features like renaming the symbol.
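For illustration, inputs of this kind can be generated from a few templates. The sketch below is not taken from the PR: the templates and helper names are hypothetical, and CPython's `compile` is used only to confirm that each generated snippet really is a syntax error. It does not exercise red knot itself.

```python
import keyword

# Positions where Python's grammar expects an identifier.
TEMPLATES = [
    "{name} = 1",
    "def {name}(): pass",
    "import {name}",
    "class {name}: pass",
]


def keyword_as_identifier_cases(keywords: list[str]) -> list[str]:
    """Build one snippet per (keyword, template) pair."""
    return [t.format(name=kw) for kw in keywords for t in TEMPLATES]


def is_rejected(source: str) -> bool:
    """Return True if CPython reports a SyntaxError for `source`."""
    try:
        compile(source, "<case>", "exec")
    except SyntaxError:
        return True
    return False


keywords = ["class", "for", "if", "while"]
cases = keyword_as_identifier_cases(keywords)
```

A checker that is resilient to invalid syntax would be expected to process every entry in `cases` without panicking while still reporting the syntax error.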
The ruff parser is error-resilient and will generate a best-effort AST for any input, but red-knot currently assumes a valid AST in some places, and this can cause panics if it's run over an invalid AST.
We should do best-effort type-checking for invalid ASTs, and never panic.
The trick is to do this while, as much as feasible, still maintaining some useful internal invariants, and failing fast (panic is good in this case!) if those invariants are violated, rather than allowing programming errors to pass silently and result in type-checking bugs.
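One way to picture the distinction is the toy sketch below, which is not red-knot's design. `NameUse`, `InvariantViolation`, and `infer` are hypothetical, and the `index` is assumed to be built by an earlier pass that records every valid name use. Input that comes from parser recovery degrades to an unknown type, while a valid name missing from the index is treated as a programming error and fails fast.

```python
from dataclasses import dataclass


class InvariantViolation(Exception):
    """Raised when internal bookkeeping is inconsistent (a bug, not bad input)."""


@dataclass(frozen=True)
class NameUse:
    """Toy stand-in for a name expression in the AST."""

    id: str
    from_recovery: bool = False  # True if the parser synthesized this node


def infer(use: NameUse, index: dict[str, str]) -> str:
    """Return the inferred type of a name use as a string."""
    if use.from_recovery:
        # Invalid syntax is expected input: degrade gracefully, never panic.
        return "Unknown"
    if use.id not in index:
        # Assumption: an earlier pass indexes every valid name use, so a
        # miss here means that pass has a bug. Fail fast instead of guessing.
        raise InvariantViolation(f"{use.id!r} was never indexed")
    return index[use.id]


index = {"x": "int"}
recovered = infer(NameUse("", from_recovery=True), index)
valid = infer(NameUse("x"), index)
```

Keeping the two paths separate means tolerance for invalid syntax does not also hide bugs in the checker's own bookkeeping.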