Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] Improve the unresolved-import check #13007

Merged
merged 3 commits into from
Aug 21, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
100 changes: 90 additions & 10 deletions crates/red_knot_python_semantic/src/types.rs
Original file line number Diff line number Diff line change
Expand Up @@ -222,6 +222,19 @@ impl<'db> Type<'db> {
}
}

/// Resolve a member access of a type.
///
/// For example, if `foo` is `Type::Instance(<Bar>)`,
/// `foo.member(&db, "baz")` returns the type of `baz` attributes
/// as accessed from instances of the `Bar` class.
///
/// TODO: use of this method currently requires manually checking
/// whether the returned type is `Unknown`/`Unbound`
/// (or a union with `Unknown`/`Unbound`) in many places.
/// Ideally we'd use a more type-safe pattern, such as returning
/// an `Option` or a `Result` from this method, which would force
/// us to explicitly consider whether to handle an error or propagate
/// it up the call stack.
#[must_use]
pub fn member(&self, db: &'db dyn Db, name: &Name) -> Type<'db> {
match self {
Expand Down Expand Up @@ -369,12 +382,13 @@ mod tests {
use crate::db::tests::TestDb;
use crate::{Program, ProgramSettings, PythonVersion, SearchPathSettings};

#[test]
fn check_types() -> anyhow::Result<()> {
let mut db = TestDb::new();
use super::TypeCheckDiagnostics;

db.write_file("src/foo.py", "import bar\n")
.context("Failed to write foo.py")?;
fn setup_db() -> TestDb {
let db = TestDb::new();
db.memory_file_system()
.create_directory_all("/src")
.unwrap();

Program::from_settings(
&db,
Expand All @@ -390,16 +404,82 @@ mod tests {
)
.expect("Valid search path settings");

db
}

fn assert_diagnostic_messages(diagnostics: &TypeCheckDiagnostics, expected: &[&str]) {
let messages: Vec<&str> = diagnostics
.iter()
.map(|diagnostic| diagnostic.message())
.collect();
assert_eq!(&messages, expected);
}

#[test]
fn unresolved_import_statement() -> anyhow::Result<()> {
let mut db = setup_db();

db.write_file("src/foo.py", "import bar\n")
.context("Failed to write foo.py")?;

let foo = system_path_to_file(&db, "src/foo.py").context("Failed to resolve foo.py")?;

let diagnostics = super::check_types(&db, foo);
assert_diagnostic_messages(&diagnostics, &["Import 'bar' could not be resolved."]);

Ok(())
}

#[test]
fn unresolved_import_from_statement() {
let mut db = setup_db();

db.write_file("src/foo.py", "from bar import baz\n")
.unwrap();
let foo = system_path_to_file(&db, "src/foo.py").unwrap();
let diagnostics = super::check_types(&db, foo);
assert_diagnostic_messages(&diagnostics, &["Import 'bar' could not be resolved."]);
carljm marked this conversation as resolved.
Show resolved Hide resolved
}

assert_eq!(diagnostics.len(), 1);
assert_eq!(
diagnostics[0].message(),
"Import 'bar' could not be resolved."
#[test]
fn unresolved_import_from_resolved_module() {
let mut db = setup_db();

db.write_files([("/src/a.py", ""), ("/src/b.py", "from a import thing")])
.unwrap();

let b_file = system_path_to_file(&db, "/src/b.py").unwrap();
let b_file_diagnostics = super::check_types(&db, b_file);
assert_diagnostic_messages(
&b_file_diagnostics,
&["Could not resolve import of 'thing' from 'a'"],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I think describing it as a failed attribute access would actually be much clearer than the vague "Could not resolve" which doesn't really communicate anything! This message also doesn't currently clarify that 'a' is a module (though it could be taken as implied.) I actually can't think of an error message here that I think would be better than "Module a has no attribute thing."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do people really think of from collections import deque as an "attribute access" of deque from the collections module? I certainly don't. Of course that's what's happening under the hood, but even at runtime this detail is hidden -- the import fails with ImportError rather than AttributeError, and there's no mention of attempting to access an attribute:

% python -c "from collections import foo"                
Traceback (most recent call last):
  File "<string>", line 1, in <module>
ImportError: cannot import name 'foo' from 'collections' (/Users/alexw/.pyenv/versions/3.12.4/lib/python3.12/collections/__init__.py)

);
}

Ok(())
#[ignore = "\
A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \
despite the symbol existing in the symbol table for `a.py`"]
Comment on lines +460 to +461
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think when we issue the error on the import in a.py, we should set the type of foo in a to Unknown, not Unbound, which should avoid this?

#[test]
fn resolved_import_of_symbol_from_unresolved_import() {
let mut db = setup_db();

db.write_files([
("/src/a.py", "import foo as foo"),
("/src/b.py", "from a import foo"),
])
.unwrap();

let a_file = system_path_to_file(&db, "/src/a.py").unwrap();
let a_file_diagnostics = super::check_types(&db, a_file);
assert_diagnostic_messages(
&a_file_diagnostics,
&["Import 'foo' could not be resolved."],
);

// Importing the unresolved import into a second first-party file should not trigger
// an additional "unresolved import" violation
let b_file = system_path_to_file(&db, "/src/b.py").unwrap();
let b_file_diagnostics = super::check_types(&db, b_file);
assert_eq!(&*b_file_diagnostics, &[]);
}
}
113 changes: 82 additions & 31 deletions crates/red_knot_python_semantic/src/types/infer.rs
Original file line number Diff line number Diff line change
Expand Up @@ -866,7 +866,26 @@ impl<'db> TypeInferenceBuilder<'db> {
asname: _,
} = alias;

let module_ty = self.module_ty_from_name(ModuleName::new(name), alias.into());
let module_ty = ModuleName::new(name)
.ok_or(ModuleResolutionError::InvalidSyntax)
.and_then(|module_name| self.module_ty_from_name(module_name));

let module_ty = match module_ty {
Ok(ty) => ty,
Err(ModuleResolutionError::InvalidSyntax) => {
tracing::debug!("Failed to resolve import due to invalid syntax");
Type::Unknown
}
Err(ModuleResolutionError::UnresolvedModule) => {
self.add_diagnostic(
AnyNodeRef::Alias(alias),
"unresolved-import",
format_args!("Import '{name}' could not be resolved."),
);
Type::Unknown
}
};

self.types.definitions.insert(definition, module_ty);
}

Expand Down Expand Up @@ -914,32 +933,38 @@ impl<'db> TypeInferenceBuilder<'db> {
/// - `tail` is the relative module name stripped of all leading dots:
/// - `from .foo import bar` => `tail == "foo"`
/// - `from ..foo.bar import baz` => `tail == "foo.bar"`
fn relative_module_name(&self, tail: Option<&str>, level: NonZeroU32) -> Option<ModuleName> {
fn relative_module_name(
&self,
tail: Option<&str>,
level: NonZeroU32,
) -> Result<ModuleName, ModuleResolutionError> {
Copy link
Contributor

@carljm carljm Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just emit the diagnostics inside this function, return an Option, and avoid the need for ModuleResolutionError?

let Some(module) = file_to_module(self.db, self.file) else {
tracing::debug!(
"Relative module resolution '{}' failed; could not resolve file '{}' to a module",
format_import_from_module(level.get(), tail),
self.file.path(self.db)
);
return None;
return Err(ModuleResolutionError::UnresolvedModule);
};
let mut level = level.get();
if module.kind().is_package() {
level -= 1;
}
let mut module_name = module.name().to_owned();
for _ in 0..level {
module_name = module_name.parent()?;
module_name = module_name
.parent()
.ok_or(ModuleResolutionError::UnresolvedModule)?;
}
if let Some(tail) = tail {
if let Some(valid_tail) = ModuleName::new(tail) {
module_name.extend(&valid_tail);
} else {
tracing::debug!("Relative module resolution failed: invalid syntax");
return None;
return Err(ModuleResolutionError::InvalidSyntax);
}
}
Some(module_name)
Ok(module_name)
}

fn infer_import_from_definition(
Expand Down Expand Up @@ -974,12 +999,12 @@ impl<'db> TypeInferenceBuilder<'db> {
alias.name,
format_import_from_module(*level, module),
);
let module_name =
module.expect("Non-relative import should always have a non-None `module`!");
ModuleName::new(module_name)
module
.and_then(ModuleName::new)
.ok_or(ModuleResolutionError::InvalidSyntax)
};

let module_ty = self.module_ty_from_name(module_name, import_from.into());
let module_ty = module_name.and_then(|module_name| self.module_ty_from_name(module_name));

let ast::Alias {
range: _,
Expand All @@ -992,11 +1017,34 @@ impl<'db> TypeInferenceBuilder<'db> {
// the runtime error will occur immediately (rather than when the symbol is *used*,
// as would be the case for a symbol with type `Unbound`), so it's appropriate to
// think of the type of the imported symbol as `Unknown` rather than `Unbound`
let ty = module_ty
let member_ty = module_ty
.unwrap_or(Type::Unbound)
.member(self.db, &Name::new(&name.id))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still find it odd that we have to do the "unbound" check everywhere where we call .member. Do we not always want to emit a diagnostic when failing to resolve a member?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do, but it will need to be different error codes. It would be very counterintuitive for users if import foo triggers an unresolved-import error code and from foo import bar triggers an unresolved-member error code. Users will expect all import-related failures to be the same error code and all attribute-access-related failures to be a different error code.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. We might just want to parameterize the member method. But maybe that's not worth it? I don't know. Have you looked at how mypy/pyright do this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might just want to parameterize the member method.

Not sure exactly what you mean; could you clarify?

Have you looked at how mypy/pyright do this?

Mypy has a very different architecture to the one we're building. It resolves all imports eagerly in a first pass before it does any other type checking; module-not-found errors are emitted during this first pass here: https://github.com/python/mypy/blob/fe15ee69b9225f808f8ed735671b73c31ae1bed8/mypy/build.py#L2600-L2700.

I'm not as familiar with pyright's codebase but I can dig in now

Copy link
Member Author

@AlexWaygood AlexWaygood Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm... it seems both mypy and pyright in fact do what you'd prefer here -- they both use "attribute-access" error codes for from x import y imports:

Pyright's error code is reportAttributeAccessIssue; mypy's is attr-defined, which both seem equally bad. In terms of error messages, pyright definitely wins, though: pyright has:

"foo" is unknown import symbol

whereas mypy has

Module "collections.abc" has no attribute "foo" 

I'm somewhat surprised by this. But given this precedent, I'm okay with emitting the diagnostic from .member(), if you'd prefer. Though I hope we still aspire to be more user-friendly in our error messages than either mypy or pyright.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I agree that that makes a lot of sense! I can make that change

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm... if the inferred type of a member access is a union, e.g. int | Unbound, would you expect that to be represented as an Ok() or an Err() variant (if we return an Result from .member()?

Copy link
Member Author

@AlexWaygood AlexWaygood Aug 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having a closure or similar to create a diagnostic if the member is unbound.

This might be doable, but after looking at it for a little bit I think it would still be quite a large refactor and design change. I think it's out of the scope of this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced there is any problem with reporting an error on from foo import bar (where the module foo does exist, but has no top-level symbol bar) as an attribute-access error code. If I had to pick between the pyright or mypy error messages above, I would prefer the mypy one (the pyright one gives less useful context). The fact that the attribute access occurred as part of a from-import will be clear from the error location; it doesn't have to be part of the error message or code. I wouldn't want to make design decisions based on the hypothesis that this is confusing to users without clear evidence that it is (e.g. a history of users filing issues on mypy about that error code or message); personally I find it intuitive.

Copy link
Contributor

@carljm carljm Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should return an Option or Result from member(), for the reason @AlexWaygood mentions above: we can have a possibly-unbound name (currently represented as a union with Unbound), and it's not clear how that should look in a binary representation.

We could design our own representation for "Type plus definitely-bound/maybe-unbound/definitely-unbound state" and use that instead of Type::Unbound. This will complicate some APIs and I think be somewhat less efficient, but it would have the advantage that it would force handling Unbound early, which I think is desirable. In general I don't think we want Unbound escaping from the scope where it originated; it should instead result in a diagnostic and be eliminated from the type (replaced with Unknown if we have no other type information; probably just disappear from the type if we do.) I do think this alternative to Type::Unbound is worth exploring and seeing what kind of impact it will have.

I was thinking that we would push diagnostics inside member() to simplify the handling of Unbound, and I think I still favor that approach. If we do want to make distinctions in error codes/messages based on context from the caller of member(), I think we can always pass in context to make that possible.

.replace_unbound_with(self.db, Type::Unknown);

self.types.definitions.insert(definition, ty);
if matches!(module_ty, Err(ModuleResolutionError::UnresolvedModule)) {
self.add_diagnostic(
AnyNodeRef::StmtImportFrom(import_from),
"unresolved-import",
format_args!(
"Import '{}{}' could not be resolved.",
".".repeat(*level as usize),
module.unwrap_or_default()
),
);
} else if module_ty.is_ok() && member_ty.is_unknown() {
Copy link
Contributor

@carljm carljm Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should be emitting a diagnostic at all if the imported symbol is of type Unknown. Unknown might result from an error in the other module, but in that case the diagnostic should occur in the other module; importing a value of unknown type is not an error.

I think just removing this clause would fix the ignored test; what would it break?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, the only reason we have to check for Unknown here is that we intentionally replaced Unbound with Unknown just a few lines above. If we wait to do it after, the bug in the ignored test goes away, and all tests pass:

diff --git a/crates/red_knot_python_semantic/src/types.rs b/crates/red_knot_python_semantic/src/types.rs
index 173c957d1..c4ff40220 100644
--- a/crates/red_knot_python_semantic/src/types.rs
+++ b/crates/red_knot_python_semantic/src/types.rs
@@ -451,9 +451,6 @@ mod tests {
         );
     }

-    #[ignore = "\
-A spurious second 'Unresolved import' diagnostic message is emitted on `b.py`, \
-despite the symbol existing in the symbol table for `a.py`"]
     #[test]
     fn resolved_import_of_symbol_from_unresolved_import() {
         let mut db = setup_db();
diff --git a/crates/red_knot_python_semantic/src/types/infer.rs b/crates/red_knot_python_semantic/src/types/infer.rs
index 9b183727e..28dc8f2e3 100644
--- a/crates/red_knot_python_semantic/src/types/infer.rs
+++ b/crates/red_knot_python_semantic/src/types/infer.rs
@@ -1012,15 +1012,9 @@ impl<'db> TypeInferenceBuilder<'db> {
             asname: _,
         } = alias;

-        // If a symbol is unbound in the module the symbol was originally defined in,
-        // when we're trying to import the symbol from that module into "our" module,
-        // the runtime error will occur immediately (rather than when the symbol is *used*,
-        // as would be the case for a symbol with type `Unbound`), so it's appropriate to
-        // think of the type of the imported symbol as `Unknown` rather than `Unbound`
         let member_ty = module_ty
             .unwrap_or(Type::Unbound)
-            .member(self.db, &Name::new(&name.id))
-            .replace_unbound_with(self.db, Type::Unknown);
+            .member(self.db, &Name::new(&name.id));

         if matches!(module_ty, Err(ModuleResolutionError::UnresolvedModule)) {
             self.add_diagnostic(
@@ -1032,7 +1026,7 @@ impl<'db> TypeInferenceBuilder<'db> {
                     module.unwrap_or_default()
                 ),
             );
-        } else if module_ty.is_ok() && member_ty.is_unknown() {
+        } else if module_ty.is_ok() && member_ty.is_unbound() {
             self.add_diagnostic(
                 AnyNodeRef::Alias(alias),
                 "unresolved-import",
@@ -1044,6 +1038,8 @@ impl<'db> TypeInferenceBuilder<'db> {
             );
         }

+        let member_ty = member_ty.replace_unbound_with(self.db, Type::Unknown);
+
         self.types.definitions.insert(definition, member_ty);
     }

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I think you're right that this fixes the tests I added (and skipped) in this PR. Here's an interesting edge case that you might not have considered, though. What should we do for something like this?

# foo.py
for x in range(0):
    pass

# bar.py
from foo import x

Running python bar.py fails with ImportError at runtime because x is not defined in foo.py (due to the iterable being empty), but neither mypy nor pyright catches this. Arguably we should infer the type of x after the for loop has completed as being int | Unbound -- should we flag any union that includes Unbound as one of the elements with an unresolved-import diagnostic? Probably not, I don't think -- the import might succeed, it just also might not. So let's say (for argument's sake) we use the error code import-possibly-unbound for that.

So then what if we have something like this, where the symbol definitely exists in foo.py, but is also definitely unbound? (Mypy and pyright do both catch this one.) Should that have the error code unresolved-import, or import-possibly-unbound?

# foo.py
x: int

# bar.py
from foo import x

These are somewhat obscure edge cases, so I think what you propose here is a strict improvement on what I implemented in this PR; I'll make a fixup PR to address your suggestions. But my point is that we may still at some point need some context propagated regarding the cause of the error -- whether that error is represented as Unknown or Unbound.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, great points and great questions! I had the x: int case in mind last night when thinking again about representation of unbound-ness in a different comment on this PR.

I agree that maybe-unbound is a different error class than definitely-unbound (and arguable whether maybe-unbound should trigger an error on import at all.)

I think we do want a representation of "definitely unbound and typed" that we use for imports, so e.g. we can error if you import from x: int (as long as it's not a stub), but we can still type check the importing module treating x as int rather than Unknown. (Not sure how important this is, but it seems marginally better.) I think this can be handled by declared vs inferred types.

What's less clear to me is if we need a representation of "definitely unbound and typed" for inferred types. It would mean that within a scope we could also check code following x: int and emit diagnostics both for "x is not bound" and for use of x that isn't valid for an int. Maybe that's useful?

Like I mentioned above in a different comment, I'm pretty open to exploring other representations of unbound to make it orthogonal to types; the main question for me is if we can avoid it regressing in perf, and if not, do we gain enough from it to be worth the regression?

self.add_diagnostic(
AnyNodeRef::Alias(alias),
"unresolved-import",
format_args!(
"Could not resolve import of '{name}' from '{}{}'",
".".repeat(*level as usize),
module.unwrap_or_default()
),
);
}

self.types.definitions.insert(definition, member_ty);
}

fn infer_return_statement(&mut self, ret: &ast::StmtReturn) {
Expand All @@ -1011,25 +1059,12 @@ impl<'db> TypeInferenceBuilder<'db> {
}

fn module_ty_from_name(
&mut self,
module_name: Option<ModuleName>,
node: AnyNodeRef,
) -> Type<'db> {
let Some(module_name) = module_name else {
return Type::Unknown;
};

if let Some(module) = resolve_module(self.db, module_name.clone()) {
Type::Module(module.file())
} else {
self.add_diagnostic(
node,
"unresolved-import",
format_args!("Import '{module_name}' could not be resolved."),
);

Type::Unknown
}
&self,
module_name: ModuleName,
) -> Result<Type<'db>, ModuleResolutionError> {
Copy link
Contributor

@carljm carljm Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the advantage of returning a Result vs just emitting the diagnostic directly in this method, as the previous code did.

resolve_module(self.db, module_name)
.map(|module| Type::Module(module.file()))
.ok_or(ModuleResolutionError::UnresolvedModule)
}

fn infer_decorator(&mut self, decorator: &ast::Decorator) -> Type<'db> {
Expand Down Expand Up @@ -1795,6 +1830,12 @@ fn format_import_from_module(level: u32, module: Option<&str>) -> String {
)
}

#[derive(Debug, Copy, Clone, PartialEq, Eq)]
enum ModuleResolutionError {
InvalidSyntax,
UnresolvedModule,
}

#[cfg(test)]
mod tests {
use anyhow::Context;
Expand Down Expand Up @@ -2048,6 +2089,16 @@ mod tests {
Ok(())
}

#[test]
fn from_import_with_no_module_name() -> anyhow::Result<()> {
// This test checks that invalid syntax in a `StmtImportFrom` node
// leads to the type being inferred as `Unknown`
let mut db = setup_db();
db.write_file("src/foo.py", "from import bar")?;
assert_public_ty(&db, "src/foo.py", "bar", "Unknown");
Ok(())
}

#[test]
fn resolve_base_class_by_name() -> anyhow::Result<()> {
let mut db = setup_db();
Expand Down
12 changes: 12 additions & 0 deletions crates/ruff_benchmark/benches/red_knot.rs
Original file line number Diff line number Diff line change
Expand Up @@ -18,7 +18,19 @@ struct Case {
}

const TOMLLIB_312_URL: &str = "https://raw.githubusercontent.com/python/cpython/8e8a4baf652f6e1cee7acde9d78c4b6154539748/Lib/tomllib";

// This first "unresolved import" is because we don't understand `*` imports yet.
// The following "unresolved import" violations are because we can't distinguish currently from
// "Symbol exists in the module but its type is unknown" and
// "Symbol does not exist in the module"
static EXPECTED_DIAGNOSTICS: &[&str] = &[
"/src/tomllib/_parser.py:7:29: Could not resolve import of 'Iterable' from 'collections.abc'",
"/src/tomllib/_parser.py:10:20: Could not resolve import of 'Any' from 'typing'",
"/src/tomllib/_parser.py:13:5: Could not resolve import of 'RE_DATETIME' from '._re'",
"/src/tomllib/_parser.py:14:5: Could not resolve import of 'RE_LOCALTIME' from '._re'",
"/src/tomllib/_parser.py:15:5: Could not resolve import of 'RE_NUMBER' from '._re'",
"/src/tomllib/_parser.py:20:21: Could not resolve import of 'Key' from '._types'",
"/src/tomllib/_parser.py:20:26: Could not resolve import of 'ParseFloat' from '._types'",
"Line 69 is too long (89 characters)",
"Use double quotes for strings",
"Use double quotes for strings",
Expand Down
Loading