Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[red-knot] feat: Introduce Truthy and Falsy to allow more precise typing #13665

Closed
171 changes: 165 additions & 6 deletions crates/red_knot_python_semantic/src/types.rs
Original file line number Diff line number Diff line change
Expand Up @@ -245,6 +245,13 @@ pub enum Type<'db> {
Unbound,
/// The None object -- TODO remove this in favor of Instance(types.NoneType)
None,
/// Type for narrowing a type to the subset of that type that is truthy (bool(x) == True).
/// Mainly used in intersections: `X & Truthy`. Example: for `list` it's all lists of `len > 0`
Comment on lines +248 to +249
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like for these doc comments to mostly stick to a factual description of what the type represents. Details of how it is used and examples of its use (especially in this case, as we figure out this feature over time) are likely to change, and too much detail here is just likely to go out of date.

Suggested change
/// Type for narrowing a type to the subset of that type that is truthy (bool(x) == True).
/// Mainly used in intersections: `X & Truthy`. Example: for `list` it's all lists of `len > 0`
/// The type representing all objects whose `__bool__` returns `True`.

Truthy,
/// Type for narrowing a type to the subset of that type that is falsy (bool(x) == False).
/// Mainly used in intersections: `X & Falsy`. Example: for `int` it's the singleton
/// `IntLiteral[0]`.
Comment on lines +251 to +253
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Type for narrowing a type to the subset of that type that is falsy (bool(x) == False).
/// Mainly used in intersections: `X & Falsy`. Example: for `int` it's the singleton
/// `IntLiteral[0]`.
/// The type representing all objects whose `__bool__` returns `False`.

Falsy,
/// Temporary type for symbols that can't be inferred yet because of missing implementations.
/// Behaves equivalently to `Any`.
///
Expand Down Expand Up @@ -393,6 +400,72 @@ impl<'db> Type<'db> {
}
}

/// Return the `Truthy` subset of this type (intersection with Truthy).
#[must_use]
pub fn truthy(&self, db: &'db dyn Db) -> Type<'db> {
IntersectionBuilder::build_truthy(db, *self)
}

/// Return the `Falsy` subset of this type (intersection with Falsy).
#[must_use]
pub fn falsy(&self, db: &'db dyn Db) -> Type<'db> {
self.falsy_set(db)
.unwrap_or_else(|| IntersectionBuilder::build_falsy(db, *self))
Comment on lines +412 to +413
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we do this short-circuit through self.falsy_set here but not in truthy above?

As commented below, I think that the truthy_set and falsy_set methods should be internal to IntersectionBuilder, not methods on Type, and I don't think we should break the abstraction here: we should just build the right intersection, and IntersectionBuilder should be responsible to simplify that intersection as appropriate.

To be honest, I probably wouldn't include these helper methods at all, unless it's really clear that these will be very common operations and it's important that we make them as ergonomic as possible. I'd rather just go with a general IntersectionType::from_elements helper and use that where we need to build any intersection, including intersections with Truthy and Falsy.

}

/// Returns, if it can be expressed, the set of values that are falsy in this type. This is
/// defined for some builtin types (e.g. int, str, ...)
#[must_use]
pub fn falsy_set(&self, db: &'db dyn Db) -> Option<Type<'db>> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is really implementing "simplification of intersection with Falsy", which is logic I would expect to find in IntersectionBuilder rather than in a method on Type.

match self {
Self::Instance(class) => match class.known(db) {
// For the following types, we know the logic for falsiness and can represent
// specific instances of that type, so we can give back a precise subset.
Some(KnownClass::Bool) => Some(Self::BooleanLiteral(false)),
Some(KnownClass::Int) => Some(Self::IntLiteral(0)),
Some(KnownClass::Bytes) => Some(Self::BytesLiteral(BytesLiteralType::empty(db))),
Some(KnownClass::Str) => Some(Self::StringLiteral(StringLiteralType::empty(db))),
Some(KnownClass::Tuple) => Some(Self::Tuple(TupleType::empty(db))),
Comment on lines +425 to +428
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, because all four of these types (int, bytes, str, tuple) are subclassable, and the subclass is free to override __bool__ to have whatever behavior the subclass wants, we can't safely make this simplification. E.g. str & Falsy could include an instance of a custom subclass of str with custom behavior that makes it falsy. So I think we need to remove these four cases.

_ => match self.bool(db) {
// When a `__bool__` signature is `AlwaysFalse`, it means that the falsy set is
// the whole type (that we had as input), and the truthy set is empty.
Truthiness::AlwaysFalse => Some(*self),
Truthiness::AlwaysTrue => Some(Type::Never),
Truthiness::Ambiguous => None,
},
},
Type::LiteralString => Some(Self::StringLiteral(StringLiteralType::empty(db))),
_ => match self.bool(db) {
Truthiness::AlwaysFalse => Some(*self),
Truthiness::AlwaysTrue => Some(Type::Never),
Truthiness::Ambiguous => None,
},
Comment on lines +429 to +442
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this duplication can be avoided by having a two-layer structure, where an inner method (or an initial match statement within one method) handles only the specific known cases, and if it returns None, an outer method (or second check) tries the "convert to bool" option.

}
}

/// Returns, if it can be expressed, the set of values that are truthy in this type. This is
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The "can be expressed" formulation, here and above, has an implicit "without an explicit intersection type" caveat: all intersections with truthy/falsy "can be expressed" as an intersection. This is another reason why I think these methods should rather be internal implementation details of IntersectionBuilder.

/// rarely defined, with the exception of the `builtins.bool` type.
#[must_use]
pub fn truthy_set(&self, db: &'db dyn Db) -> Option<Type<'db>> {
match self {
Type::Instance(class) => match class.known(db) {
Some(KnownClass::Bool) => Some(Type::BooleanLiteral(true)),
_ => match self.bool(db) {
// When a `__bool__` signature is `AlwaysTrue`, it means that the truthy set is
// the whole type (that we had as input), and the falsy set is empty.
Truthiness::AlwaysTrue => Some(*self),
Truthiness::AlwaysFalse => Some(Type::Never),
Truthiness::Ambiguous => None,
},
},
_ => match self.bool(db) {
Truthiness::AlwaysFalse => Some(*self),
Truthiness::AlwaysTrue => Some(Type::Never),
Truthiness::Ambiguous => None,
},
}
}

/// Return true if the type is a class or a union of classes.
pub fn is_class(&self, db: &'db dyn Db) -> bool {
match self {
Expand Down Expand Up @@ -486,6 +559,8 @@ impl<'db> Type<'db> {
}
Type::Unknown => Type::Unknown,
Type::Unbound => Type::Unbound,
Type::Truthy => Type::Unknown,
Type::Falsy => Type::Unknown,
Comment on lines +562 to +563
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These should be Type::Todo for now. The correct implementation here will be to return a bound method returning Literal[True] or Literal[False] if someone accesses __bool__, and otherwise fall back to member access on object. We shouldn't just fall back to Unknown silently; these types are not gradual forms.

Type::None => {
// TODO: attribute lookup on None type
Type::Todo
Expand Down Expand Up @@ -542,14 +617,17 @@ impl<'db> Type<'db> {
Type::Any | Type::Todo | Type::Never | Type::Unknown | Type::Unbound => {
Truthiness::Ambiguous
}
Type::None => Truthiness::AlwaysFalse,
Type::Function(_) => Truthiness::AlwaysTrue,
Type::Module(_) => Truthiness::AlwaysTrue,
Type::None | Type::Falsy => Truthiness::AlwaysFalse,
Type::Function(_) | Type::Module(_) | Type::Truthy => Truthiness::AlwaysTrue,
Type::Class(_) => {
// TODO: lookup `__bool__` and `__len__` methods on the class's metaclass
// More info in https://docs.python.org/3/library/stdtypes.html#truth-value-testing
Truthiness::Ambiguous
}
// Temporary special case for `FunctionType` until we handle generic instances
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the word "generic" like this is ambiguous, sounds like it could mean generic types.

Suggested change
// Temporary special case for `FunctionType` until we handle generic instances
// Temporary special case for `FunctionType` until we handle general `Type::Instance`

Type::Instance(class) if class.is_known(db, KnownClass::FunctionType) => {
Truthiness::AlwaysTrue
}
Comment on lines +627 to +630
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was this necessary to unblock something else in this PR? If so, that's fine. If not, why should we add a temporary special case for this? Maybe we just leave it out and wait for the general Type::Instance handling?

Type::Instance(_) => {
// TODO: lookup `__bool__` and `__len__` methods on the instance's class
// More info in https://docs.python.org/3/library/stdtypes.html#truth-value-testing
Expand All @@ -570,9 +648,23 @@ impl<'db> Type<'db> {
}
first_element_truthiness
}
Type::Intersection(_) => {
// TODO
Truthiness::Ambiguous
Type::Intersection(intersection) => {
// The truthiness of the intersection is the intersection of the truthiness of its
// elements:
// - `Ambiguous` & `Truthy` == `Truthy`
// - `Ambiguous` & `Falsy` == `Falsy`
// - `Truthy` & `Falsy` == `Never` -- this should be impossible to build
intersection
// Negative elements (what this intersection should not be) do not have an
// influence on the truthiness of the intersection.
// Or if they should, this should be simplified at build time.
.positive(db)
.iter()
.map(|ty| ty.bool(db))
// Stop at the first `Truthy`or `Falsy` since an intersection containing both
// should simplify to the empty intersection at build time.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// should simplify to the empty intersection at build time.
// should simplify to Never at build time.

.find(|truthiness| !truthiness.is_ambiguous())
Comment on lines +664 to +666
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment is more narrow than what the code actually does; Truthy and Falsy aren't the only types that have a non-ambiguous truthiness.

What this means is that this logic is only correct if we do eagerly check truthiness of every type added to an intersection and resolve to Never if we find both an always-truthy type and an always-falsy type in the intersection. I think it's correct to do this, I'm a bit worried about the performance impact. But I think we should probably try doing it and see if it actually becomes a performance bottleneck.

.unwrap_or(Truthiness::Ambiguous)
}
Type::IntLiteral(num) => Truthiness::from(*num != 0),
Type::BooleanLiteral(bool) => Truthiness::from(*bool),
Expand Down Expand Up @@ -718,10 +810,12 @@ impl<'db> Type<'db> {
Type::Todo => Type::Todo,
Type::Unknown => Type::Unknown,
Type::Unbound => Type::Unknown,
Type::Truthy | Type::Falsy => Type::Unknown,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should also result in a diagnostic; we could group it under the below TODO comment for that, for now

Type::Never => Type::Never,
Type::Class(class) => Type::Instance(*class),
Type::Union(union) => union.map(db, |element| element.to_instance(db)),
// TODO: we can probably do better here: --Alex
// TODO: case of `type[X] & Truthy` or `type[X] & Falsy` should be straightforward
Comment on lines 817 to +818
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the added comment here is necessary, nor was the pre-existing comment, now that we have Type::Todo:

Suggested change
// TODO: we can probably do better here: --Alex
// TODO: case of `type[X] & Truthy` or `type[X] & Falsy` should be straightforward

Type::Intersection(_) => Type::Todo,
// TODO: calling `.to_instance()` on any of these should result in a diagnostic,
// since they already indicate that the object is an instance of some kind:
Expand All @@ -745,6 +839,7 @@ impl<'db> Type<'db> {
match self {
Type::Unbound => Type::Unbound,
Type::Never => Type::Never,
Type::Truthy | Type::Falsy => Type::Unknown,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be type[object] - for now we can just make it a Todo with a comment that we don't have generic type yet (like some other todo comments in this method)

Type::Instance(class) => Type::Class(*class),
Type::Union(union) => union.map(db, |ty| ty.to_meta_type(db)),
Type::BooleanLiteral(_) => KnownClass::Bool.to_class(db),
Expand Down Expand Up @@ -1452,24 +1547,88 @@ pub struct IntersectionType<'db> {
negative: FxOrderSet<Type<'db>>,
}

impl<'db> IntersectionType<'db> {
/// If this is an intersection of `X & Truthy` (generalized to `(X & Y) & Truthy`), returns the
/// left side of the intersection (the type that intersects with `Truthy`).
fn truthy_of(self, db: &'db dyn Db) -> Option<Type<'db>> {
if self.positive(db).contains(&Type::Truthy) {
let builder = self
.positive(db)
.iter()
.filter(|ty| *ty != &Type::Truthy)
.fold(IntersectionBuilder::new(db), |builder, ty| {
builder.add_positive(*ty)
});
Some(
self.negative(db)
.iter()
.fold(builder, |builder, ty| builder.add_negative(*ty))
.build(),
)
} else {
None
}
}

/// If this is an intersection of `X & Falsy` (generalized to `(X & Y) & Falsy`), returns the
/// left side of the intersection (the type that intersects with `Falsy`).
fn falsy_of(self, db: &'db dyn Db) -> Option<Type<'db>> {
if self.positive(db).contains(&Type::Falsy) {
let builder = self
.positive(db)
.iter()
.filter(|ty| *ty != &Type::Falsy)
.fold(IntersectionBuilder::new(db), |builder, ty| {
builder.add_positive(*ty)
});
Some(
self.negative(db)
.iter()
.fold(builder, |builder, ty| builder.add_negative(*ty))
.build(),
)
} else {
None
}
}
}

Comment on lines +1550 to +1595
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we generalize our approach in the intersection builder, we shouldn't need these methods; I think the approach using these methods is more special-cased than what we ideally want.

#[salsa::interned]
pub struct StringLiteralType<'db> {
#[return_ref]
value: Box<str>,
}

impl<'db> StringLiteralType<'db> {
pub fn empty(db: &'db dyn Db) -> Self {
Self::new(db, Box::default())
}
}

#[salsa::interned]
pub struct BytesLiteralType<'db> {
#[return_ref]
value: Box<[u8]>,
}

impl<'db> BytesLiteralType<'db> {
pub fn empty(db: &'db dyn Db) -> Self {
Self::new(db, Box::default())
}
}

#[salsa::interned]
pub struct TupleType<'db> {
#[return_ref]
elements: Box<[Type<'db>]>,
}

impl<'db> TupleType<'db> {
pub fn empty(db: &'db dyn Db) -> Self {
Self::new(db, Box::default())
}
}

#[cfg(test)]
mod tests {
use super::{
Expand Down
Loading
Loading