-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[red-knot] feat: Introduce Truthy
and Falsy
to allow more precise typing
#13665
Conversation
|
Truthy
and Falsy
to allow more precise typingTruthy
and Falsy
to allow more precise typing
Thanks for the PR, this is very cool! I've started a review, but there's a lot to consider here, and I have some other pressing things to take care of, so it may be a couple days before I can complete the review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I'll echo Carl that this is pretty cool!!
Carl's probably better placed to do a full review here, in particular for the logic you added in builder.rs
-- but here's some things I noticed, in the meantime:
Co-authored-by: Alex Waygood <Alex.Waygood@Gmail.com>
- Fix a logic issue when dealing with the falsy/truthy set of instances (set of size 1). - Introduce `TupleType::empty` and others as a convenience method.
requested changes from initial review were addressed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for tackling this! It's quite an ambitious project, and you definitely took it further than I'd anticipated for a first PR on the subject. Reviewing it was super useful in forcing me to think through a bunch of cases carefully, so regardless of what happens with it now, that was a super useful contribution.
My conclusion (recorded in the inline comments) is that I think we need to significantly scale back our aggressiveness in applying conclusions from truthiness and falsiness, which I think will probably simplify some of the handling. And in general I'd like to see the handling in UnionBuilder and IntersectionBuilder be a little more general and do less avoidable repeated work.
Thanks again, and sorry for the delayed review!
/// Returns, if it can be expressed, the set of values that are falsy in this type. This is | ||
/// defined for some builtin types (e.g. int, str, ...) | ||
#[must_use] | ||
pub fn falsy_set(&self, db: &'db dyn Db) -> Option<Type<'db>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is really implementing "simplification of intersection with Falsy", which is logic I would expect to find in IntersectionBuilder
rather than in a method on Type
.
} | ||
} | ||
|
||
/// Returns, if it can be expressed, the set of values that are truthy in this type. This is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The "can be expressed" formulation, here and above, has an implicit "without an explicit intersection type" caveat: all intersections with truthy/falsy "can be expressed" as an intersection. This is another reason why I think these methods should rather be internal implementation details of IntersectionBuilder
.
// `X` -> `AlwaysFalse` => `X & Falsy` = `X` | ||
let empty_tuple = Type::Tuple(TupleType::empty(&db)); | ||
assert_eq!( | ||
IntersectionBuilder::build_falsy(&db, empty_tuple), | ||
empty_tuple | ||
); | ||
|
||
assert_eq!( | ||
IntersectionBuilder::build_falsy(&db, empty_bytes), | ||
empty_bytes | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bug that pre-exists your PR, but unfortunately even Type::Tuple
should always have Truthiness::Ambiguous
, again because of subclassing: even the "empty tuple" type can include an "empty" instance of a hypothetical tuple subclass that might have arbitrary __bool__
behavior; the __bool__
behavior isn't specified as part of the tuple type. So we should fix this in the Type::bool
method so Type::Tuple
is always Truthiness::Ambiguous
, and rewrite this test to use a different type than tuple.
(This doesn't apply to e.g. BytesLiteral or StringLiteral or IntLiteral -- those types can only represent literals, so we know they are the built-in type, not a subclass.)
assert_eq!( | ||
IntersectionBuilder::build_falsy( | ||
&db, | ||
Type::Tuple(TupleType::new(&db, vec![Type::IntLiteral(0)].into())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As mentioned below, we can't actually know that a non-empty Type::Tuple
is truthy, so we should fix that in Type::bool
and use a different truthy type here.
if self.positive.contains(&Type::Truthy) | ||
&& self | ||
.positive | ||
.iter() | ||
.all(|ty| ty.bool(db) == Truthiness::AlwaysTrue) | ||
{ | ||
self.positive.remove(&Type::Truthy); | ||
} | ||
|
||
// If we have `Falsy` and all elements are always false, we can remove it | ||
if self.positive.contains(&Type::Falsy) | ||
&& self | ||
.positive | ||
.iter() | ||
.all(|ty| ty.bool(db) == Truthiness::AlwaysFalse) | ||
{ | ||
self.positive.remove(&Type::Falsy); | ||
} | ||
|
||
// If we have both `AlwaysTrue` and `AlwaysFalse`, this intersection should be empty. | ||
if self | ||
.positive | ||
.iter() | ||
.any(|ty| ty.bool(db) == Truthiness::AlwaysFalse) | ||
&& self | ||
.positive | ||
.iter() | ||
.any(|ty| ty.bool(db) == Truthiness::AlwaysTrue) | ||
{ | ||
self.positive.clear(); | ||
self.negative.clear(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ends up calling .bool(db)
on every type in self.positive
four times; it shouldn't be necessary to do it more than once.
// If an intersection is `X & Falsy`, try to replace it by the falsy set of `X` | ||
// TODO: this doesn't handle the case `X & Y & Falsy` where `(X & Y)` would have a known | ||
// falsy set (this doesn't happen yet, can it happen?) | ||
if self.positive.len() == 2 && self.positive.contains(&Type::Falsy) { | ||
self.positive.remove(&Type::Falsy); | ||
let ty = self.positive.iter().next().unwrap(); | ||
if let Some(falsy) = ty.falsy_set(db) { | ||
self.positive.clear(); | ||
self.positive.insert(falsy); | ||
} else { | ||
self.positive.insert(Type::Falsy); | ||
} | ||
} | ||
|
||
// If an intersection is `X & Truthy`, try to replace it by the truthy set of `X` | ||
// TODO: this doesn't handle the case `X & Y & Truthy` where `(X & Y)` would have a known | ||
// truthy set (this doesn't happen yet, can it happen?) | ||
if self.positive.len() == 2 && self.positive.contains(&Type::Truthy) { | ||
self.positive.remove(&Type::Truthy); | ||
let ty = self.positive.iter().next().unwrap(); | ||
if let Some(truthy) = ty.truthy_set(db) { | ||
self.positive.clear(); | ||
self.positive.insert(truthy); | ||
} else { | ||
self.positive.insert(Type::Truthy); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should handle this more like union does, doing pairwise checks between each element (and probably better to do it as we add elements, rather than in a final simplify
pass). And preferably the check would be more generic, like simplify_intersection_of(db, a: Type, b: Type) -> Option<Type>
(returning None
if no simplification is possible, or a Type if the intersection of a
and b
can simplify to a type) rather than so special-cased to Truthy
and Falsy
. There are a lot more cases where we can simplify intersections than just Truthy and Falsy (e.g. intersections between a lot of not-identical literal types just simplify to Never). Not saying we have to implement all of those in this PR, just that we should try to set up general infrastructure based on general set-theoretic principles, rather than a long list of special cases, which is going to be both harder to maintain and a lot less efficient.
// Handle `X & Truthy | X & Falsy` -> `X` | ||
(Type::Intersection(present), Type::Intersection(inserted)) => { | ||
// Detect `X & Truthy | Y & Falsy` | ||
if let (Some(present_ty), Some(inserted_ty)) = | ||
(present.truthy_of(self.db), inserted.falsy_of(self.db)) | ||
{ | ||
// If `X` = `Y`, we can simplify `X & Truthy | X & Falsy` to `X` | ||
if present_ty == inserted_ty { | ||
to_add = present_ty; | ||
to_remove.push(index); | ||
break; | ||
} | ||
} | ||
|
||
// Detect `X & Falsy | Y & Truthy` | ||
if let (Some(present_ty), Some(inserted_ty)) = | ||
(present.falsy_of(self.db), inserted.truthy_of(self.db)) | ||
{ | ||
// If `X` = `Y`, we can simplify `X & Falsy | X & Truthy` to `X` | ||
if present_ty == inserted_ty { | ||
to_add = present_ty; | ||
to_remove.push(index); | ||
break; | ||
} | ||
} | ||
} | ||
|
||
// Corner-case of the previous `X & Truthy | X & Falsy` -> `X` | ||
// Some `X & Truthy` or `X & Falsy` types have been simplified to a | ||
// specific subset of instances of the type. | ||
(Type::Intersection(inter), ty) | (ty, Type::Intersection(inter)) => { | ||
if let Some(inter_ty) = inter.truthy_of(self.db) { | ||
// 'X & Truthy | y' -> test if `y` = `X & Falsy` | ||
if let Some(falsy_set) = inter_ty.falsy_set(self.db) { | ||
if falsy_set == ty { | ||
to_add = inter_ty; | ||
to_remove.push(index); | ||
break; | ||
} | ||
} | ||
} | ||
|
||
if let Some(inter_ty) = inter.falsy_of(self.db) { | ||
// 'X & Falsy | y' -> test if `y` = `X & Truthy` | ||
if let Some(truthy_set) = inter_ty.truthy_set(self.db) { | ||
if truthy_set == ty { | ||
to_add = inter_ty; | ||
to_remove.push(index); | ||
break; | ||
} | ||
} | ||
} | ||
} | ||
_ => {} | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not going to comment in detail on this since I've suggested semantic changes in other comments that I think will end up changing it a lot anyway. But let's keep in mind the general principles that we need to think pretty hard about doing things as efficiently as we can, and we should prefer abstracting to general relations between types whenever we can. For instance, we already have the subtype checks below and the bool checks above -- I think all of it, and a lot of the new stuff you're adding, maybe could abstract to a general simplify_union_of(db, a: Type, b: Type) -> Option<Type>
which returns None
if there is no simplification and a
and b
should both stay in the union, or a new Type
to be added to the union instead of a
and b
. (Not 100% sure if that will cover all cases without actually working it all out in code myself, but hopefully that gives the flavor of the kind of abstractions we should be looking for to keep this code maintainable.)
As I've written up over at #13694 (comment), I no longer think that Truthy and Falsy types are the right way for us to approach this issue, so I'm going to close this PR. I'm sorry that means we won't land the work you put into this PR, but it doesn't mean this PR wasn't helpful; in fact reviewing it was critical in clarifying my thinking on this topic. Thank you so much for the PR! |
Summary
This is a PR trying to address #13632.
Reminder of the problem: when evaluating boolean operations, we could improve precision of types
The idea is that in some contexts, we can narrow a type to the subset of instances that are
truthy
(boo(instance) == True
) or the subset of instances that arefalsy
(boo(instance) == False
).As @carljm suggested, a very generic way to express this would be to implement two new types,
Truthy
andFalsy
, and express "the subset ofA
instances that evaluate toTrue
" asA & Truthy
(and conversely forFalse
andFalsy
).Interface
On a high level, this should be fairly straightforward to use when required
Truthy
andFalsy
A & Truthy
(and falsy):IntersectionBuilder::build_truthy(a_ty)
Interesting Cases
Here's a list of interesting cases that might be unlocked by this more precise type knowledge. Some of them are more or less complicated to handle and maybe shouldn't be supported.
Behaviour of Falsy and Truthy outside of intersections
While they only make sense in intersections and probably should only exist in that context, we do need to implement the behaviour of
Falsy
andTruthy
outside of intersections, as they're aType
variant.Suggestions are welcome if I didn't handle that part properly, I don't have specific opinions or feedback on this part of the implementation.
Consistency of intersections
As it's been raised in this comment by @AlexWaygood, the existence of
Truthy
andFalsy
coupled with having bothpositive
(must be) and negative (must not be) elements in intersection leads to a question:truthy
andfalsy
X & Falsy
=X & ~Truthy
Currently, as this PR explores a first attempt with
Truthy
andFalsy
, we need to make sure that only one of the above two representations is used.Defined falsy and truthy subsets
Some types have a known subset of instances that are
falsy
:bool → False
int → 0
str → ""
bytes → b''
tuple → (,)
list → []
,dict → {}
,set → set()
- we can't really represent that at the momentConversely for truthy (most types in Python are truthy, so this is rarely defined)
bool -> True
This could be used in a few ways (that I can think of)
int & Falsy
toLiteral[0]
(we most probably want that)int & Truthy | Literal[0]
toint
X & Truthy | X & Falsy
→X
(which we should support)int & ~0
(where~0
isLiteral[0]
in the negatives) toint & Truthy
x1 | x2
wherex1
isX & Truthy
andx2
isX & Falsy
BooleanLiteral(true)
,BooleanLiteral(false)
andbool
truthy
andfalsy
valuesSo far, all subsets I can think of have been singletons (no
UnionType
). This means that my code might maybe either be too simple (can't handle non-singleton) or too complicated (tries to handle non singleton) at times.I guess that if we want to include this knowledge in the feature, knowing if non-singleton cases should be supported (or if they even exist) would help.
Simplifying
X & Falsy
/X & Truthy
I think that's a fairly straightforward feature. Some types (e.g.
FunctionType
) have a constantTruthiness
(forFunctionType.bool() → Truthiness::AlwaysTrue
), so associating them withTruthy
orFalsy
can be simplified.X & Truthy
→X
ifx.bool() == Truthiness::AlwaysTrue
X & Falsy
→Never
ifx.bool() == Truthiness::AlwaysTrue
AlwaysFalse
)Overall, we should probably guarantee at build time that you can't have both
AlwaysTrue
andAlwaysFalse
elements in an intersection, because elements of this intersection would need to have both properties, and that's not possible.Test Plan
builder.rs
for intersection/union casesinfer.rs
through boolean chained operations (currently the only way to createX & Truthy
orX & Falsy
intersections)// TODO
Literal[0] | None | Literal[1] & None
becameLiteral[0] | None
with this PR which was the expected behaviourWe could eventually add some additional integration tests that would be "easier to read" by adding support for the following snippet
Concerns & Feedback on the current implementation
I think this feature is very exciting. It would allow us to represent types with more precision, which is always nice.
I do have some concerns with my implementation as it is now
InnerIntersectionBuilder::simplify
methodAs always feedback is welcome, don't hesitate to question the entire direction I took to achieve this, I'll be happy to refine this with whatever we can learn from this first iteration.