-
Notifications
You must be signed in to change notification settings - Fork 165
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix override errors #819
Fix override errors #819
Conversation
differences: dict = super().diff(other_profile, options) | ||
other_profile = cast(CategoricalColumn, other_profile) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you attempted to run this without adding the cast()
s --> just changing the types to the base type / super type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, some of the diff()
methods work fine without the cast()
while some require them. I casted in all of them because it seems intuitive; at that point, the code has checked that other_profile
is an instance of the current class (by calling the superclass's diff()
, which explicitly does the type check), so we can treat it as such. Would you prefer removing the cast()
s that aren't strictly needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm thinking about LSP and the cast()
part feels like a violation of LSP. If we assume LSP, my intuition is that mypy
should pass without casting because the super type / base class type should be able to pass all the necessary things for mypy
to say "yeah, all is well" without the cast()
. My inference of this code is that the cast()
is not solving the LSP issue fundamentally.
So, I would say avoid cast()
where possible -- especially in the case of this issue which appears to be LSP fundamentally. CC @JGSweets
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely agree that this is a workaround and violates LSP in principle, but I think the reason is that the code throws an error for certain inputs rather than because of cast()
. I guess I'm thinking more about the Python runtime than mypy specifically. If you still want me to remove the excess cast()
s just let me know and I'll add a commit to this branch.
An alternative solution: To strictly adhere to LSP, we could remove (or rename) the diff()
method from the superclasses NumericStatsMixin
and BaseColumnProfiler
, then implement diff()
for each subclass without overriding anything. This way, the subclasses can have the correct argument types without violating LSP. Would that possibly be a better solution? I can make another PR for that if you want to compare it to this one. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I definitely am of the opinion we should remove cast()
where at all possible.
I'm super game if you want to pop up a PR (outside this one) around the alternative solution
you proposed above. To keep things a bit more clean on the PRs, I'd recommend:
- remove the casts where possible in this PR
- comment on the issue related to this PR on the remaining
cast()
that violateLSP
- then in a follow-up PR propose the alternative solution to fix the fundamental LSP issue
trying to avoid PRs getting too unwieldy. Thx, @junholee6a!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, needs an update branch
when you have a moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll proceed with your recommendations!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good -- thanks, @junholee6a!
Head branch was pushed to by a user without write access
This ensures that a subclass of both NumericStatsMixin and BaseColumnProfiler doesn't have multiple method signatures with the same method names.
Fixed override errors by making the subclass method signatures match the superclass's. Every method modified calls a superclass method which checks the type of the method parameters, so there is no need to make the method parameter types stricter here.
1b4c9f4
to
8b14d72
Compare
if not ( | ||
isinstance(other1, NumericStatsMixin) | ||
and isinstance(other2, NumericStatsMixin) | ||
): | ||
raise ValueError("Parameters must be of type NumericStatsMixin.") | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is this required by mpy
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this if-statement causes the mypy errors in the attached image, because the if-statement implicitly casts other1
and other2
to type NumericStatsMixin
.
Like the other subclasses we discussed above, some of the methods in NumericStatsMixin
override superclass methods despite not all input types being valid, which in principle violates LSP. I was thinking of fixing this in the same way as the other subclasses in the next PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it -- just make sure to bullet point out the fixes in the next PR in the issue too #747
""" | ||
Generate differences between the orders of two DataLabeler columns. | ||
|
||
:return: Dict containing the differences between orders in their | ||
appropriate output formats | ||
:rtype: dict | ||
""" | ||
# Make sure other_profile's type matches this class |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as a temp solution, would it work to assert type instead of a cast?
assert type(other_profile) is DataLabelerColumn
that way it clearly describes an issue when typing doesn't clarify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to prolong, just don't want to regress on clarity to the user.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the mypy error will arise when the cast()
is removed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so although the assert
raises an issue at run time, during mypy
checks, the error will arise and pre-commit will fail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry to prolong, just don't want to regress on clarity to the user.
not at all -- definitely not wanting to regress, either
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realizing now that the diff already does this assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That assert
does an implicit cast to DataLablerColumn
so it would work for mypy. But these diff()
methods call super().diff()
right before casting, which already does an equivalent of assert
to check other_profile
's type and raise an appropriate TypeError
. assert
is definitely more readable though so I'll replace the cast()
s with assert
statements if you'd prefer.
The type checking in super().diff()
:
if not isinstance(other_profile, cls): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah the assert type(<class>) is <base_class_name>
seeminly does pass the mypy
checks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Realizing now that the diff already does this assert
My bad, didn't see this while I was typing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
although the super().diff()
does that effectively like ya'll are saying too
Issue: #747
NumericStatsMixin
a subclass ofBaseColumnProfiler
NumericStatsMixin
is supposed to be a subclass ofBaseColumnProfiler
NumericStatsMixin
andBaseColumnProfiler
as superclasses. The subclasses,NumericStatsMixin
, andBaseColumnProfiler
all implement adiff
method, with different method signatures. This causes the subclass to override two parent methods at once, which isn't idealBaseColumnProfiler
andBaseCompiler
setup.cfg
which causes mypy to ignoreoverride
errors