Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix override errors #819

Merged
merged 3 commits into from
May 18, 2023
Merged

Conversation

junholee6a
Copy link
Contributor

@junholee6a junholee6a commented May 14, 2023

Issue: #747

  • Make NumericStatsMixin a subclass of BaseColumnProfiler
    • Comments indicate that NumericStatsMixin is supposed to be a subclass of BaseColumnProfiler
    • Several classes have both NumericStatsMixin and BaseColumnProfiler as superclasses. The subclasses, NumericStatsMixin, and BaseColumnProfiler all implement a diff method, with different method signatures. This causes the subclass to override two parent methods at once, which isn't ideal
  • Make subclass methods compliant with Liskov Substitution Principle (LSP), which says you need to be able to substitute in a subclass wherever a superclass goes. This means that the parameter types for overridden methods cannot be any narrower than what the superclass has. We resolve this by making the method signatures for subclass methods the same as those of the superclass. In case someone passes in a value of the wrong type, the methods already check the types and throw errors accordingly.
  • Relevant superclasses are BaseColumnProfiler and BaseCompiler
  • Remove the line in setup.cfg which causes mypy to ignore override errors

differences: dict = super().diff(other_profile, options)
other_profile = cast(CategoricalColumn, other_profile)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you attempted to run this without adding the cast()s --> just changing the types to the base type / super type?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@junholee6a junholee6a May 15, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, some of the diff() methods work fine without the cast() while some require them. I casted in all of them because it seems intuitive; at that point, the code has checked that other_profile is an instance of the current class (by calling the superclass's diff(), which explicitly does the type check), so we can treat it as such. Would you prefer removing the cast()s that aren't strictly needed?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking about LSP and the cast() part feels like a violation of LSP. If we assume LSP, my intuition is that mypy should pass without casting because the super type / base class type should be able to pass all the necessary things for mypy to say "yeah, all is well" without the cast(). My inference of this code is that the cast() is not solving the LSP issue fundamentally.

So, I would say avoid cast() where possible -- especially in the case of this issue which appears to be LSP fundamentally. CC @JGSweets

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely agree that this is a workaround and violates LSP in principle, but I think the reason is that the code throws an error for certain inputs rather than because of cast(). I guess I'm thinking more about the Python runtime than mypy specifically. If you still want me to remove the excess cast()s just let me know and I'll add a commit to this branch.

An alternative solution: To strictly adhere to LSP, we could remove (or rename) the diff() method from the superclasses NumericStatsMixin and BaseColumnProfiler, then implement diff() for each subclass without overriding anything. This way, the subclasses can have the correct argument types without violating LSP. Would that possibly be a better solution? I can make another PR for that if you want to compare it to this one. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely am of the opinion we should remove cast() where at all possible.

I'm super game if you want to pop up a PR (outside this one) around the alternative solution you proposed above. To keep things a bit more clean on the PRs, I'd recommend:

  • remove the casts where possible in this PR
  • comment on the issue related to this PR on the remaining cast() that violate LSP
  • then in a follow-up PR propose the alternative solution to fix the fundamental LSP issue

trying to avoid PRs getting too unwieldy. Thx, @junholee6a!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, needs an update branch when you have a moment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I'll proceed with your recommendations!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good -- thanks, @junholee6a!

@taylorfturner taylorfturner added the static_typing mypy static typing issues label May 16, 2023
auto-merge was automatically disabled May 16, 2023 18:56

Head branch was pushed to by a user without write access

This ensures that a subclass of both NumericStatsMixin and
BaseColumnProfiler doesn't have multiple method signatures with the
same method names.
Fixed override errors by making the subclass method signatures match
the superclass's. Every method modified calls a superclass method which
checks the type of the method parameters, so there is no need to make
the method parameter types stricter here.
Comment on lines +214 to +219
if not (
isinstance(other1, NumericStatsMixin)
and isinstance(other2, NumericStatsMixin)
):
raise ValueError("Parameters must be of type NumericStatsMixin.")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this required by mpy?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this if-statement causes the mypy errors in the attached image, because the if-statement implicitly casts other1 and other2 to type NumericStatsMixin.

Like the other subclasses we discussed above, some of the methods in NumericStatsMixin override superclass methods despite not all input types being valid, which in principle violates LSP. I was thinking of fixing this in the same way as the other subclasses in the next PR.

Screenshot 2023-05-17 at 2 17 10 PM

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

got it -- just make sure to bullet point out the fixes in the next PR in the issue too #747

"""
Generate differences between the orders of two DataLabeler columns.

:return: Dict containing the differences between orders in their
appropriate output formats
:rtype: dict
"""
# Make sure other_profile's type matches this class
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as a temp solution, would it work to assert type instead of a cast?

assert type(other_profile) is DataLabelerColumn

that way it clearly describes an issue when typing doesn't clarify.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to prolong, just don't want to regress on clarity to the user.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the mypy error will arise when the cast() is removed

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so although the assert raises an issue at run time, during mypy checks, the error will arise and pre-commit will fail

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry to prolong, just don't want to regress on clarity to the user.

not at all -- definitely not wanting to regress, either

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realizing now that the diff already does this assert

Copy link
Contributor Author

@junholee6a junholee6a May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That assert does an implicit cast to DataLablerColumn so it would work for mypy. But these diff() methods call super().diff() right before casting, which already does an equivalent of assert to check other_profile's type and raise an appropriate TypeError. assert is definitely more readable though so I'll replace the cast()s with assert statements if you'd prefer.

The type checking in super().diff():

if not isinstance(other_profile, cls):

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah the assert type(<class>) is <base_class_name> seeminly does pass the mypy checks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Realizing now that the diff already does this assert

My bad, didn't see this while I was typing

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

although the super().diff() does that effectively like ya'll are saying too

@taylorfturner taylorfturner merged commit a94b4e4 into capitalone:main May 18, 2023
@junholee6a junholee6a deleted the fix-override-errors branch May 18, 2023 21:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
static_typing mypy static typing issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants