Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add type equality is_equal_to member #2368

Merged
merged 12 commits into from
Apr 7, 2023
Merged

Conversation

agoose77
Copy link
Collaborator

@agoose77 agoose77 commented Apr 6, 2023

In general, we have more ways of comparing types than the argument-less __eq__ protocol permits. This PR adds support for comparison of just the nominal parameters, or all parameters.

If we're happy with this idea, I'd want to gradually rework the Content.is_equal_to and add a form equivalent too, probably.

In future, we can extend this signature to e.g. allow same-kind numeric comparisons.

@agoose77 agoose77 temporarily deployed to docs-preview April 6, 2023 15:16 — with GitHub Actions Inactive
@codecov
Copy link

codecov bot commented Apr 7, 2023

Codecov Report

Merging #2368 (1106ca2) into main (ec98c63) will increase coverage by 0.04%.
The diff coverage is 94.33%.

Additional details and impacted files
Impacted Files Coverage Δ
src/awkward/types/type.py 95.60% <83.33%> (-0.38%) ⬇️
src/awkward/types/recordtype.py 86.20% <86.66%> (+1.85%) ⬆️
src/awkward/types/arraytype.py 93.54% <100.00%> (ø)
src/awkward/types/listtype.py 91.66% <100.00%> (-0.23%) ⬇️
src/awkward/types/numpytype.py 92.94% <100.00%> (+1.08%) ⬆️
src/awkward/types/optiontype.py 80.76% <100.00%> (+1.52%) ⬆️
src/awkward/types/regulartype.py 93.18% <100.00%> (+2.07%) ⬆️
src/awkward/types/scalartype.py 80.95% <100.00%> (+3.67%) ⬆️
src/awkward/types/uniontype.py 86.79% <100.00%> (+1.88%) ⬆️
src/awkward/types/unknowntype.py 93.10% <100.00%> (+3.10%) ⬆️

... and 1 file with indirect coverage changes

@agoose77 agoose77 temporarily deployed to docs-preview April 7, 2023 15:38 — with GitHub Actions Inactive
Comment on lines +104 to 110
and any(
all(
this._is_equal_to(that, all_parameters)
for this, that in zip(self._contents, contents)
)
for contents in permutations(other._contents)
)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like the correct thing to do, but it's really not ideal to be invoking the permutations in order to do so. AFAICR, we have not suggested that the order of a union's contents matters, so we must check equality like this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although records and unions have a canonical contents order that they preserve, it would be correct to say that two records or two unions are equal if they have the same set of contents, regardless of order (zip(fields, contents) in the case of records, which puts an ordering back in for tuples).

I was trying some examples with our new canonicalization rules to see if we'd have to distinguish between types like

A = union[option[X], Y]

and

B = union[X, option[Y]]

I found that an attempt to create

C = option[union[X, Y]]

forces you to go through simplified and makes

D = union[option[X], option[Y]]

but it's still possible to create an asymmetric case (A or B). How about if we squash that avenue as well? That is, an attempt to construct

UnionArray(tags, index, [option_type_array, non_option_type_array])

would (with a future rule) fail and force you to use UnionArray.simplified (already required wherever the codebase makes UnionArray, and then UnionArray.simplified would wrap all the non-option type arrays in an UnmaskedArray if any one of the provided contents is option-type.

How about it? Then you'd have fewer cases to consider when checking equality of types: you wouldn't have to worry about whether A and B should be considered equivalent because they can't exist (they always be D).

This comment can become a new issue, if you like that idea.

Copy link
Collaborator Author

@agoose77 agoose77 Apr 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, this is an interesting thought. I had not thought about the fact that ?union[X, Y], union[?X, Y] and union[X, ?Y] are degenerate. We don't permit ?union, so perhaps we do want this rule, too. Especially as UnmaskedArray is very low overhead.

For posterity, @jpivarski's point is partially distinct from the need to perform permutations; rather, even with permutations there are currently-valid representations that would not evaluate as equivalent despite the functional equivalence.

I'm in favour, and I'll create an issue from this comment.

@agoose77 agoose77 requested a review from jpivarski April 7, 2023 15:51
@agoose77 agoose77 marked this pull request as ready for review April 7, 2023 15:51
@agoose77 agoose77 temporarily deployed to docs-preview April 7, 2023 16:03 — with GitHub Actions Inactive
Copy link
Member

@jpivarski jpivarski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So types wouldn't even have an __eq__? Users are forced to use is_equal_to?

What about providing is_equal_to for more control, as you've implemented it here, and have __eq__ be an alias to it with sensible defaults? (is_equal_to already has all of its options as opt-in.)

Also, I have a comment below about ambiguity of union[option[X], Y] and union[X, option[Y]] which could be followed up in a new issue/PR.

@agoose77
Copy link
Collaborator Author

agoose77 commented Apr 7, 2023

So types wouldn't even have an eq? Users are forced to use is_equal_to?

No, I agree that we need __eq__; it's an expected Python feature, and should we not implement it, Python will just check object identity (which is definitely not what we want!).

This PR sets __eq__ to be exactly is_equal_to, therefore it invokes the default-argument variant of is_equal_to.

@jpivarski
Copy link
Member

I see:

    __eq__ = is_equal_to

I missed that, searching for "def __eq__".

This is good to go!

@agoose77
Copy link
Collaborator Author

agoose77 commented Apr 7, 2023

@jpivarski when I made #1773, I was leaning towards records having a definable order (as they do for tuples). However, it sounds like we're not interested in pushing that idea, so #1773 really just guarantees order under serialisation only.

Iff. this is the case, do you agree that I need to mirror the permutation logic (in union type equality comparison) for the record comparison case? Nicely, the initial permutation is always the trivial one (c1, c2, ..., cn), which should often be the right one.

@jpivarski
Copy link
Member

Yes, the permutation will always be needed. (That's what I meant in my first paragraph, before going on a tangent about union[?X, Y] and union[X, ?Y].)

The order is preserved through serialization and lots of operations—it's important for us to not lose it—but it is not essential to the meaning of the record or union. Not losing the order is so that a user-supplied Point[x: float64, y: float64] doesn't become Point[y: float64, x: float64], which would be confusing and could be a big problem if there are a lot of fields. But if a user asks whether those two are the same thing, I think we should say "yes."

The way I think about it formally, a record type (instance) is a mapping from names to field types (field instances), and a domain of "names" is not usually ordered. We implement an ordering to not wantonly change it as a kindness to users, but formally, it isn't part of the definition.

@agoose77 agoose77 temporarily deployed to docs-preview April 7, 2023 17:11 — with GitHub Actions Inactive
@agoose77 agoose77 enabled auto-merge (squash) April 7, 2023 19:11
@agoose77 agoose77 temporarily deployed to docs-preview April 7, 2023 19:21 — with GitHub Actions Inactive
@pre-commit-ci pre-commit-ci bot temporarily deployed to docs-preview April 7, 2023 19:46 Inactive
@agoose77 agoose77 merged commit e91ba74 into main Apr 7, 2023
@agoose77 agoose77 deleted the agoose77/feat-types-equal branch April 7, 2023 19:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants