-
-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
It doesn't make sense that cmp and hash are set independently, does it? #136
Comments
There’s exactly one reason: unhashable attributes. Sometimes you want an attribute contain a dictionary and still be comparable. |
Oh and sometimes you want to implement |
Sure, the "contain a dictionary and still be comparable" case is Currently there are 2^3 = 8 possible settings for cmp/hash/frozen when calling
And when calling My main concern is that plain I think in a perfect world the API I'd prefer is: @attr.s(cmp=True, frozen=True)
class Foo:
__hash__ = None ) Of course backwards-compatibility is a problem... |
Ugh, this is a complicated issue. I think #3 is relevant here. Some of the ideas from #3 were having I agree we're maybe too trigger happy with defining hashes for classes. Making your class hashable is a complex thing. Basically the only reason to make your class hashable is to use it as a key in a dict, or to put it in a set, which introduces additional constraints, which makes this even more complex.
This is probably a bug. (Interestingly, it will work properly and raise an exception if I think an easy case could be made for the following change: if the class is |
There’s a lot of issues raised here and honestly I can’t follow all of them. Just as a preamble/excuse: the reason you can set all these things is that attrs started out with a different scope. More of a class construction kit so the users was supposed to know what they did and invariants weren’t caught. The reality changed though (which makes me happy of course) and this dark little corner is just something nobody really cared about (which is also why I’m OK with being a bit more aggressive about fixing it). What I understand:
What I don’t understandWhat’s wrong with This is in line with what the Python docs say, and if people want to have it hashed, they have to put extra effort into it. But it’s still comparable. What Should Be Done
I think 1+2 is something we should tackle before the new release. In any case thank you for being patient with me. It took me two coffees, to write this down and honestly I don’t think anyone really thought this completely though. I’d also welcome input from @glyph as usual :) |
AFAICT the cases that make sense are:
My proposal above was that we map these 4 cases to the 4 argument patterns, respectively:
So I don't see any use for a The one additional wrinkle is that in principle someone might want something really odd, like, "this is an immutable class with an attrs-defined @attr.s(frozen=True, cmp=True)
class Foo:
__hash__ = None ) Does this cover all of your 5 cases? I'm not quite clear on what they are, since the 5th one you added is already on my list of 4 :-). And yeah of these four cases probably the best default is |
My fifth is My understanding is the following:
class Base:
def __hash__(self):
return something_something_very_smart_that_we_cannot_anticipate
@attr.s(hash=False)
class C(Base):
... Main cases:
Valid but kind of pointless:
After we’ve added Am I missing something? One final thought: in attrs, True/False for hash etc always meant “attrs won’t attach one” not “class won’t have one”. Which seems fine except for |
I use
Why does it have to stay? The only case you've come up with where it's meaningful is |
The subclassing use-case is still existent. I want to discourage it’s use but I’m not comfortable to take away the knob altogether. The only thing you’re arguing against is Sorry if I’m too dumb to follow. :| Anyhow: can we agree, that adding There’s two additional feature requests that should be handled in separate issues:
Or is there something I’m missing? |
subclassing is
Yes, this would make it so that attrs never generates classes that break Python's rules without being explicitly asked to, because it's always legal to have Concretely, for the project I mentioned that has 26 calls to Right now, I have:
With the
With my proposal, it becomes:
I.e. on this codebase then the |
Your proposal is pretty much my point 1 from above. To my maintainer eyes things look like this:
Now, I really want to tackle those things separately because they’re separate to me. One bug and one feature. I’ll open a PR later for 1 unless you convince me it’s wrong but let’s talk about 2 to give your mind some peace. :) I’m thinking to talk about “archetypes” (code name?)? How does this look to you: from attr.archetypes import hashable
@hashable
class Item:
"""Implies frozen=True, cmp=True, hash=True.""" The intent would be apparent from the name, not by squinting at the combination of parameters and figuring out side-effects that would vary by attrs release. I’m open to other ways to achieve that. Adding more arguments to |
Hynek, my 3 eurocents. Please don't introduce a new concept with archetypes, keep it simple. Nathaniel is right that your great anti-boilerplate library should stay an anti-boilerplate library. cmp=I also agree with Nathaniel's idea to effectively deprecate use of
The new default would be hash=As for
I've been bitten by hashes defined for mutable classes many times (Thrift classes for Python have this problem and badly defined attrs-based classes, too). |
Another thing that might be worth pointing out explicitly: in Python 3, it's actually a feature of the language that defining an class Foo:
def __eq__(self, other):
pass
assert Foo.__hash__ is None You might expect @attr.s(cmp=True, hash=False)
class Foo:
pass
assert Foo.__hash__ is object.__hash__ |
This issue is sprawling and huge and I'm not sure I have fully worked through all the implications, but at this point I have seen the PR and I think I'm mostly comfortable with the consequences... |
Im at the point of begrudgingly admitting that ?ukasz might be right. Let True/False be and None does the right thing. |
Our previous default behavior was incorrect. Adding __hash__ has to be deliberate action and only makes sense for immutable classes. By default, hash now mirrors eq which the correct behavior. Fixes #136.
Our previous default behavior was incorrect. Adding __hash__ has to be deliberate action and only makes sense for immutable classes. By default, `hash` now mirrors `cmp` which the correct behavior per spec. Fixes #136.
Fixed in #142 |
This issue is a bit too long for me to follow too, but was there consideration for deprecating the current behavior and emitting a warning rather than just changing it? There were valid uses here, despite the ones that were broken. |
@Julian: I won't debate the release procedures because I didn't follow the details, but I'm curious what you mean by "valid use cases". Obviously it's unfortunate that builds got broken no matter whose "fault" it is, but do you mean you have examples of cases where hashing mutable classes is the Right Thing To Do? |
@njsmith no -- they're immutable classes (for whatever that means in Python), so their hashing behavior was correct and working semantically, but now that the default has changed, all of them need to specify Example working code:
and:
And yeah I care less about fault certainly :) than to just report "hey this change broke lots of our builds" in case it was unintentional or unconsidered and might be helpful. I know most of y'all, and that you've got pretty good intentions generally :) I'm imagining that the thinking was something like that immutable is defined by using |
I can only say that I thought long and hard about how to do this without breakage and found nothing. Doing a pro-forma deprecation (w/o warnings) might have saved a few but realistically speaking most breakage would happen anyway except that we'd have a buggy behavior for another year. I'm honestly sorry but nobody came up with anything better than ripping off the bandaid. |
IMHO the thing to do "next time" (I mean, hopefully, for attrs, we never need to do this again) is to figure out a way to do the testing / fixing in downstream libs in advance, not to try to do something like this with a deprecation. The problem with a deprecation is that you don't want to have a long period where you have unnecessarily-wordy-and-explicit options be necessary folk wisdom. |
@hynek totally understood (and appreciated as usual!). Gonna follow up one more time just to make sure I understand here -- The way I would have expected this to go, given that there are both valid and invalid reasons for the default (which I want to double check is agreed upon by everyone?) would have been:
(I've probably missed a case or two, and probably I still am missing some important stuff given that I still haven't fully gone through this ticket, so apologies if I have) @glyph I didn't fully follow your comment -- I know #191 affects twisted, but I and whoever else uses attrs have got lots of internal code which obviously wouldn't have been tested and fixed before a backwards incompatible release so those users need some form of notification right? Anyways, yeah, again, deeeeefinitely not slinging any shade here, just trying to make sure I follow the thought process here. Thanks as usual all, in case it hasn't been said enough, all of you are great :). |
Refs: python-attrs/attrs#136 * origin/attrs-17: hash=True everywhere
They already should have been, as they're internally used as dictionary keys. Changed in attrs v17.1.0 (see python-attrs/attrs#136)
Right now,
cmp=
andhash=
are treated as independent options, both inattr.s()
andattr.ib()
. This does't make much sense to me.If you do
attr.s(cmp=True, hash=False)
then you get a broken class that violates Python's invariant that objects which compare equal must hash equal (if they are hashable at all).If you do
attr.s(cmp=False, hash=True)
then you get a class with a really weirdhash
that has lots of collisions (Foo(x=1)
andFoo(x=1)
are different objects so they compare non-equal, but their hashes are the same.) Also, it is broken if the object is mutable, because the hash can change over time. ...and now that I think about it, objects that havecmp=True
also have the same problem with mutability. Actually I guess all of myattrs
classes are currently illegal, because the defaults are to generate illegal classes. I guess I should go fix that.Anyway, I think the only plausible configurations are:
cmp=True
,frozen=True
,hash=True
cmp=True
,frozen=True
, hash undefined (attrs doesn't have a nice way to do this right now I think?)cmp=False
,hash=False
(useobject.__hash__
)cmp=False
, hash undefinedAnd then for
attr.ib
, why arecmp
andhash
separate arguments? Wouldn't it make more sense to drop thehash
argument, and just use the logic that ifattr.ib(cmp=True)
and we're generating a hash method at all, then we should include this attrib?The text was updated successfully, but these errors were encountered: