Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support custom __getstate__, __setstate__ for slotted classes #513

Conversation

andhus
Copy link
Contributor

@andhus andhus commented Mar 4, 2019

As per #512 you currently can not implement your custom __getstate__ __setstate__ methods on slotted classes. With this change, attrs does not automatically create and override these methods if they are already implemented - thus making the behaviour consistent with the slotted classes documentation (as far as I understand!?):

You can support protocol 0 and 1 by implementing __getstate__ and __setstate__ methods yourself. Those methods are created for frozen slotted classes because they won’t pickle otherwise.

Edit: I read the docs more carefully and now understand that "You can support protocol 0 and 1 by implementing __getstate__ and __setstate__ methods yourself" refers to __slots__ classes in general but not to attr.s(slots=True) classes. I still don't see any downside from making it possible to define these methods in attr.s(slots=True) classes.

Pull Request Check List

  • Added tests for changed code.
  • New features have been added to our Hypothesis testing strategy.
  • Changes or additions to public APIs are reflected in our type stubs (files ending in .pyi).
    • ...and used in the stub test file tests/typing_example.py.
  • Updated documentation for changed code.
    • New functions/classes have to be added to docs/api.rst by hand.
    • Changes to the signature of @attr.s() have to be added by hand too.
    • Changed/added classes/methods/functions have appropriate versionadded, versionchanged, or deprecated directives.
  • Documentation in .rst files is written using semantic newlines.
  • Changes (and possible deprecations) have news fragments in changelog.d.

@hynek
Copy link
Member

hynek commented Mar 4, 2019

I’m pretty sure this breaks with subclassing?

I’ve had the idea of “auto detecting” implementations and implicitly switching them off for a while and would love it for all features but so far nobody came up a with a solid implementation.

I guess what i’d Like is a get_class_dunders_without_super_class(cls). 🤔

There might be even an open PR where someone tried and didn’t quite follow thru.

@andhus
Copy link
Contributor Author

andhus commented Mar 4, 2019

What exactly do you mean "breaks with subclassing"? @hynek

I'd claim that everything works as expected. Note that this fix is not about an attrs feature at all, rather about giving the user control over the class when needed ..."The class belongs to the users" kind of thing ;)

There is no magic solution for custom __getstate__, __setstate__ and subclassing for regular python classes or for non-slotted attrs classes either. When you implement these methods you need to know what you're doing and deal with consequences in subclasses. Please see examples below.

It is a great feature that attrs auto generates these methods for slotted-classes so that they are picklable by default, but you should let the user do it themselves if they want/need to go down that path.

Regular classes:

import pickle
import attr


class MyClass(object):

    def __init__(self, a, b):
        self.a = a
        self.b = b

    def __getstate__(self):
        return self.a, self.b + "_replacement"

    def __setstate__(self, state):
        self.a, self.b = state
        self.b += "_reconstructed"


class BadSubClass(MyClass):
    # forgets to override __get/setstate__
    def __init__(self, a, b, c):
        super(BadSubClass, self).__init__(a, b)
        self.c = c


bsc = BadSubClass("a", "b", "c")
bsc_rec = pickle.loads(pickle.dumps(bsc))
try:
    print(bsc_rec.c)
except AttributeError as e:
    print(e)  # -> 'BadSubClass' object has no attribute 'c'


class FixedSubClass(BadSubClass):
    # proper overriding of __get/setstate__
    def __getstate__(self):
        return super(FixedSubClass, self).__getstate__() + tuple(self.c)

    def __setstate__(self, state):
        super(FixedSubClass, self).__setstate__(state[:2])
        self.c = state[-1]

fsc = FixedSubClass("a", "b", "c")
fsc_rec = pickle.loads(pickle.dumps(fsc))
print(fsc_rec.c)  # ->  c
print(fsc_rec.b)  # ->  b_replacement_reconstructed

Same example for (non-slotted) attrs classes:


@attr.s(slots=False)
class MyAttrsClass(object):
    a = attr.ib()
    b = attr.ib()

    def __getstate__(self):
        return self.a, self.b + "_replacement"

    def __setstate__(self, state):
        self.a, self.b = state
        self.b += "_reconstructed"


@attr.s(slots=False)
class BadAttrsSubClass(MyAttrsClass):
    # forgets to override __get/setstate__
    c = attr.ib()


basc = BadAttrsSubClass("a", "b", "c")
basc_rec = pickle.loads(pickle.dumps(basc))
try:
    print(basc_rec.c)
except AttributeError as e:
    print(e)  # -> 'BadAttrsSubClass' object has no attribute 'c'


@attr.s(slots=False)
class FixedAttrsSubClass(BadAttrsSubClass):
    # proper overriding of __get/setstate__
    def __getstate__(self):
        return super(FixedAttrsSubClass, self).__getstate__() + tuple(self.c)

    def __setstate__(self, state):
        super(FixedAttrsSubClass, self).__setstate__(state[:2])
        self.c = state[-1]


fasc = FixedAttrsSubClass("a", "b", "c")
fasc_rec = pickle.loads(pickle.dumps(fasc))
print(fasc_rec.c)  # ->  c
print(fasc_rec.b)  # ->  b_replacement_reconstructed

Why should not the same be possible for slotted-classes?

@andhus
Copy link
Contributor Author

andhus commented Mar 4, 2019

A nice attrs feature that could simplify the specific use case would be:

def make_picklable(b):
    return b + "_replacement"

def reconstruct(b_replacement):
    return b_replacement + "_reconstructed"

@attr.s(slots=False)
class NewAPIClass(object):
    a = attr.ib()
    b = attr.ib(before_pickle=make_picklable, after_pickle=reconstruct)

@attr.s(slots=False)
class NewAPISubClass(NewAPIClass):
    c = attr.ib()

new_api_sc = NewAPISubClass("a", "b", "c")
new_api_sc_rec = pickle.loads(pickle.dumps(new_api_sc))
print(new_api_sc_rec.c)  # ->  c
print(new_api_sc_rec.b)  # ->  b_replacement_reconstructed

But this is orthogonal to the fix in this PR, which would still be relevant I think?

(I have a PoC implementation of the example, and this is maybe along the lines of your work-in-progress API additions?)

Edit: this feature should of course work the same way for slotted classes.

@hynek
Copy link
Member

hynek commented Mar 6, 2019

To be clear: the thing you want is swell, and I want it for all methods!

But the use-case I mean is this:

@attr.s(slots=True, auto_attribs=True)
class Base:
    a: int

    def __getstate__(self):
        pass # does whatever

    def __setstate__(self, state):   
        pass # does whatever

@attr.s(slots=True, auto_attribs=True)
class C(Base):
    b: int

I'm fairly certain that your code is going to "detect" Base's methods while creating C?

@andhus
Copy link
Contributor Author

andhus commented Mar 7, 2019

Exactly; when creating C, the change in this PR will detect that it already has the methods implemented (by inheritance) and will not override these. The point I'm trying to make with my (rather lengthy) examples is that it shouldn't override them for C either.

The only reason (I can think of) for implementing the methods in Base is that something must be done about Base:s "state" (i.e. attributes) before pickling. This need won't have disappeared in C, right? That's what I mean with:

There is no magic solution for custom __getstate__, __setstate__ and subclassing for regular python classes or for non-slotted attrs classes either.

It's your responsibility to "continue the custom implementation path" once you've started it.

What would you say is the wanted behaviour for pickleing of C in your example, and why should it differ from how non-slotted attrs classes (or regular non-attrs classes) behaves?

@@ -529,3 +529,144 @@ class C(object):
w = weakref.ref(c)

assert c is w()


def _loads_dumps(instance):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO test all protocols

@andhus
Copy link
Contributor Author

andhus commented Mar 7, 2019

Actually, I was wrong about the behaviour of this PR:s fix in the previous comment; in fact, it didn't "detect" Base's methods while creating C?" (which I claim is wanted behaviour). The latest commit fixes this though.

I added several test cases (there was no previous coverage of these auto-generated methods), the following (subset) would fail on master (all in tests/test_slots.py):
test_custom_getstate_setstate_effective
test_subclass_custom_getstate_setstate_referencing_super
test_subclass_custom_getstate_setstate_no_override

If you get the time, please let me know which of these that shouldn't pass and why, and any new test case that fails with the fix in this PR. I'm sorry if I misunderstand your concern/objection!

Not sure I understand the "want it for all methods" part, but I think this can be viewed as an isolated current shortcoming - that can be fixed 😄

@hynek
Copy link
Member

hynek commented Mar 21, 2019

So I disagree on the subclass detection thing, because you don't want this stuff to fall apart, just because some base class of yours adds a getstate.

All that said, I think the best approach here to fix it for now is to follow our existing precedent: weakref_slot.

Wouldn't something like slots_pickleable=True help you without introducing any guessing?

@andhus
Copy link
Contributor Author

andhus commented Mar 22, 2019

Maybe we just disagree - but I still think that you’re missing the point:

“you don't want this stuff to fall apart, just because some base class of yours adds a getstate”

But again, slots=False classes and non-attrs classes “fall apart” precisely this way if the base class adds a getstate. Why don’t we override get/setstate for all attrs classes if this is the motivation!?

My tl;dr is this:

  • The only original motivation to add a get/setstate method for slots=True classes is that these methods must be there for a slotted class to pickle (with the older protocols).
  • Thus, If the class already has these methods implemented the motivation to add (-> override) these methods is gone.
  • If a base class implements get/setstate, there is (normally) a reason for it that you need to deal with in sub classes (the “no magic solution” point in my initial examples).
  • It is confusing and unnecessary that slots=True classes behave differently from slots=False (and regular non-attrs classes) with respect to this.

@hynek
Copy link
Member

hynek commented Mar 22, 2019

I'm starting to suspect that we don't disagree and I'm just confused.

The docs say:

Those methods are created for frozen slotted classes because they won’t pickle otherwise.

Isn't that entirely wrong and those methods are added always for slotted classes currently?

@andhus
Copy link
Contributor Author

andhus commented Mar 23, 2019

Well, maybe not wrong (strictly speaking) but certainly incomplete/confusing since yes; these methods are always added(/overridden) for slots=True classes (independent of frozen).

So let's clarify some things: it is the docs that are misleading here, frozen has nothing to do with this. It is true that __slots__ classes (also when created without attrs) must implement __get/setstate__ to pickle with protocol < 2. Otherwise you get the pretty informative error:

TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

So the current implementation is right in that it adds the methods only based on slots=True. It goes wrong when it overrides these methods when they are already implemented.

@andhus
Copy link
Contributor Author

andhus commented Mar 23, 2019

Also, to recap: my initial issue had nothing to do with subclassing, the problem was simply that my custom __get/setstate__ methods were overridden by attrs - without any subclassing involved. That discussion came up as a concern about the suggested solution in this PR. It was a relevant concern, as the initial fix did not do what I wanted/@hynek was worrying ;) But I hope that I have now clearly motivated that:

If a class, slotted or not, implements __getstate__/__setstate__, by inheritance or directly, there is no motivation for attrs to override these.

Agree?

That being said, this is the first time I'm poking around in the internals of attrs so I'm not trying to argue the quality of this PR - just want to reach consensus and understanding of what the wanted behaviour is.

@andhus
Copy link
Contributor Author

andhus commented Mar 23, 2019

This clarifies current behaviour: #521

@hynek
Copy link
Member

hynek commented Aug 20, 2019

I still disagree on the inheritance thing, because I just don't think that attrs should behave differently based on something the user can't see because it came from from some random base class they didn't even know about. So far, attrs never did anything like that (but read on ;))

I'm gonna give you things I plan to do and you tell me, if that solves your problems, okay?

  1. Short term (i.e. next release, I hope less than a month): slots_pickleable=True which will leave __setstate__ and __getstate__ alone.
  2. Medium term (I hope next 2 months, but maybe next release): @attr.s(auto_detect_own=True) that will detect pre-existing but non-inherited methods and leave them be.
  3. Long term (I hope this year): @attr.s(slots=YourPickleMethodsFactory) that will allow you to completely customize how those methods are created.

How does that sound to you?

@andhus
Copy link
Contributor Author

andhus commented Aug 20, 2019

I regret coming across as stubborn, but I don't see how we're getting closer to defining what the wanted behaviour is and why it is desired that it differs between non-attrs-classes and non-slotted-attrs-classes, on one side, vs. slotted-attrs-classes on the other. I just feel like repeating my previous arguments and don't see them being responded to in principle:

  • The only original motivation to add a get/setstate method for slots=True classes is that these methods must be there for a slotted class to pickle (with the older protocols).
  • Thus, If the class already has these methods implemented the motivation to add (-> override) these methods is gone.
  • If a base class implements get/setstate, there is (normally) a reason for it that you need to deal with in sub classes (the “no magic solution” point in my initial examples).
  • It is confusing and unnecessary that slots=True classes behave differently from slots=False (and regular non-attrs classes) with respect to this.

I really can't relate to your view of inheritance in:

I just don't think that attrs should behave differently based on something the user can't see because it came from from some random base class they didn't even know about.

I'm no fan of inheritance per se and think that it can often lead to unnecessary complexity, but who the h*!l goes off inheriting random base classes they don't even know about!?

To your question/suggestion: This is not a blocker in any way. In the very rare cases that I'd need a custom __get/setstate__ I can assign them to the class after declaration - now the docs are fixed so it is less confusing 👍. Re 1: sure this is less obscure than assigning methods after declaring the class. Re 2: seems to be in direct conflict with what I argue is the wanted behaviour. Re 3: would be another solution.

Again, sorry for being stubborn. I promise not to argue further without more elaborate counter-arguments ;)

@andhus andhus changed the title Add support custom __getstate__, __setstate__ for slotted classes Support custom __getstate__, __setstate__ for slotted classes Aug 20, 2019
@hynek
Copy link
Member

hynek commented Aug 21, 2019

Don't worry about stubbornness, I very much appreciate you taking the time dancing with me in circles. :)

Let's start simple:

I'm no fan of inheritance per se and think that it can often lead to unnecessary complexity, but who the h*!l goes off inheriting random base classes they don't even know about!?

The answer is frameworks and third party libraries.

Number 2 shouldn't be in direct conflict with what you want, it just doesn't go far enough. Eventually, what you're supposed to be able to write is just:

@attrs.define
class C:
    x: int

    def __setstate__(self, state):
       ...

    def __getstate__(self):
       ...

It is confusing and unnecessary that slots=True classes behave differently from slots=False (and regular non-attrs classes) with respect to this.

I mean I agree; the only reason we're doing that is because it's impossible to add more stuff to a slotted class. I wonder if we shouldn't consider to write __[gs]etstate__ for all types of classes. 🤔


The ultimate question and the core conflict with our approaches is: why do you think __[gs]etstate__ special enough to make it behave differently than literally everything else in attrs?

I find especially pickle methods unfit for it, because I'd assume that some class you're subclasing might write some pickle method for itself and the moment you try to expand it by subclassing, those methods stop being adequate. Or am I missing anything here?

@andhus
Copy link
Contributor Author

andhus commented Aug 21, 2019

Thanks for a more elaborate response ;)

The answer is frameworks and third party libraries.

I see your concern (I intentionally interpreted your formulation rather literal). But if you are on a mission to protect child classes from their base ditto, then that should go for non-slotted as well. So this argument is only consistent if you consider doing it for all attrs classes - which I see you might do...

the only reason we're doing that is because it's impossible to add more stuff to a slotted class

Not sure I follow here, isn't the only(/original) reason __[gs]etstate__ is autogenerated for slotted classes that they are not pickleable otherwise with the old protocols? Or does "doing that" refer to something else?

Number 2 shouldn't be in direct conflict with what you want

No, I'm prepared to loosen my position here a bit ;) I think it is great with a solution that avoids overriding these methods when they are declared as part of the class definition in question, and fine (!) if it still does override if they are present by inheritance. Principally, I withstand that there is no need to differentiate between these cases, but in practice, it makes no difference:

  1. This case should be extremely rare (inheritance with slotted class + need to pickle...) inheritance was not at all part of my initial issue. This part of the discussion took off when you said the solution would "break" with subclassing where I claimed (and still do) that it does not break but behave as expected
  2. More importantly, this very part of my argumentation: "it is expected that you have to deal with the __[gs]etstate__ method in a child class which parent has a custom implementation" means in almost all of these (very rare) cases you will over-implement these in the declaration of the child class so it is enough that they are detected there 👍

Preferably this should be the default behaviour and not require some auto_detect_own=True because why would you ever declare them if you don't want them detected?

The ultimate question and the core conflict with our approaches is: why do you think [gs]etstate special enough to make it behave differently than literally everything else in attrs?

Well, they are already obviously special as of now: attrs only autogenerate/override them in slotted classes to support pickling of these with old protocols ...and this autogeneration/overriding is not needed if they already implement the methods. But sure, if the actual motivation is "protect from changes in base classes" and if it is applied to all classes, then they are no longer special.

As to the "core conflict with our approaches" - and to go further meta ;) - I think that you directly integrated my issue under the get_class_dunders_without_super_class(cls) ideas. I totally see the need for this 👍for other cases but specifically thought that this was not needed for this case.

I find especially pickle methods unfit for it [...]

Here you seem to make the same observation as I do in "2" above but drawing different conclusions. However, I think we agree that your second solution obtains good behaviour in practice.

@hynek
Copy link
Member

hynek commented Mar 16, 2020

Since I've finally merged #607 (in my defense, nobody wanted to review…), wouldn't it make sense to extend it to operate on __slots__-related methods?

@andhus
Copy link
Contributor Author

andhus commented Mar 16, 2020

Since I've finally merged #607 (in my defense, nobody wanted to review…), wouldn't it make sense to extend it to operate on __slots__-related methods?

Cool, I won't have time to look at it in-depth in the near future, but from the look of it yes 👍

hynek added a commit that referenced this pull request Apr 28, 2020
Pass *add_getstate_setstate* to attr.s to control the generation.  Mostly
interesting for disabling it in slotted classes.

Fixes #512,#513
hynek added a commit that referenced this pull request Apr 28, 2020
Pass *add_getstate_setstate* to attr.s to control the generation.  Mostly
interesting for disabling it in slotted classes.

Fixes #512,#513
hynek added a commit that referenced this pull request Apr 28, 2020
Pass *getstate_setstate* to attr.s to control the generation.  Mostly
interesting for disabling it in slotted classes.

Fixes #512,#513
hynek added a commit that referenced this pull request May 2, 2020
Pass *getstate_setstate* to attr.s to control the generation.  Mostly
interesting for disabling it in slotted classes.

Fixes #512,#513
hynek added a commit that referenced this pull request May 11, 2020
* Add control to generate __[sg]etstate__

Pass *getstate_setstate* to attr.s to control the generation.  Mostly
interesting for disabling it in slotted classes.

Fixes #512,#513

* Update glossary.rst
@hynek
Copy link
Member

hynek commented May 11, 2020

Fixed by #642

@hynek hynek closed this May 11, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants