Potential issue with `*.pyc` files #589

AdrianSosic · 2024-10-08T12:30:39Z

cattrs version: 24.1.0
Python version: 3.10
Operating System: macOS

Description

Hi, I'm not yet 100% certain, but I might have discovered a cattrs bug related to the *.pyc created when executing code. Not sure if this is anyhow related to the caching/compilation mechanism of Converter, and I also haven't been able to create a minimum example yet because my setting is quite a bit involved. But I thought perhaps we can already shed some light on it when I report what I see.

What I Did

In the past, we've already had several CI fails caused by cattrs showing some rather surprising/inconsistent behavior that we were not able to properly reproduce. The two most common types of errors were of the following kind:

Objects are no longer consistent after roundtrip serialization:
The hooks suddenly fail because of inconsistent settings:

The interesting part here is not the errors themselves but rather that they arise more or less sporadically, and mostly in CI and not locally. For instance, as the first screenshot shows, the error happened to occur on Python 3.10 but not on 3.12 in this particular case. And I haven't figured out a clear pattern for this. My only conjecture is that it might have to do with cached/compiled files that are sometimes present and sometimes not.

Now the good news, which seem to confirm this conjecture: I was just able to consistently reproduce the second error on my local machine. The interesting piece: as soon as I delete my *.pyc files, the code runs without problems once. After the execution, the *.pyc files are recreated and the test fails again. Same behavior of course if I use PYTHONDONTWRITEBYTECODE. So this seems to indicate that there is some unexpected interaction between the created byte code and cattrs.

Any idea what could be the root cause or how we could nail down the problem furhter?

The text was updated successfully, but these errors were encountered:

Tinche · 2024-10-08T14:22:06Z

Huh, interesting. I don't think code generation that cattrs does goes into a pyc file, my impression is that it's purely in memory. Would be amazing if you could put together a simple repro case so we can get to the bottom of the issue.

attrs and dataclasses do similar codegeneration at runtime (as do other libs like jinja2), and I've never had this issue myself. Super curious.

Tinche · 2024-10-08T14:25:37Z

You say you get problems mostly in CI, but folks don't commonly commit pyc files, right? So CI runs should be "clean"?

AdrianSosic · 2024-10-08T14:29:44Z

You say you get problems mostly in CI, but folks don't commonly commit pyc files, right? So CI runs should be "clean"?

I know, I'm also surprised. We definitely have no pyc files in our git repo, but I also have to say I don't fully understand how exactly the Github runners do their thing. That is, no clue what parts of environments / containers are reused when and when not. (Because of these issues, we've also deactivated environment caching a while ago which first seemed to have solved the problem but then they came back 🙈 )

Will try and see if I can somehow get a reproducing minimal example

AdrianSosic · 2024-10-09T09:38:25Z

Hey @Tinche, the weirdness doesn't stop. I was trying to create a minimal example for you by deleting all unnecessary parts from our codebase (had to go top-down because couldn't get it running bottom-up). And now I'm at the weirdest Python behavior I've ever observed, which blocks me from reducing the example any further:

I have deleted ~99% from our code base and have a reproducible setting that creates the problem – a colleague of mine (@Scienfitz) confirmed on his machine. Surprisingly, whatever I do from here onwards, even deleting "dead" code that is nowhere used, changes the effect. Perhaps the problem appears only if the byte code exceeds a certain size 🤔 !? In any case, I simply can't get the example any smaller. However, if you are willing to have a look, you can reproduce the issue on our repo using a few lines of simple instructions. I'd be super curious to hear your thoughts and see if you are also as surprised as we are.

Instructions

Installation

Clone repo and checkout branch

git clone https://github.com/emdgroup/baybe.git
cd baybe
git checkout cattrs_bug

Create a virtual Python 3.10 environment. We've tested both pyenv and mamba for environment creation under 3.10 and 3.12. With 3.10, you get the behavior described below. With 3.12, the tests always fail!

Install the code

pip install uv
uv pip install -e ".[dev]"

Error reproduction

Run the test and see that it passes
```
pytest
```
Run it again and be surprised that it fails, also for successive runs

Delete the compiled files

find . | grep -E "(/__pycache__$|\.pyc$|\.pyo$)" | xargs rm -rf

Run it once again and be even more surprised that it passes again

Tinche · 2024-10-09T09:44:00Z

Interesting. I'll be sure to take a look since I'm very curious myself.

Does the problem go away if pytest isn't used? I know pytest does some black magic rewriting bytecode for the assert statement to work.

AdrianSosic · 2024-10-09T10:41:34Z

As far as I can tell right now, without pytest the behavior is consistent, regardless of whether the bytecode files are present or not

Tinche · 2024-10-09T10:53:58Z

Alright, I think I got a simple reproducer on 3.12 with this:

from baybe.surrogates.base import Surrogate
from baybe.surrogates.random_forest import RandomForestSurrogate


surrogate = RandomForestSurrogate()
string = surrogate.to_json()
Surrogate.from_json(string)

But the issue here is - a ton of fields in Surrogate are init=False. Cattrs will skip these fields by default. That's an easy fix - just pass _cattrs_include_init_false to make_dict_structure_fn.

This didn't fix the problem. The true fix is to change find_subclass to:

from gc import collect

def find_subclass(base: type, name_or_abbr: str, /):
    """Retrieve a specific subclass of a base class via its name or abbreviation."""
    collect()
    try:
        return next(cl for cl in get_subclasses(base) if refers_to(cl, name_or_abbr))
    except StopIteration:
        raise UnidentifiedSubclassError(
            f"The class name or abbreviation '{name_or_abbr}' does not refer to any "
            f"of the subclasses of '{base.__name__}'."
        )

When you use @define, attrs will add slots to your class. Slots can't be added to classes that already exist, so attrs will actually create a whole new class as a clone of the original, see https://www.attrs.org/en/stable/glossary.html#term-slotted-classes for more details.

We try very hard to make this work transparently, but sometimes it doesn't. One place is cls.__subclasses__ - unfortunately, the old class remains in this list until a garbage collection run. cattrs was generating a structuring function for the old class, not the new one, which was causing the errors.

This is also why the problem was sporadic - if a gc run happens before, the problem solves itself.

AdrianSosic · 2024-10-09T10:58:17Z

Oh wow, that was REALLY fast 😳 That also perfectly explains the behavior I was just about to report to you:

When I just run the code from the test body without pytest (i.e. in a normal script), I get the error.
When I run it with the debugger, I also get an error.
However, when I place a breakpoint somewhere and then just continue without doing anything else, things work!

I think you might just have saved us days of work, honestly!!! Let my have a proper look if I can confirm the behavior. Will let you know!

AdrianSosic · 2024-10-09T13:30:44Z

Hey @Tinche, so it indeed seems to solve all problems when running stuff locally. I've also started the first tests in CI but I think in the end only time will show if everything's fixed. However, so far looking really good 🤩 Hence, let us consider the core of the problem solved for now 👏🏼 Many thanks again, your help is much appreciated!!

One other related thing: while the gc.collect inside the function works, it makes our CI runs roughly 3x longer =/ The alternative where I do the garbage collection right after my class definitions (to avoid repeated calls at runtime) is much faster, but comes with the immense drawback that I've to add more than 40 such calls to our codebase + there is always the risk that we'll forget it in new files. Do you have any suggestion how to handle it more elegantly? Also, was there ever a discussion about triggering the garbage collection from within attrs itself?

Tinche · 2024-10-09T13:43:38Z

There was a little bit of discussion here: python-attrs/attrs#407

Feel free to open a new issue there and maybe we can figure out a better way. I wonder what's actually keeping the original class alive, and can we just break those references.

Fixes the issue discussed [here](python-attrs/cattrs#589).

Tinche · 2024-10-09T14:11:53Z

There are probably some other solutions, like if you can figure out a way to call gc.collect only once after all models are imported, or we can look into making get_subclasses smarter (if all your classes are slotted, you could probably filter out the non-slotted ones here).

AdrianSosic · 2024-10-11T09:03:23Z

I wonder what's actually keeping the original class alive, and can we just break those references.

Very good question, indeed. I can only say that this is not the first time I stumbled over it but I understand it's a very complicated matter.

There are probably some other solutions, like if you can figure out a way to call gc.collect only once after all models are imported, or we can look into making get_subclasses smarter (if all your classes are slotted, you could probably filter out the non-slotted ones here).

Unfortunately not possible. Since we are building a package and not an application, there is no obvious "entry point" for the user and thus it's not clear what parts of the code will be loaded and when.

But at least we have a workaround for now, which already solves 99% of my problems 🙃 So let me thank you again and close the issue for now 👍🏼

Scienfitz mentioned this issue Oct 9, 2024

Turn off byte code generation in tests emdgroup/baybe#396

Closed

AdrianSosic mentioned this issue Oct 9, 2024

Subclass garbage collection emdgroup/baybe#397

Merged

AdrianSosic added a commit to emdgroup/baybe that referenced this issue Oct 9, 2024

Subclass garbage collection (#397)

c1373c0

Fixes the issue discussed [here](python-attrs/cattrs#589).

AdrianSosic closed this as completed Oct 11, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential issue with `*.pyc` files #589

Potential issue with `*.pyc` files #589

AdrianSosic commented Oct 8, 2024

Tinche commented Oct 8, 2024

Tinche commented Oct 8, 2024

AdrianSosic commented Oct 8, 2024

AdrianSosic commented Oct 9, 2024

Tinche commented Oct 9, 2024

AdrianSosic commented Oct 9, 2024

Tinche commented Oct 9, 2024

AdrianSosic commented Oct 9, 2024

AdrianSosic commented Oct 9, 2024

Tinche commented Oct 9, 2024

Tinche commented Oct 9, 2024

AdrianSosic commented Oct 11, 2024

Potential issue with *.pyc files #589

Potential issue with *.pyc files #589

Comments

AdrianSosic commented Oct 8, 2024

Description

What I Did

Tinche commented Oct 8, 2024

Tinche commented Oct 8, 2024

AdrianSosic commented Oct 8, 2024

AdrianSosic commented Oct 9, 2024

Instructions

Installation

Error reproduction

Tinche commented Oct 9, 2024

AdrianSosic commented Oct 9, 2024

Tinche commented Oct 9, 2024

AdrianSosic commented Oct 9, 2024

AdrianSosic commented Oct 9, 2024

Tinche commented Oct 9, 2024

Tinche commented Oct 9, 2024

AdrianSosic commented Oct 11, 2024

Potential issue with `*.pyc` files #589

Potential issue with `*.pyc` files #589