
Pull threading.local out of the generated repr's globals. #857

Merged: 1 commit into python-attrs:main, Nov 4, 2021

Conversation

@thetorpedodog (Contributor) commented Nov 1, 2021

Generated __repr__ methods for Pythons with f-strings included
a threading.local object in that method's globals() dictionary.
Because cloudpickle attempts to serialize all globals of this method,
it ends up trying to pickle the threading.local, which cannot be
pickled.
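
For context, a minimal reproduction of the failure looks roughly like this (a sketch, assuming cloudpickle is installed and the class is one cloudpickle serializes by value, e.g. defined in __main__ or a notebook):

import attr
import cloudpickle

@attr.s
class Foo(object):
    bar = attr.ib()

# cloudpickle serializes the class, and with it the generated __repr__
# and that method's globals, by value. It hits the threading.local and
# fails with something like:
#   TypeError: cannot pickle '_thread._local' object
cloudpickle.dumps(Foo(1))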

Instead, we now pull the use of the thread-local out into its own function,
which cloudpickle will happily serialize. As an added benefit, this
eliminates some duplicated code between f-string and non–f-string
__repr__s.

Should fix: python-attrs#458, cloudpipe/cloudpickle#320

Pull Request Check List

  • Added tests for changed code.
    I'm not sure of the best direction here. Running this locally, I see that cloudpickling an attrs class now works fine. It would be pretty simple to add a test dependency on cloudpickle and then create and pickle an attrs class in a new test case (see the sketch after this list), but would that be too much of a special case for this specific library? That said, changes here are more likely to break cloudpickling than changes in cloudpickle are to break attrs, and it would be nice to catch such breakage quickly.
  • (N/A) New features have been added to our Hypothesis testing strategy.
  • (N/A) Changes or additions to public APIs are reflected in our type stubs (files ending in .pyi).
  • (N/A) Updated documentation for changed code.
  • Documentation in .rst files is written using semantic newlines.
  • Changes (and possible deprecations) have news fragments in changelog.d.
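
As a sketch of what such a cloudpickle test could look like (names are illustrative only, not necessarily what ends up in the PR):

import pickle

import attr
import cloudpickle

def test_attrs_class_roundtrips_through_cloudpickle():
    """An attrs class with a generated __repr__ survives a cloudpickle round trip."""

    @attr.s
    class C(object):
        x = attr.ib()

    c = pickle.loads(cloudpickle.dumps(C(1)))
    assert "C(x=1)" == repr(c)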

Another implementation idea I had was to simply pull the threading.local out into another module, because that would make the global a module (which would be pickled by reference) rather than a local (which would be pickled by value). That, however, would leave an essentially two-line import threading; repr_context = threading.local() module, which, ehhhhhhhh
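
For reference, that standalone module would be essentially just this (module name hypothetical):

# attr/_repr_context.py (hypothetical module)
import threading

# Thread-local used by generated __repr__ methods to detect reference
# cycles. Generated methods would reference this module, which pickles
# by reference, instead of holding the threading.local directly.
repr_context = threading.local()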

Review thread on src/attr/_make.py (outdated, resolved)
@Tinche (Member) commented Nov 1, 2021

@thetorpedodog Could you show us the performance impact of this change on a sample class? I recommend pyperf.

@thetorpedodog (Contributor, Author):

import textwrap

import attr
import pyperf

@attr.s
class Foo(object):
    bar = attr.ib()
    baz = attr.ib()
    boz = attr.ib()


runner = pyperf.Runner()
runner.timeit(
    name="basic repr",
    stmt="repr(x)",
    setup=textwrap.dedent("""
        import __main__
        x = __main__.Foo(1, "two", 3.0)
    """),
)

runner.timeit(
    name="recursion repr",
    stmt="repr(x)",
    setup=textwrap.dedent("""
        import __main__
        x = __main__.Foo(1, "two", [])
        x.boz.extend((x, x))
    """),
)

Benchmarks from Python 3.9.2 on Debian:

Before:

(venv) pfish@tiledebian:~/attrs$ python ./microbench.py 
.....................
basic repr: Mean +- std dev: 674 ns +- 10 ns
.....................
recursion repr: Mean +- std dev: 1.02 us +- 0.02 us

After:

(venv) pfish@tiledebian:~/attrs$ python ./microbench.py 
.....................
basic repr: Mean +- std dev: 673 ns +- 12 ns
.....................
recursion repr: Mean +- std dev: 1.02 us +- 0.02 us

Obviously this means it is 1 ns (± 22 ns) faster, a clear and meaningful performance improvement that is definitely not just noise :)

@thetorpedodog (Contributor, Author):

(just did a rebase to linearize history across this and #854)

@Tinche (Member) commented Nov 1, 2021

@thetorpedodog Interesting, because that's not what I'm seeing at all ;)

main branch, now:

> pyperf timeit -g -s "from attr import make_class, attrib; A = make_class('A', ['a', 'b', 'c', 'd', 'e']); a = A(1, 2, 3, 4, 5)" "repr(a)"
.....................
 916 ns: 1 #########
 923 ns: 1 #########
 930 ns: 2 ##################
 937 ns: 2 ##################
 944 ns: 6 #####################################################
 951 ns: 6 #####################################################
 958 ns: 7 #############################################################
 965 ns: 5 ############################################
 972 ns: 9 ###############################################################################
 979 ns: 6 #####################################################
 986 ns: 7 #############################################################
 993 ns: 3 ##########################
 999 ns: 3 ##########################
1.01 us: 0 |
1.01 us: 1 #########
1.02 us: 0 |
1.03 us: 0 |
1.03 us: 0 |
1.04 us: 0 |
1.05 us: 0 |
1.05 us: 1 #########

Mean +- std dev: 970 ns +- 23 ns

Your branch:

> pyperf timeit -g -s "from attr import make_class, attrib; A = make_class('A', ['a', 'b', 'c', 'd', 'e']); a = A(1, 2, 3, 4, 5)" "repr(a)"
.....................
2.27 us: 1 ##########
2.28 us: 1 ##########
2.29 us: 0 |
2.30 us: 1 ##########
2.31 us: 1 ##########
2.32 us: 1 ##########
2.33 us: 2 ####################
2.34 us: 3 ##############################
2.35 us: 2 ####################
2.36 us: 5 #################################################
2.37 us: 5 #################################################
2.38 us: 8 ###############################################################################
2.39 us: 5 #################################################
2.41 us: 7 #####################################################################
2.42 us: 4 ########################################
2.43 us: 3 ##############################
2.44 us: 3 ##############################
2.45 us: 4 ########################################
2.46 us: 1 ##########
2.47 us: 2 ####################
2.48 us: 1 ##########

Mean +- std dev: 2.39 us +- 0.05 us

(CPython 3.9.7 on Ubuntu.)

That's a very significant slowdown. My gut feeling was that there was no chance the proposed code would be as fast as the current code, since not only are function calls fairly expensive in CPython, but you're also using a context manager.

@Tinche (Member) commented Nov 1, 2021

Your script shows the same slowdown btw. I suspect you've forgotten to switch the branch between runs. Sorry :)

@hynek (Member) commented Nov 2, 2021

TBH I wouldn't consider the performance of repr a top priority. Isn't it usually only used interactively?

My gut feeling is that cloudpickle interaction is a bigger problem because it's widely used in the data science sphere. 🤔

@Tinche (Member) commented Nov 2, 2021

I repr on every request, to log the request description. Why should I pay the price for a library I've never used? shrug

@hynek (Member) commented Nov 2, 2021

Would it help you if we made it possible to switch off cycle detection for good? I guess that would make it even faster for you? Switching it off would have to be opt-in tho.

@Tinche (Member) commented Nov 2, 2021

Yeah it'd help but I'm not a huge fan of having messy decorators everywhere.

That said, I don't understand the exact problem here. Cloudpickle doesn't like threadlocals in the globals of __repr__, but having __repr__ call a function that does have a threadlocal in globals is fine for some reason?

@thetorpedodog (Contributor, Author):

I repr on every request, to log the request description. Why should I pay the price for a library I've never used? shrug

I get the concern here, but if there were code where repr was the hottest path and the thing most limiting performance, that would be a sign of deeper architectural problems. My feeling about adding a special switch to the decorator for this is that it would be useful in only a vanishingly small number of cases, and at that point the author would probably want to hand-write the __repr__ themselves.

That said, I don't understand the exact problem here. Cloudpickle doesn't like threadlocals in the globals of __repr__, but having __repr__ call a function that does have a threadlocal in globals is fine for some reason?

It’s, yeah, a bit confusing. I think the deal is that cloudpickle will try to pickle everything in a class instance itself by value, but once it goes outside that class, it will pickle by reference. Or something like that.

The benefit is that we can limit the performance impact by restructuring the change: simply pull the thread-local into another module, then reference only that module in the __repr__'s globals. (I used the _compat module because, uhhhhhhhh, it's for compatibility with cloudpickle? Does that make sense? Any better ideas?) That change is sitting on another branch: thetorpedodog@0d710b4
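
A standalone illustration of that by-value/by-reference distinction (not attrs code; names are illustrative): a dynamically compiled function whose globals contain the thread-local itself cannot be cloudpickled, while one whose globals contain only a module can.

import threading

import cloudpickle

# Compile a function with its own globals dict, roughly the way attrs
# builds generated methods. cloudpickle serializes such functions and
# the globals they reference by value.
bad_globals = {"ctx": threading.local()}
exec("def f(): return ctx", bad_globals)
try:
    cloudpickle.dumps(bad_globals["f"])
except TypeError as exc:
    print("pickling by value fails:", exc)  # cannot pickle '_thread._local' object

# If the only global the function references is a module, that module
# is pickled by reference and everything works.
good_globals = {"threading": threading}
exec("def g(): return threading.current_thread().name", good_globals)
cloudpickle.dumps(good_globals["g"])  # fine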

For comparison (and this time I didn’t screw it up):

Old:

(a-completely-different-venv) pfish@tiledebian:~/a-completely-different-folder$ python ./microbench.py
.....................
basic repr: Mean +- std dev: 654 ns +- 16 ns
.....................
recursion repr: Mean +- std dev: 988 ns +- 23 ns

Thread-local in a different module (0d710b4):

(venv) pfish@tiledebian:~/attrs$ python ./microbench.py
.....................
basic repr: Mean +- std dev: 682 ns +- 16 ns
.....................
recursion repr: Mean +- std dev: 1.05 us +- 0.02 us

Context-manager (2063781):

(venv) pfish@tiledebian:~/attrs$ python ./microbench.py
.....................
basic repr: Mean +- std dev: 1.67 us +- 0.05 us
.....................
recursion repr: Mean +- std dev: 3.77 us +- 0.07 us

I think that cost is small enough not to worry about, given that it fixes real user issues.

@Tinche (Member) commented Nov 2, 2021

That does look much better. Can we add a unit test for this as well (adding cloudpickle to the dev dependencies)?

@pganssle closed this Nov 2, 2021
@pganssle reopened this Nov 2, 2021
@pganssle (Member) commented Nov 2, 2021

Sorry I pressed that button by accident.

@thetorpedodog (Contributor, Author):

Switched this branch over to point at the version that pulls the thread-local into _compat. The previous version (context manager) is still available on my manage-the-context branch.

@hynek (Member) left a comment:

This looks great, thanks for diving into it!

Please address the comments/suggestions and
add a news fragment: https://github.com/python-attrs/attrs/blob/main/.github/CONTRIBUTING.rst#changelog

Review threads on src/attr/_make.py and tests/test_compatibility.py (resolved)
@Tinche (Member) commented Nov 3, 2021

LGTM!

@thetorpedodog (Contributor, Author) left a comment:

add a news fragment

I could have sworn I had one! I must have lost it when editing history.

Review threads on src/attr/_make.py and tests/test_compatibility.py (resolved)
@thetorpedodog (Contributor, Author):

Squashed everything into a single commit so that we have one change with a nice cohesive commit message.

Since apparently you can't just compare two arbitrary versions within GitHub, this reproduces the changes relative to 035280e:

thetorpedodog@1bca768

  • Undoes importing _compat module only.
  • Replaces the lines.append(...) calls with building the list in one go.
  • Since the thread-local has been renamed repr_context, I used the already_repring name for the attribute on the thread-local.
  • Adds changelog entry.
  • Updates test compat comment in setup.py.
  • Adds docstring to test_repr.

Review thread on tests/test_dunders.py (outdated, resolved)

Squashed commit message:

Because cloudpickle tries to pickle a function's globals, when it
pickled an attrs instance, it would try to pickle the `__repr__` method
and its globals, which included a `threading.local`. This broke
cloudpickle for all attrs classes unless they explicitly specified
`repr=False`. Modules, however, are pickled by reference, not by value,
so moving the thread-local into a different module means we can put `_compat`
into the function's globals and not worry about direct references.
Includes a test to ensure that attrs and cloudpickle remain compatible.

Also adds an explanation of the reason we even *have* that global
thread-local variable.  It wasn't completely obvious to a reader why
the thread-local was needed to track reference cycles in `__repr__`
calls, and the test did not previously contain a cycle that touched
a non-attrs value. This change adds a comment explaining the need
and tests a cycle that contains non-attrs values.

Fixes:
- python-attrs#458
- cloudpipe/cloudpickle#320
@hynek enabled auto-merge (squash) November 4, 2021 05:51
@hynek (Member) left a comment:

Wonderful, thank you!

@hynek merged commit 554d6f2 into python-attrs:main Nov 4, 2021
@hynek (Member) commented Nov 4, 2021

(JFTR, the news fragments don't need the link to the PR; it's added automatically from the file name, so I've edited it slightly: 11c66ef. Didn't want to pester you with another review round. Also off by 10: d884af5 😂)

@thetorpedodog (Contributor, Author):

also off by 10: d884af5 😂

something like this was inevitable after I submitted #860

@hynek (Member) commented Nov 4, 2021

Fate doesn't like to be tempted! ¯\_(ツ)_/¯
