Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-103193: cache calls to inspect._shadowed_dict in inspect.getattr_static #104267

Merged
merged 4 commits into from
May 7, 2023

Conversation

AlexWaygood
Copy link
Member

@AlexWaygood AlexWaygood commented May 7, 2023

EDIT: These performance numbers are out of date with the current PR; read further down the thread for the up-to-date numbers.

This dramatically speeds up calls to inspect.getattr_static.

Here are benchmark results on main, using @sobolevn's benchmark script from #103193 (comment):

type[Foo]                :   88 ±  0 ns
Foo                      :  158 ±  0 ns
type[Bar]                :   89 ±  0 ns
Bar                      :  158 ±  0 ns
WithParentClassX         :  224 ±  0 ns
Baz                      :  205 ±  0 ns
WithParentX              :  271 ±  1 ns
type[Missing]            :  252 ±  0 ns
Missing                  :  207 ±  0 ns
Slotted                  :  200 ±  1 ns
Method                   :  160 ±  1 ns
StMethod                 :  158 ±  0 ns
ClsMethod                :  158 ±  0 ns

And here are results with this PR:

type[Foo]                :   54 ±  0 ns
Foo                      :   85 ±  0 ns
type[Bar]                :   53 ±  0 ns
Bar                      :   85 ±  0 ns
WithParentClassX         :  102 ±  0 ns
Baz                      :   96 ±  0 ns
WithParentX              :  113 ±  0 ns
type[Missing]            :  110 ±  0 ns
Missing                  :   98 ±  0 ns
Slotted                  :  137 ±  0 ns
Method                   :   85 ±  0 ns
StMethod                 :   85 ±  0 ns
ClsMethod                :   85 ±  0 ns

With this PR, inspect.getattr_static is fast enough that even isinstance() calls like this, with "pathological" runtime-checkable protocols, are faster than they were on Python 3.11:

Pathological protocol
from typing import Protocol, runtime_checkable

@runtime_checkable
class Foo(Protocol):
    a: int
    b: int
    c: int
    d: int
    e: int
    f: int
    g: int
    h: int
    i: int
    j: int
    k: int
    l: int
    m: int
    n: int
    o: int
    p: int
    q: int
    r: int
    s: int
    t: int
    u: int
    v: int
    w: int
    x: int
    y: int
    z: int

class Bar:
    def __init__(self):
        for attrname in 'abcdefghijklmnopqrstuvwxyz':
            setattr(self, attrname, 42)

isinstance(Bar(), Foo)

This approach makes me a little nervous. There are two plausible reasons I thought of why adding a cache here might not be a good idea:

  1. It could cause references to the klass argument to be held by the cache even after they've been deleted elsewhere, which could be unexpected behaviour for a low-level function like getattr_static
  2. Perhaps it's possible that whether or not a __dict__ attribute is shadowed could change at some point during the lifetime of a class.

However, I think we're okay on both points.

For objection (1): _shadowed_dict is only ever called on type objects. The vast majority of classes are defined once in the global namespace and never deleted, so it shouldn't be an issue that the cache is holding strong references to the type objects. (If we do think this is an issue, I also experimented with a version of this PR that uses a weakref.WeakKeyDictionary as a cache. It also sped things up, but not by nearly as much.)

For objection (2): It doesn't seem to be possible, once a class has been created, to change the class's __dict__ attribute, at least from Python code. So for any given class klass, _shadowed_dict(klass) should always return the same result.

@carljm, what do you think?

@AlexWaygood AlexWaygood added the performance Performance or resource usage label May 7, 2023
@AlexWaygood AlexWaygood requested a review from carljm May 7, 2023 14:59
@carljm
Copy link
Member

carljm commented May 7, 2023

So for any given class klass, _shadowed_dict(klass) should always return the same result.

The __dict__ can't change, but the __mro__ can :/

>>> class A: pass
...
>>> class B: pass
...
>>> class C(A): pass
...
>>> C.__mro__
(<class '__main__.C'>, <class '__main__.A'>, <class 'object'>)
>>> C.__bases__ = (B,)
>>> C.__mro__
(<class '__main__.C'>, <class '__main__.B'>, <class 'object'>)

So I don't think this is safe.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented May 7, 2023

The __dict__ can't change, but the __mro__ can :/

Well, good that we're discovering holes in getattr_static test coverage :)

What about something like this, where _shadowed_dict is only cached for the lifetime of a single getattr_static call? It's slower than this patch currently, as it turns out that the call to .cache_clear() is pretty expensive. But it's still noticeably faster than main:

# add @lru_cache() to _shadowed_dict, like in this PR currently

def getattr_static(obj, attr, default=_sentinel):
    try:
        # existing getattr_static implementation here
    finally:
        _shadowed_dict.cache_clear()

@carljm
Copy link
Member

carljm commented May 7, 2023

How does the performance comparison look if you make _shadowed_dict take the mro tuple rather than the class? I.e. call sites would turn into _shadowed_dict(_static_getmro(klass)).

I think if you did that, then it would be safe to put an LRU cache on _shadowed_dict.

I think it should be a bounded cache, though, not unbounded.

The pathological scenario would be someone creating lots of transient classes inside a function, and calling getattr_static on instances of them. It seems prettu unlikely that someone is doing all of that, and expecting the transient classes to be GCed. But I think we at least should ensure the memory use in such a case is bounded.

@AlexWaygood
Copy link
Member Author

AlexWaygood commented May 7, 2023

I think it should be a bounded cache, though, not unbounded.

@functools.lru_cache() creates a cache with size 128 by default, so my PR already proposes a bounded cache ;)

>>> functools.lru_cache(lambda: 1).cache_info()
CacheInfo(hits=0, misses=0, maxsize=128, currsize=0)

@AlexWaygood
Copy link
Member Author

How does the performance comparison look if you make _shadowed_dict take the mro tuple rather than the class? I.e. call sites would turn into _shadowed_dict(_static_getmro(klass)).

It's not as fast as this PR, but it's a lot faster than main. I'll switch to that approach.

Comment on lines +1809 to +1810
def _shadowed_dict(klass):
return _shadowed_dict_from_mro_tuple(_static_getmro(klass))
Copy link
Member Author

@AlexWaygood AlexWaygood May 7, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I considered inlining the calls to _static_getmro, but I thought this made it more readable. It also preserves git blame better, and having to pass through this function doesn't seem to have much impact on performance

@AlexWaygood AlexWaygood marked this pull request as ready for review May 7, 2023 16:22
@AlexWaygood
Copy link
Member Author

AlexWaygood commented May 7, 2023

Here's the new performance numbers using the getattr_static benchmark:

type[Foo]                :   61 ±  0 ns
Foo                      :  100 ±  0 ns
type[Bar]                :   61 ±  0 ns
Bar                      :  100 ±  0 ns
WithParentClassX         :  129 ±  0 ns
Baz                      :  118 ±  0 ns
WithParentX              :  146 ±  0 ns
type[Missing]            :  148 ±  0 ns
Missing                  :  126 ±  1 ns
Slotted                  :  162 ±  2 ns
Method                   :  105 ±  0 ns
StMethod                 :  106 ±  0 ns
ClsMethod                :  100 ±  0 ns

The same benchmark on main:

type[Foo]                :   90 ±  0 ns
Foo                      :  163 ±  0 ns
type[Bar]                :   90 ±  0 ns
Bar                      :  162 ±  0 ns
WithParentClassX         :  229 ±  0 ns
Baz                      :  209 ±  0 ns
WithParentX              :  276 ±  0 ns
type[Missing]            :  258 ±  0 ns
Missing                  :  213 ±  0 ns
Slotted                  :  203 ±  0 ns
Method                   :  162 ±  0 ns
StMethod                 :  163 ±  0 ns
ClsMethod                :  163 ±  0 ns

With this PR now, isinstance() checks against runtime-checkable protocols are faster than they were on 3.11 as long as the protocol has 13 members or fewer. On main, they are only faster than on 3.11 with 6 members or fewer.

On main, the results of the typing_runtime_protocols pyperformance benchmark are 262 us +- 11 us. With this PR, they are 178 us +- 9 us.

@AlexWaygood AlexWaygood added type-feature A feature request or enhancement stdlib Python modules in the Lib dir 3.12 bugs and security fixes labels May 7, 2023
@AlexWaygood
Copy link
Member Author

(Skipping news because, if this patch is correct, it shouldn't change behaviour at all other than improving performance. And the news entry in https://github.com/python/cpython/pull/103195/files should already cover the performance boost.)

@AlexWaygood AlexWaygood changed the title gh-103193: cache calls to inspect._shadowed_dict gh-103193: cache calls to inspect._shadowed_dict in inspect.getattr_static May 7, 2023
@AlexWaygood AlexWaygood merged commit 1b19bd1 into python:main May 7, 2023
@AlexWaygood AlexWaygood deleted the simple-lru-cache branch May 7, 2023 17:45
@AlexWaygood
Copy link
Member Author

Thanks @carljm :D

I listed you as a co-author on the commit; I don't think I would have thought of using the mro tuple as a cache key on my own :)

jbower-fb pushed a commit to jbower-fb/cpython-jbowerfb that referenced this pull request May 8, 2023
…getattr_static` (python#104267)

Co-authored-by: Carl Meyer <carl@oddbird.net>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.12 bugs and security fixes performance Performance or resource usage skip news stdlib Python modules in the Lib dir type-feature A feature request or enhancement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants