Compact layout of Plain Python Objects #72

markshannon · 2021-07-21T14:48:12Z

Plain Python Objects are, at the Python level, just a thin wrapper around a dictionary.
However, they are generally used as objects with common behavior across all objects of a class.

Consider the much overused example of Color:

class Color:
    def __init__(self, r, g, b):
        self.red = r
        self.green = g
        self.blue = b

This object has 3 words of data, but for a header size of H takes a huge (H+1)+(H+4)+5 == 2H+10 words assuming shared keys. Without shared keys it takes an even worse (H+1)+(H+4)+13+5 == 2H+23 words.

We can make two simple observations.

- The overhead of maintaining the dictionary, even if it is never used directly, is large.
- If we cannot share keys, things get even worse.

Which suggests two broad strategies:

- Avoid creating the dictionary, if we don't have to.
- Design the key sharing mechanism to reduce the number of cases where we lose sharing. It might be worth over-allocating the shared keys and values array a bit to achieve this. The compiler should be able to help us here. All self.attr accesses in a class's methods are visible to the compiler.

Ideally we would like to use H+3 words of memory, but even H+5 (allowing 2 slots for internal use or over allocation) would halve memory use.
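The arithmetic above can be checked with a small sketch. The function names are made up for illustration, and the header size H is left as a parameter because the post does not fix it; only the formulas themselves come from the text.

```python
def words_shared_keys(h):
    """Words per Color instance with shared keys: (H+1)+(H+4)+5."""
    return (h + 1) + (h + 4) + 5


def words_unshared_keys(h):
    """Words per Color instance without shared keys: (H+1)+(H+4)+13+5."""
    return (h + 1) + (h + 4) + 13 + 5


def words_ideal(h):
    """Header plus the 3 words of data."""
    return h + 3


def words_target(h):
    """Header plus 3 words of data plus 2 slots for internal use or over-allocation."""
    return h + 5


# Header sizes below are example values only, not taken from the post.
for h in (2, 4):
    print(h, words_shared_keys(h), words_unshared_keys(h),
          words_ideal(h), words_target(h))
```

For any H, the shared-keys figure 2H+10 is exactly twice H+5, which is where the "would halve memory use" claim comes from.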


gvanrossum · 2021-07-21T15:51:37Z

I sense a connection to the idea of Hidden Classes (from the JavaScript world, esp. v8).

markshannon · 2021-07-26T08:43:06Z

Hidden classes (originally known as "maps"; Chambers, 92, sect. 6.1.1) have a couple of important disadvantages.

- The set of objects with the same set of attributes (a "clone family" in Chambers, 92) may not match the set of objects with the same class, requiring excessive specialization [1]
- Hidden classes can form a very large graph, with no lifetime constraints. This graph is very difficult to garbage collect, as it is rooted on the empty hidden class.

[1] Suppose we want to specialize for attribute lookup on an instance. Suppose our prototype object belongs to the set S containing all objects with the same map (hidden class) and to the set C containing all objects with the same actual class.
We will have to specialize for the set S ∩ C, which may well be considerably smaller than either S or C.
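As a rough illustration of the footnote, the sketch below builds a handful of objects and computes S, C and S ∩ C for a prototype. The classes `Point` and `Vector` and the helpers `make` and `shape` are hypothetical; the ordered tuple of attribute names stands in for a map (hidden class) and is not how any real VM represents one.

```python
class Point:
    pass


class Vector:
    pass


def make(cls, **attrs):
    """Create an instance of cls and set the given attributes in order."""
    obj = cls()
    for name, value in attrs.items():
        setattr(obj, name, value)
    return obj


def shape(obj):
    """Ordered attribute names, standing in for the object's map (hidden class)."""
    return tuple(vars(obj))


objects = [
    make(Point, x=1, y=2),    # the prototype
    make(Point, x=3, y=4),    # same class, same attributes
    make(Point, x=5),         # same class, different attributes
    make(Vector, x=6, y=7),   # same attributes, different class
]
prototype = objects[0]

S = [o for o in objects if shape(o) == shape(prototype)]
C = [o for o in objects if type(o) is type(prototype)]
S_and_C = [o for o in S if type(o) is type(prototype)]

print(len(S), len(C), len(S_and_C))  # 3 3 2
```

Here both S and C contain three objects, but a specialization guarded on both map and class only covers the two objects in S ∩ C.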

markshannon · 2021-07-26T09:05:13Z

There are two things we need to do to get close to the ideal layout.

- In the compiler, or possibly using runtime feedback, select a set of dictionary keys to be shared that will maximize the size of S ∩ C, ideally making S ∩ C ≈ C.
- Select an object layout that minimizes the space used for those objects that are in the set S ∩ C and that degrades gracefully for other objects.

Here S is the set of all objects with the same dictionary keys, and C is the set of all objects with the same class.

Note that an object's dictionary keys are not the same as the set of keys present in the dictionary. E.g. the dict {a:1, b:2} can be represented as

a → 1
b → 2

or as

a → 1
b → 2
c → ∅

The latter form may be more efficient if it allows the keys {'a', 'b', 'c'} to be shared across many dictionaries.
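A minimal sketch of the two representations, assuming a toy `SharedKeysDict` class and an `ABSENT` sentinel in the role of ∅. Both names are invented here, and this is not CPython's dict implementation.

```python
ABSENT = object()  # sentinel playing the role of ∅ above


class SharedKeysDict:
    """Toy dictionary: a keys tuple that may be shared, plus a per-dict values list."""

    def __init__(self, keys, values):
        assert len(keys) == len(values)
        self.keys = keys        # possibly shared with other dictionaries
        self.values = values    # owned by this dictionary

    def to_dict(self):
        """The mapping this representation stands for; absent slots are skipped."""
        return {k: v for k, v in zip(self.keys, self.values) if v is not ABSENT}


keys_ab = ("a", "b")
keys_abc = ("a", "b", "c")

d1 = SharedKeysDict(keys_ab, [1, 2])             # first form
d2 = SharedKeysDict(keys_abc, [1, 2, ABSENT])    # second form, same contents
d3 = SharedKeysDict(keys_abc, [7, 8, 9])         # another dict reusing keys_abc

print(d1.to_dict() == d2.to_dict())  # True
print(d2.keys is d3.keys)            # True
```

`d1` and `d2` describe the same dict, but only `d2` can share its keys object with `d3`.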

markshannon · 2021-07-26T09:22:22Z

Current object layouts.

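With shared keys

[Diagram: current object layout with shared keys]

Without shared keys

[Diagram: current object layout without shared keys]

The above diagram is slightly misleading, in that the keys and values are interleaved in the non-shared case. However, this does not impact the amount of memory allocated.

Key

[Diagram: key to the layout diagrams]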

markshannon · 2021-07-26T15:19:47Z

By putting the values directly after the __dict__ slot, we can get close to the minimum memory (just one extra field for the __dict__ pointer).

Compact layout

[Diagram: compact object layout]

After accessing the `__dict__`

If code explicitly accesses the __dict__ of the object, we must create a dictionary. In that case the object would now look like this:

[Diagram: object layout after accessing the __dict__]

This is similar to the current object layout with shared keys, except that the values are allocated directly after the rest of the object.

After adding a new attribute

If a new attribute is added that isn't in the shared keys, then a new keys and values array must be created, much like it is in the current implementation. After adding the new attribute, the object would look like this:

[Diagram: object layout after adding a new attribute]

This wastes even more memory than the current approach. Hopefully it will be infrequent enough that we can save a substantial amount of memory overall.
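The three states described above (values inline, dict created on access, sharing lost after a new attribute) can be modelled with a toy class. This is only a sketch: `ToyInstance` and all of its members are invented for illustration and do not correspond to CPython's structs. It also assumes that losing sharing creates a dict, and it returns a copy from `get_dict`, whereas in the layout described the dict would refer to the inline values directly.

```python
EMPTY = object()  # marks an unused value slot


class ToyInstance:
    """Toy model of one object under the compact layout."""

    def __init__(self, shared_keys):
        self.shared_keys = shared_keys                    # keys owned by the class
        self.inline_values = [EMPTY] * len(shared_keys)   # stored right after the object
        self.dict_created = False                         # does a dict object exist yet?
        self.private_items = None                         # own keys/values once sharing is lost

    def _inline_items(self):
        pairs = zip(self.shared_keys, self.inline_values)
        return [(k, v) for k, v in pairs if v is not EMPTY]

    def set_attr(self, name, value):
        if self.private_items is not None:
            self.private_items[name] = value
        elif name in self.shared_keys:
            self.inline_values[self.shared_keys.index(name)] = value
        else:
            # Not in the shared keys: build separate keys and values.
            # Assumption: this also creates the dict. The inline slots stay
            # allocated but are no longer used.
            self.private_items = dict(self._inline_items())
            self.private_items[name] = value
            self.dict_created = True

    def get_dict(self):
        """Explicit __dict__ access: a dict object must now exist (copy returned here)."""
        self.dict_created = True
        if self.private_items is not None:
            return dict(self.private_items)
        return dict(self._inline_items())

    def wasted_inline_slots(self):
        return len(self.inline_values) if self.private_items is not None else 0


c = ToyInstance(("red", "green", "blue"))
for name, value in (("red", 255), ("green", 0), ("blue", 0)):
    c.set_attr(name, value)
print(c.dict_created)            # False: no dictionary needed so far
print(c.get_dict())              # explicit access creates one
c.set_attr("alpha", 1)           # not in the shared keys
print(c.wasted_inline_slots())   # 3
```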

markshannon · 2021-09-15T15:21:02Z

Instead of using some ~~wild guesses~~ heuristics in the compiler to pre-compute the likely set of attributes of an object, we should do it at runtime.

First of all, a couple of observations:

- Over-allocating the shared dict keys in the class is reasonably cheap, costing a few hundred bytes. Classes already take a few KB.
- Over-allocating the values array for a few objects is also cheap.

The idea is to pre-allocate the shared dict keys in the class with a capacity of 20 (or whatever is the capacity for an allocation of 32) and set the initial size to that same value: cls->values_size = 20.

For each new object, after the first, shrink the size of the values until it reaches capacity(shared_dict_keys) + 1.
Even if the ultimate size is 1, we will only waste 171 slots (1.4kb).
We expect that by the time any object creation call is specialized, values_size will have reached its fixed point.
If not, we can just defer specialization.
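The 171-slot figure can be reproduced with a short simulation under two assumptions that are not spelled out above: each new object gets a values array one slot smaller than the previous one, and a class with a single attribute settles at 1 + 1 = 2 slots. The names `shrink_sequence` and `wasted_slots` are hypothetical, and the 8-byte word size assumes a 64-bit build.

```python
WORD_BYTES = 8      # assumption: 64-bit build
INITIAL_SIZE = 20   # cls->values_size from the text


def shrink_sequence(final_size, initial_size=INITIAL_SIZE):
    """Values-array size for each successive object, shrinking one slot per object."""
    return list(range(initial_size, final_size - 1, -1))


def wasted_slots(final_size, initial_size=INITIAL_SIZE):
    """Total slots allocated beyond final_size while the size converges."""
    return sum(size - final_size for size in shrink_sequence(final_size, initial_size))


final = 1 + 1  # one shared key, plus the one spare slot
print(wasted_slots(final))               # 171
print(wasted_slots(final) * WORD_BYTES)  # 1368 bytes, roughly 1.4 kB
```

Under this reading the waste is 18 + 17 + ... + 0 = 171 slots, a one-off cost per class.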

Invariants that must be enforced for each object.

- size(valuesarray) >= size(shared_dict_keys)
- Values must not have gaps, or it violates dictionary ordering (unless we change the rules on attribute dicts)

methane · 2021-09-22T04:16:12Z

I expect this idea does not have a good performance/complexity ratio.

How about starting with a simpler approach?

- Values are not embedded in the instance.
  - We can allocate values with the size of the shared dict keys.
- __dict__ is created on demand. Values are transferred to the dict.

gvanrossum · 2021-09-22T04:38:34Z

Mark can speak for himself, but to me the disadvantage of that is the need to allocate two blocks of memory instead of one for the values, and the extra pointer deref to get at the values in a specialized version of LOAD_ATTR. (And the fact that the guard in the specialized opcode would have to check that the values pointer is still there.) It's true that that approach is less complex to code, though.

markshannon · 2021-09-22T09:36:49Z

As an intermediate step, @methane's suggestion makes sense.

However, in the longer term we will want the compact layout described above.
There isn't that much more complexity. Mainly a test in the dict, so it knows who owns the values array when deallocating.

See also #30 (comment) which will allow values to map from a subset of the keys.

markshannon · 2021-12-02T11:22:35Z

The layout we now have (since python/cpython#28802) is quite good and we should probably spend our efforts on making sure that almost all objects use the compact layout and making sure that the C API supports compact layout, rather than trying to make the layout even more compact.

So, I'm deferring this issue for now.

I'm deferring this rather than closing it, as I still think we will want to embed the values into the object eventually. Although, that may not happen until 3.12 or even later.

markshannon · 2023-03-16T16:38:38Z

Definitely not 3.12, but maybe 3.13.

@carljm Would the Cinder team be interested in pursuing this?

carljm · 2023-03-16T18:17:11Z

> Would the Cinder team be interested in pursuing this?

In principle, yes -- I think it's now acquired permanent residency on our long-term roadmap of things to look at :)

In practice, it's not near the top of the short-term roadmap, and I think it's unlikely that we would prioritize it before we upgrade to 3.12 and can experiment with it relative to the already-done dict-values stuff, which probably means 2024 at best.

So maybe/yes, but not soon.

cc @mpage as someone who has generally been interested in this topic, and @swtaarrs as the person currently managing the Cinder perf roadmap

benjamingr · 2023-12-02T16:49:19Z

I just ran into this in "real life". Note that the idea in shapes/hidden classes is to learn the shape of objects, which allows us to have both a compact layout in practice (e.g. if I have a small integer member, it'll take just 32 bytes of memory rather than an object) and inline caches (the shape information is stored on the functions themselves, which makes property access like obj.x behave like a memory offset, "as fast as C").

This proposal doesn't actually address that?

carljm · 2023-12-13T01:13:13Z

> This proposal doesn't actually address that?

CPython 3.11+ has inline caching, and it already does make use of the learned object shape information to make attribute accesses fast. Not everything is described in a single discussion topic :) You may find it interesting to look at Python/bytecodes.c and Python/specialize.c.

At this point, this topic is specifically about the remaining step of moving the instance attribute values array inline into the object itself, rather than in a separate block of memory.
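One way to see the inline caching at work is the `dis` module, which from Python 3.11 accepts `adaptive=True` to show specialized instructions. The sketch below is only an illustration: `Point`, `get_x` and `load_attr_opnames` are made-up names, and the exact opcode names printed depend on the interpreter version (on some versions one might see a name such as LOAD_ATTR_INSTANCE_VALUE).

```python
import dis
import sys


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y


def get_x(p):
    return p.x


def load_attr_opnames(func):
    """Names of the LOAD_ATTR-family instructions in func, specialized if possible."""
    if sys.version_info >= (3, 11):
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        instructions = dis.get_instructions(func)  # no adaptive view before 3.11
    return [i.opname for i in instructions if i.opname.startswith("LOAD_ATTR")]


p = Point(1, 2)
for _ in range(1000):  # warm up so the interpreter has a chance to specialize
    get_x(p)

print(load_attr_opnames(get_x))
```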

markshannon · 2024-05-07T09:06:34Z

It only took 2 and a half years, but this is done 🎉

markshannon mentioned this issue Jul 21, 2021

Object layout #69

Closed

gvanrossum mentioned this issue Jul 21, 2021

Hidden classes #6

Closed

This was referenced Jul 26, 2021

Dictionary modifications to enable optimizations. #30

Closed

Changes in the compiler and dictionary to make key sharing in classes more effective. #77

Closed

This was referenced Sep 30, 2021

Regular object layout #80

Closed

bpo-40116: Add insertion order bit-vector to dict values to allow dicts to share keys more freely. python/cpython#28520

Merged

bpo-45340: Don't create object dictionaries unless actually needed python/cpython#28802

Merged

markshannon added the deferred label Dec 2, 2021

gramster added this to Fancy CPython Board Jan 10, 2022

gramster moved this to Todo in Fancy CPython Board Jan 10, 2022

gramster moved this from Todo to Other in Fancy CPython Board Jan 10, 2022

gramster moved this from Other to Todo in Fancy CPython Board Jan 24, 2022

mdboom added the epic-compact-objects Reducing size of objects for 3.12 label Feb 28, 2023

markshannon mentioned this issue Feb 7, 2024

Better handling of accessing an object's __dict__ attribute. #651

Open

markshannon added 3.13 Things we intend to do for 3.13 and removed deferred labels Feb 7, 2024

markshannon mentioned this issue Feb 13, 2024

Things to do for 3.13 #654

Open

9 tasks

markshannon mentioned this issue Feb 21, 2024

Inline values array into the object python/cpython#115776

Open

markshannon closed this as completed May 7, 2024

github-project-automation bot moved this from Todo to Done in Fancy CPython Board May 7, 2024

samtygier-stfc mentioned this issue Jul 23, 2024

Show attribute names in leak tracker mantidproject/mantidimaging#2291

Merged
