Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compact layout of Plain Python Objects #72

Closed
markshannon opened this issue Jul 21, 2021 · 15 comments
Closed

Compact layout of Plain Python Objects #72

markshannon opened this issue Jul 21, 2021 · 15 comments
Labels
3.13 Things we intend to do for 3.13 epic-compact-objects Reducing size of objects for 3.12

Comments

@markshannon
Copy link
Member

Plain Python Objects are, at the Python level, just a thin wrapper around a dictionary.
However, they are generally used as objects with common behavior across all objects of a class.

Consider the much overused example of Color:

class Color:
    def __init__(self, r, g, b):
        self.red = r
        self.green = g
        self.blue = b

This object has 3 words of data, but for a header size of H takes a huge (H+1)+(H+4)+5 == 2H+10 words assuming shared keys. Without shared keys it takes an even worse (H+1)+(H+4)+13+5 == 2H+23 words.

We can make two simple observations.

  1. The overhead of maintaining the dictionary, even if it is never used directly, is large.
  2. If we cannot share keys, things get even worse.

Which suggests two broad strategies:

  1. Avoid creating the dictionary, if we don't have to.
  2. Design the key sharing mechanism to reduce the number of cases where we lose sharing. It might be worth over-allocating the shared keys and values array a bit to achieve this. The compiler should be able to help us here. All self.attr accesses in a class's methods are visible to the compiler.

Ideally we would like to use H+3 words of memory, but even H+5 (allowing 2 slots for internal use or over allocation) would halve memory use.

@gvanrossum
Copy link
Collaborator

I sense a connection to the idea of Hidden Classes (from the JavaScript world, esp. v8).

@markshannon
Copy link
Member Author

markshannon commented Jul 26, 2021

Hidden classes (originally known as "maps" Chambers, 92 sect 6.1.1) have a couple of important disadvantages.

  • The set of objects with the same set of attributes (a "clone family" in Chambers, 92) may not match the set of objects with the same class, requiring excessive specialization [1]
  • Hidden classes can form a very large graph, with no lifetimes constraints. This graph may become very large and is very difficult to garbage collect, as it is rooted on the empty hidden class.

[1] Suppose we want to specialize for attribute lookup on an instance. Suppose our prototype object belongs to set S containing all object with the same map (hidden class) and the set C containing all objects with the same actual class.
We will have to specialize for the set S ∩ C which may well be considerably smaller than either S or C.

@markshannon
Copy link
Member Author

There are two things we need to do to get close to the ideal layout.

  1. In the compiler, or possibly using runtime feedback, select a dictionary keys to be shared that will maximize the size of S ∩ C, ideally making S ∩ C ≈ C.
  2. Select an object layout that minimizes the space used for those object that are in the set S ∩ C and that degrades gracefully for other objects.

Where S in the set of all objects with the same dictionary keys, and C is the set of all objects with the same class.

Note that an object's dictionary keys is not the same as the set of keys present in the dictionary. E.g. the dict {a:1, b:1} can be represented as

a → 1
b → 2

or as

a → 1
b → 2
c → ∅

The later form may be more efficient if it allows the keys {'a', 'b', 'c'} to be shared across many dictionaries.

@markshannon
Copy link
Member Author

markshannon commented Jul 26, 2021

Current object layouts.

With shared keys

with shared keys

Without shared keys

without shared keys

The above diagram is slightly misleading, in that the keys and values are interleaved in the non-shared case. However this does not impact the amount of memory allocated.

Key

key

@markshannon
Copy link
Member Author

markshannon commented Jul 26, 2021

By putting the values directly after the __dict__ slot, we can get close to the minimum memory (just one extra field for the __dict__ pointer).

Compact layout

Compact layout

After accessing the __dict__

If code explicitly accesses the __dict__ of the object, we must create a dictionary. In which case the object would now look like this:

Ideal with dict

which is similar to the current object layout with shared keys, except that the values are allocated directly after the rest of the object.

After adding a new attribute

If a new attribute is added that isn't in the shared keys, then a new keys and values array must be created, much like it is in the current implementation. After adding the new attribute, the object would look like this:

Degenerate form

This wastes even more memory than the current approach. Hopefully it will be infrequent enough that we can save a substantial amount of memory overall.

@markshannon
Copy link
Member Author

markshannon commented Sep 15, 2021

Instead of using some wild guesses heuristics in the compiler to pre-compute the likely set of attributes of an object, we should do it at runtime.

First of all, a couple of observations:

  1. Over allocating the shared dict keys in the class is reasonably cheap, costing a few hundred bytes. Classes already take a few KB.
  2. Over allocating the values array for a few objects is also cheap.

The idea is to pre-allocate the shared dict keys in the class with capacity of 20 (or whatever is the capacity for an allocation of 32) and set an initial size to that same value. cls->values_size = 20

For each new object, after the first, shrink the size of the values until it reaches capacity(shared_dict_keys) + 1.
Even if the ultimate size is 1, we will only waste 171 slots (1.4kb).
We expect that by the time any object creation call is specialized, values_size will have reached its fixed point.
If not, we can just defer specialization.

Invariants that must be enforced for each object.

  • size(valuesarray) >= size(shared_dict_keys)
  • Values must not have gaps, or it violates dictionary ordering (unless we change the rules on attribute dicts)

@methane
Copy link

methane commented Sep 22, 2021

I expect this idea is not good at performance/complexity ratio.

How about starting with simpler approach?

image

  • values are not embedded in the instance.
    • We can allocate values with the size of shared dict keys.
  • __dict__ created on-demand. values are transferred to the dict.

@gvanrossum
Copy link
Collaborator

gvanrossum commented Sep 22, 2021 via email

@markshannon
Copy link
Member Author

As an intermediate step, @methane's suggestion makes sense.

However, in the longer term we will want the compact layout described above.
There isn't that much more complexity. Mainly a test in the dict, so it knows who owns the values array when deallocating.

See also #30 (comment) which will allow values to map from a subset of the keys.

@markshannon
Copy link
Member Author

The layout we now have (since python/cpython#28802) is quite good and we should probably spend our efforts on making sure that almost all objects use the compact layout and making sure that the C API supports compact layout, rather than trying to make the layout even more compact.

So, I'm deferring this issue for now.

I'm deferring this rather than closing it, as I still think we will want to embed the values into the object eventually. Although, that may not happen until 3.12 or even later.

@gramster gramster moved this to Todo in Fancy CPython Board Jan 10, 2022
@gramster gramster moved this from Todo to Other in Fancy CPython Board Jan 10, 2022
@gramster gramster moved this from Other to Todo in Fancy CPython Board Jan 24, 2022
@mdboom mdboom added the epic-compact-objects Reducing size of objects for 3.12 label Feb 28, 2023
@markshannon
Copy link
Member Author

Definitely not 3.12, but maybe 3.13.

@carljm Would the Cinder team be interested in pursuing this?

@carljm
Copy link

carljm commented Mar 16, 2023

Would the Cinder team be interested in pursuing this?

In principle, yes -- I think it's now acquired permanent residency on our long-term roadmap of things to look at :)

In practice, it's not near the top of the short-term roadmap, and I think it's unlikely that we would prioritize it before we upgrade to 3.12 and can experiment with it relative to the already-done dict-values stuff, which probably means 2024 at best.

So maybe/yes, but not soon.

cc @mpage as someone who has generally been interested in this topic, and @swtaarrs as the person currently managing the Cinder perf roadmap

@benjamingr
Copy link

I just ran into this in "real life", note the idea in shapes/hidden classes is to learn the shape of objects which allows us to have both compact layout in practice (e.g. if I have a small integer member - it'll take just 32 bytes of memory rather than an object) and inline caches (the shape information is stored on the functions themselves which makes property access (like obj.x) behave like a memory offset "as fast as C".

This proposal doesn't actually address that?

@carljm
Copy link

carljm commented Dec 13, 2023

This proposal doesn't actually address that?

CPython 3.11+ has inline caching, and it already does make use of the learned object shape information to make attribute accesses fast. Not everything is described in a single discussion topic :) You may find it interesting to look at Python/bytecodes.c and Python/specialize.c.

At this point, this topic is specifically about the remaining step of moving the instance attribute values array inline into the object itself, rather than in a separate block of memory.

@markshannon
Copy link
Member Author

It only took 2 and a half years, but this is done 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3.13 Things we intend to do for 3.13 epic-compact-objects Reducing size of objects for 3.12
Projects
Development

No branches or pull requests

6 participants