Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bpo-45417: fix quadratic behaviour when creating an enum #28907

Merged
merged 3 commits into from
Oct 14, 2021

Conversation

cfbolz
Copy link
Contributor

@cfbolz cfbolz commented Oct 12, 2021

We got a bug report in PyPy that creating really big enums was slow, and worse, the performance was growing quadratically in the number of fields. We actually even found two places with quadratic behavior. Both are fixed in this PR:

  • in the _EnumDict class, the _member_names attribute was a list that was continually growing, and every new field was checked for existance using "in" in that list. Replace it with a dict (with None keys) to get both insertion order tracking as well as O(1) lookup.
  • When setting the name of a _proto_member, there was code that goes through all members to find something. Use the existing dictionary for that lookup if possible.

I am open to ideas how I would test that the slow behavior is fixed without relying on brittle timing checks.

(there is also a third place that behaves quadratically: When using auto(), the growing _EnumDict._last_values list is continually copied for every auto member. Removing that copy fixes the slowness (and tests pass), but I don't understand why it is there in the first place, so I am bit wary of touching it.)

https://bugs.python.org/issue45417

creating an enum class used to be quadratic in the number of fields, in
two different places: in the dicitonary implementation for the enum
class, and when setting the name of all members. fix the former by
switching to a dictionary instead of a list to store the fields
while building up the dictionary. fix the latter by trying to use the
already existing dictionary when looking up a field, instead of looping
of all fields.
Copy link
Member

@ethanfurman ethanfurman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One spelling error to fix, otherwise looks good.

The _last_values list is copied because it is sent to a (possibly) user-defined _generate_next_value_, and I don't want changes made to the list there to affect the enum's copy of that list.

Lib/enum.py Show resolved Hide resolved
@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@cfbolz
Copy link
Contributor Author

cfbolz commented Oct 12, 2021

@ethanfurman ah, thank you for the explanation, I figured it was something like that. Typo is fixed!

@ethanfurman
Copy link
Member

There are other list-based uses of _member_names_ as a class structure:

  • in __new__
  • in _simple_enum

Do you want to update those to use a dictionary or should I?

@cfbolz
Copy link
Contributor Author

cfbolz commented Oct 13, 2021

There are other list-based uses of _member_names_ as a class structure:

* in `__new__`

* in `_simple_enum`

Do you want to update those to use a dictionary or should I?

hm, I am a getting a bit confused 😅! I think _member_names_ (underscore at the end) does not need changing, only _EnumDict._member_names (without underscore at the end). _member_names_ remains a list, that is fine, there is no quadratic behaviour about that – it doesn't change after the enum type has been created and we don't look up anything in it either. Does that make sense?

The uses of _member_names (no underscore) in __new__ are fine, we pass the dict to a set() call which does the right thing, and we iterate over it, ditto. So no further changes needed imo.

@ethanfurman ethanfurman merged commit b2af211 into python:main Oct 14, 2021
@ethanfurman ethanfurman added the needs backport to 3.10 only security fixes label Oct 15, 2021
@miss-islington
Copy link
Contributor

Thanks @cfbolz for the PR, and @ethanfurman for merging it 🌮🎉.. I'm working now to backport this PR to: 3.10.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @cfbolz and @ethanfurman, I had trouble checking out the 3.10 backport branch.
Please backport using cherry_picker on command line.
cherry_picker b2af211e229df941d9b404f69547a264115156b5 3.10

@cfbolz cfbolz deleted the fix-issue-45417-enum-quadratic branch October 19, 2021 18:25
hroncok added a commit to hroncok/proto-plus-python that referenced this pull request Jun 28, 2022
There are two changes:

Changes in the actual code:

 - _member_names changed from a list to a dict in python/cpython#28907
    - we instance-check and remove by list-specific or dict-specific way

Change in the tests only:

 - accessing other enum members via instance attributes is no longer possible
    - we access them via the class instead
    - we leave the original test in a try-except block

Some of the Python enum changes might get reverted,
see python/cpython#93910
But the fix is backwards compatible.
hroncok added a commit to hroncok/proto-plus-python that referenced this pull request Jun 28, 2022
There are two changes:

Changes in the actual code:

 - _member_names changed from a list to a dict in python/cpython#28907
    - we instance-check and remove by list-specific or dict-specific way

Change in the tests only:

 - accessing other enum members via instance attributes is no longer possible
    - we access them via the class instead
    - we leave the original test in a try-except block

Some of the Python enum changes might get reverted,
see python/cpython#93910
But the fix is backwards compatible.

Fixes googleapis#326
parthea added a commit to googleapis/proto-plus-python that referenced this pull request Jan 5, 2023
* Adjust to enum changes in Python 3.11.0b3

There are two changes:

Changes in the actual code:

 - _member_names changed from a list to a dict in python/cpython#28907
    - we instance-check and remove by list-specific or dict-specific way

Change in the tests only:

 - accessing other enum members via instance attributes is no longer possible
    - we access them via the class instead
    - we leave the original test in a try-except block

Some of the Python enum changes might get reverted,
see python/cpython#93910
But the fix is backwards compatible.

Fixes #326

* ci: unit test session with python 3.11.0-beta.3

* ci: add python v3.11.0-beta.3 to noxfile.py

* another attempt to get python 3.11.0b3 working in github actions

* ci: use python 3.8 for docs check

* ci: fix docs build

* fix ci

* mark python 3.11 tests as required

* add python 3.11 to setup.py

* fix docs build

* remove python 3.11 test for unitcpp

* remove python 3.11 test for unitcpp

* remove python 3.11 test for unitcpp

* attempt to fix exclude in github action

Co-authored-by: Anthonios Partheniou <partheniou@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
needs backport to 3.10 only security fixes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants