Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-128137: Update PyASCIIObject to handle interned field with the atomic operation #128196

Merged
merged 19 commits into from
Jan 5, 2025

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Dec 23, 2024

@corona10
Copy link
Member Author

@colesbury I am not sure what you intended.
To maintain the original size, I squeeze the state field from 32 bits to 16 bits and reserve 16 bits for the interned field.

Include/cpython/unicodeobject.h Outdated Show resolved Hide resolved
Include/cpython/unicodeobject.h Outdated Show resolved Hide resolved
@corona10 corona10 requested a review from colesbury December 23, 2024 15:54
@corona10
Copy link
Member Author

f30b355 breaks WASI test: https://github.com/python/cpython/actions/runs/12469477403/job/34802702354?pr=128196 (object size is increased).

3: Interned, Immortal, and Static
This categorization allows the runtime to determine the right
cleanup mechanism at runtime shutdown. */
uint8_t interned;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this remain in the state struct. It's okay for a struct to contain both non-bitfield and bitfield members:

  • It avoids a potential unnecessary breakage from moving the field
  • Keeping it in state will make it easier to keep state 32-bits due to alignment.

Include/cpython/unicodeobject.h Outdated Show resolved Hide resolved
@colesbury
Copy link
Contributor

I think this should have a NEWS entry given that we're changing the size of a semi-public field.

@corona10 corona10 changed the title gh-128137: Split out interned field from state [WIP] gh-128137: Split out interned field from state Dec 23, 2024
@corona10 corona10 changed the title [WIP] gh-128137: Split out interned field from state gh-128137: Split out interned field from state Dec 23, 2024
@corona10 corona10 changed the title gh-128137: Split out interned field from state gh-128137: Update PyASCIIObject layout to handle interned field with the atomic operation. Dec 23, 2024
@corona10 corona10 changed the title gh-128137: Update PyASCIIObject layout to handle interned field with the atomic operation. gh-128137: Update PyASCIIObject to handle interned field with the atomic operation Dec 23, 2024
@corona10 corona10 requested a review from colesbury December 23, 2024 16:46
@corona10
Copy link
Member Author

corona10 commented Dec 23, 2024

Since we touch the semi-public field, I am not sure about backporting this PR.
Let's just leave 3.13t as buggy status because it is just the experimental build.

@corona10 corona10 changed the title [WIP] gh-128137: Update PyASCIIObject to handle interned field with the atomic operation gh-128137: Update PyASCIIObject to handle interned field with the atomic operation Jan 1, 2025
@corona10
Copy link
Member Author

corona10 commented Jan 1, 2025

@colesbury

I use unsigned short, which is supported by specific compilers as the same as unsigned char, and I attach related links about it.
And I am fine with if we can cover PEP 11 platforms.

But if we want to forget about 4 bytes increasement issue at 32 bit platforms, and just use portable bit field declaration could be one possible option. Because we can just add specific unit test handling code for specific platforms but I don't prefer this option because it is very difficult to track platform specific issue in the future.

@corona10 corona10 added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 1, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @corona10 for commit 30c2eb2 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 1, 2025
@corona10
Copy link
Member Author

corona10 commented Jan 2, 2025

Ah, okay, I re-read https://bugs.python.org/issue19537; unsigned short is not proper, so I revert to using unsigned char instead. It will be correct. (But I need to investigate Windows issue again)

@corona10 corona10 changed the title gh-128137: Update PyASCIIObject to handle interned field with the atomic operation [WIP] gh-128137: Update PyASCIIObject to handle interned field with the atomic operation Jan 2, 2025
@corona10 corona10 changed the title [WIP] gh-128137: Update PyASCIIObject to handle interned field with the atomic operation gh-128137: Update PyASCIIObject to handle interned field with the atomic operation Jan 2, 2025
@corona10 corona10 added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 2, 2025
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @corona10 for commit 91907aa 🤖

If you want to schedule another build, you need to add the 🔨 test-with-buildbots label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Jan 2, 2025
@corona10
Copy link
Member Author

corona10 commented Jan 2, 2025

I think that it would be fine

  • uint16_t: interned field = 16bits (Increase it to 16bits is little bit painful, but it would be nothing for modern CPUs)
  • unsigned short: kind + compact + ascii + statically_allocated + padding = 16bits (for 32bits and 64bits system both)
  • In total: 32 bits

@corona10
Copy link
Member Author

corona10 commented Jan 2, 2025

In Linux

  • sizeof PyASCIIObject is 40bytes
  • sizeof PyASCIIObject.state is 4 bytes.

So no regression will be.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This LGTM. @methane, would you please take a look at this as well?

@corona10 corona10 merged commit ae23a01 into python:main Jan 5, 2025
118 checks passed
@corona10 corona10 deleted the gh-128137 branch January 5, 2025 09:17
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 6, 2025
srinivasreddy pushed a commit to srinivasreddy/cpython that referenced this pull request Jan 8, 2025
clin1234 added a commit to clin1234/pyo3 that referenced this pull request Jan 21, 2025
Due to python/cpython#128196, data types within `PyASCIIObject.state` have changed, resulting in test failures when building against 3.14.
clin1234 added a commit to clin1234/pyo3 that referenced this pull request Jan 23, 2025
Due to python/cpython#128196, data types within `PyASCIIObject.state` have changed, resulting in test failures when building against 3.14.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants