FR: Change `int` repr on huge values to automatically use hexadecimal
#96601
Comments
I assigned it to @mdickinson for thoughts, not necessarily to do it. This is potentially disruptive so it warrants discussion on Discuss first if we want to go forward with some version of this, if not ultimately a Steering Council decision if there is disagreement. The unittests and documentation are more work than the implementation for the easy fixed transition point version. The exact decimal threshold changeover point would be more work, it could involve keeping a PyLong around of the exact value to compare value > reference against for the hex transition point. Doing that is probably not worthwhile, just a rough estimate comparing size to the |
Some thoughts:
Having offered those thoughts, I'll unassign myself. But I'd dearly love to hear from others, and especially those with close ties to |
Worth noting that the fact that the |
I think that it is better to get an exception right here than to produce unexpected output which would break the same or another program when it tries to parse it. Many years ago I considered the idea of using hexadecimals for long integers in pickle protocol 0. Not because of potential DoS (pickles are already more unsafe), but for performance. But it would break compatibility with older Python versions and interoperability with other programs (there are implementations of pickle protocol 0 in non-Python programming languages). The only reason for using protocol 0 is compatibility. |
Given those last two comments from Mark and Serhiy, I'm closing this one as "Infeasible" as reprs wind up in all sorts of places beyond people's control so doing this could just further surprises at a place removed from the code that produced the surprising data. |
Problem
Now that 95778 is in, the `repr` of an `int` can fail with a ValueError based on its size because `repr` and `str` are the same for `int`, thus huge values cannot have a repr.

We discussed this while working on that security fix but deemed that changing a repr was way beyond reason for a patch release bugfix. Raising the ValueError exception highlights the point in the code that potentially needs specific attention rather than allowing a new unexpected format of data to start showing up where it hadn't previously as a result of a patch release.
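The failure described here can be reproduced with a short sketch. It assumes a CPython build that includes the integer string conversion limit (3.11+, or a patched security release of an older branch); `repr_fails` and `huge` are illustrative names, not part of any API.

```python
import sys

def repr_fails(value: int) -> bool:
    """Return True if repr(value) raises ValueError under the active digit limit."""
    try:
        repr(value)
    except ValueError:
        return True
    return False

huge = 10 ** 5000  # 5001 decimal digits

if hasattr(sys, "set_int_max_str_digits"):
    previous = sys.get_int_max_str_digits()
    sys.set_int_max_str_digits(4300)  # the documented default, set explicitly here
    try:
        print(repr_fails(huge))   # decimal conversion is refused
        print(repr_fails(12345))  # small values are unaffected
    finally:
        sys.set_int_max_str_digits(previous)

# hex() uses a power-of-two base, which the limit does not cover.
print(hex(huge)[:10])
```

The last line is the observation the proposal builds on: hexadecimal output stays available even when the decimal form is refused.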
Enhancement Proposal
We could fix this annoyance if we are willing to change `int`'s repr. For huge values we could automatically repr them as hexadecimal. `str` behavior would not change.

The auto-hex repr point needs to be at fewer bits than required to represent a `sys.int_info.str_digits_check_threshold` decimal digit value so that there exists no scenario in which `repr` of an `int` could fail.

Perhaps all integers >512 bits (to pick an arbitrary nice threshold) could repr to hexadecimal:
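As a rough check that a 512-bit cutoff sits safely under that bound, and to show what the hexadecimal form of the first value past the cutoff looks like, here is a small sketch. The names `threshold_digits`, `max_safe_bits`, and `cutoff_bits` are made up for illustration, and the fallback of 640 is an assumption for interpreters that lack the attribute.

```python
import sys

# Assumption: fall back to 640, the value CPython 3.11 reports, if the
# attribute is missing on this interpreter.
threshold_digits = getattr(sys.int_info, "str_digits_check_threshold", 640)

# Every int whose bit_length() is at most max_safe_bits is below
# 10**threshold_digits, so it has at most threshold_digits decimal digits.
max_safe_bits = (10 ** threshold_digits).bit_length() - 1

cutoff_bits = 512  # the arbitrary threshold floated in the proposal
largest_decimal = 2 ** cutoff_bits - 1  # largest value still shown in decimal

print(cutoff_bits <= max_safe_bits)   # is the cutoff below the bound?
print(len(str(largest_decimal)))      # decimal digits at the cutoff
print(hex(2 ** cutoff_bits))          # first value that would switch to hex
```

Because the digit limit cannot be set to a non-zero value below the check threshold, any cutoff at or under `max_safe_bits` leaves no decimal repr that could be refused.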
Effectively this behavior:
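A minimal sketch of that behavior, written as a free function rather than a change to `int.__repr__`. The names `HEX_REPR_BITS` and `proposed_int_repr` are hypothetical, and the 512-bit cutoff is only the example value mentioned in the proposal.

```python
HEX_REPR_BITS = 512  # hypothetical cutoff; the proposal calls it arbitrary

def proposed_int_repr(value: int) -> str:
    """Sketch of the proposed repr: hexadecimal above the cutoff, decimal otherwise."""
    if value.bit_length() > HEX_REPR_BITS:
        return hex(value)
    return str(value)

print(proposed_int_repr(12345))
print(proposed_int_repr(2 ** 600)[:12])
# Either form round-trips through int(text, 0), which honours the 0x prefix.
print(int(proposed_int_repr(-(2 ** 600)), 0) == -(2 ** 600))
```

Code that parses the result with plain `int(text)` would still break on the hexadecimal form, which is the disruption the discussion focuses on.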
Potential wins
[...] `repr` an `int` other than a MemoryError.
[...] `int` when it is huge. Notebook users for example would see the result of their huge `int` computation instead of a ValueError. It'd just be in hex. (REPLs emit the repr.) On the other hand, I expect notebooks may choose to implement this in their own REPL repr code long before it is released into a CPython version that they're run on top of.
[...] `int` and implement their own specialized repr when they always want a value.

Potential disruption
[...] `repr` expecting to always get a decimal value. Bug in user code: Should use `str`.

If we didn't choose a low limit, but instead tied the switchover point to the largest binary value that fits within `sys.get_int_max_str_digits()` decimal digits, we'd be inconsistent between environments or programs that choose to change their digits limit, but would avoid emitting hexadecimal unless we had no other choice. This variant could be thought of as:
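One way to picture this variant is a decimal-first conversion that falls back to hexadecimal only when the decimal one is refused. This is a sketch under that reading; `variant_int_repr` is a hypothetical name, and the demonstration assumes an interpreter that provides `sys.set_int_max_str_digits()`.

```python
import sys

def variant_int_repr(value: int) -> str:
    """Sketch of the variant: decimal whenever allowed, hexadecimal only as a fallback."""
    try:
        return str(value)
    except ValueError:
        # Raised when the decimal form would exceed sys.get_int_max_str_digits().
        return hex(value)

print(variant_int_repr(10 ** 20))

if hasattr(sys, "set_int_max_str_digits"):
    previous = sys.get_int_max_str_digits()
    value = 10 ** 1000  # 1001 decimal digits
    try:
        sys.set_int_max_str_digits(640)
        print(variant_int_repr(value).startswith("0x"))  # over the limit: hex
        sys.set_int_max_str_digits(2000)
        print(variant_int_repr(value).startswith("0x"))  # under the limit: decimal
    finally:
        sys.set_int_max_str_digits(previous)
```

The two limit settings illustrate the inconsistency mentioned above: the same value is rendered differently depending on how a program has configured its digit limit.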