Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PyLong_Sign() public function #19

Closed
vstinner opened this issue Mar 11, 2024 · 44 comments
Closed

Add PyLong_Sign() public function #19

vstinner opened this issue Mar 11, 2024 · 44 comments

Comments

@vstinner
Copy link

API: int PyLong_Sign(PyObject *obj, int *sign)

Retrieve the sign of integer object obj (0, -1 or +1 for zero,
negative or positive integer, respectively) in a variable sign.

Return 0 on success, else -1 with an exception set. This function
always succeeds if obj is a :c:type:PyLongObject or its subtype.

PR: python/cpython#116561


I would like to propose adding the API directly to the limited API, what do you think?

@gvanrossum
Copy link

IIRC @markshannon has opinions on the API for long ints. Please ask him first.

@markshannon
Copy link

What's the use case for this?

Why pass in a PyObject *, rather than a PyLongObject *?
int PyLong_Sign(PyLongObject *obj) wouldn't need to worry about errors.

@skirpichev
Copy link

skirpichev commented Mar 12, 2024

What's the use case for this?

This is something we like to have to support int<->mpz conversion of "big enough" int's with mpz_import/export. See current code in the gmpy2 (similar approach has Sage): for reading and for writing.

Now we have *From/AsNativeBytes functions to do conversion of "big enough" int's, but this is much more slow than above approach.

Together with PyLong_Sign(), we need functions to access absolute value of "big enough" integer as an array of "digits" (for reading or writing). The GMP supports this kind of API with mpz_limbs_read() and mpz_limbs_write() functions. Perhaps, this is a more simple alternative (on CPython side) to PyInt_Import/Export functions.

Edit, a complete interface:

/* reading */
int PyLong_Sign(PyLongObject *obj);  // mpz_sgn()
Py_ssize_t PyLong_DigitCount(PyLongObject *obj);  // mpz_size()
const digit * PyLong_AsDigits(PyLongObject *obj);  // mpz_limbs_read()

/* writing (former _PyLong_FromDigits, used in _decimal.c) */
PyLongObject* PyLong_FromDigits(int negative, Py_ssize_t digit_count, digit *digits);

int PyLong_Sign(PyLongObject *obj) wouldn't need to worry about errors.

This is something I was thinking first. Clearly, it's enough for above use case. If this kind of API is acceptable (so far, I see only PyUnstable_* functions of this type) - I would like to adopt one.

@vstinner
Copy link
Author

@markshannon:

What's the use case for this?

A code search on PyPI top 5,000 projects (at 2023-11-15) finds usage in 10 projects:

  • Cython (3.0.5)
  • cffi (1.16.0)
  • datatable (1.0.0)
  • line_profiler (4.1.2)
  • mariadb (1.1.8)
  • msgspec (0.18.4)
  • orjson (3.9.10)
  • pickle5 (0.0.12)
  • pyodbc (5.0.1)
  • zodbpickle (3.1)

Code:

cffi-1.16.0.tar.gz: cffi-1.16.0/src/c/_cffi_backend.c: if (_PyLong_Sign(ob) < 0)
cffi-1.16.0.tar.gz: cffi-1.16.0/src/c/_cffi_backend.c: return _PyLong_Sign(ob) != 0;

Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_Sign(x)  (((PyLongObject*)x)->long_value.lv_tag & _PyLong_SIGN_MASK)
Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_IsNeg(x)  ((__Pyx_PyLong_Sign(x) & 2) != 0)
Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_IsZero(x)  (__Pyx_PyLong_Sign(x) & 1)
Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_IsPos(x)  (__Pyx_PyLong_Sign(x) == 0)
Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: ((1 - (Py_ssize_t) __Pyx_PyLong_Sign(x)) * __Pyx_PyLong_DigitCount(x))
Cython-3.0.5.tar.gz: Cython-3.0.5/Cython/Utility/TypeConversion.c: #define __Pyx_PyLong_CompactValue(x)  ((1 - (Py_ssize_t) __Pyx_PyLong_Sign(x)) * (Py_ssize_t) __Pyx_PyLong_Digits(x)[0])

datatable-1.0.0.tar.gz: datatable-1.0.0/src/core/python/int.cc: int sign = _PyLong_Sign(v);
datatable-1.0.0.tar.gz: datatable-1.0.0/src/core/python/int.cc: int sign = _PyLong_Sign(v);
datatable-1.0.0.tar.gz: datatable-1.0.0/src/core/python/obj.cc: int sign = _PyLong_Sign(v);

line_profiler-4.1.2.tar.gz: line_profiler-4.1.2/line_profiler/python25.pxd: int   _PyLong_Sign (object)  # No error.

mariadb-1.1.8.tar.gz: mariadb-1.1.8/mariadb/mariadb_codecs.c: if (_PyLong_Sign(value->value) < 0)

msgspec-0.18.4.tar.gz: msgspec-0.18.4/msgspec/_core.c: bool neg = _PyLong_Sign(obj) < 0;

pickle5-0.0.12.tar.gz: pickle5-0.0.12/pickle5/_pickle.c: int sign = _PyLong_Sign(obj);

pyodbc-5.0.1.tar.gz: pyodbc-5.0.1/src/connection.cpp: if (_PyLong_Sign(value) >= 0)
pyodbc-5.0.1.tar.gz: pyodbc-5.0.1/src/params.cpp: pNum->sign = _PyLong_Sign(cell) >= 0;

zodbpickle-3.1.tar.gz: zodbpickle-3.1/src/zodbpickle/_pickle_33.c: int sign = _PyLong_Sign(obj);

One usage is to ... test the sign of a number :-) Examples:

bool neg = _PyLong_Sign(obj) < 0;

and

if (_PyLong_Sign(value) >= 0) ...

@markshannon:

Why pass in a PyObject *, rather than a PyLongObject *?

For consistency with other PyLong APIs. Example: long PyLong_AsLong(PyObject *).

Generic PyObject* vs specific type (such as PyLongObject*) was also discussed in the Add PyDict_GetItemRef() function issue. It was decided to stick to PyObject*.

@skirpichev
Copy link

Generic PyObject* vs specific type (such as PyLongObject*) was also discussed in the python/cpython#106004 issue. It was decided to stick to PyObject*.

JFR, it was discussed rather in the pr thread: python/cpython#106005

Perhaps, this is the relevant issue: capi-workgroup/api-evolution#29

@vstinner
Copy link
Author

By the way, I'm surprised that the C API has no function to compare a Python int object to a C integer, something like:

int PyLong_CompareWithLong(PyObject*, long, int *result)

  • Returns 0 on success. In that case, *result is set to -1 if less than, 0 if equal, +1 if greater than.
  • Set an exception and return -1 on error.

I'm mentiong such hypothetical function since _PyLong_Sign(obj) is currently used by a few projects to compare a Python int to zero: is it equal to zero? Smaller than zero? Greater than zero?

It could be done with:

int cmp;
if (PyLong_CompareWithLong(number, 0, &cmp) < 0) ... handle error ...
if (cmp == 0) ... equal to zero
else if (cmp < 0) ... smaller than zero
else ... greater than zero

Another data: I just added Py_GetConstantBorrowed(Py_CONSTANT_ZERO) and Py_GetConstantBorrowed(Py_CONSTANT_ONE) to the C API, and these objects can be used with PyObject_RichCompareBool().

int cmp;
PyObject *zero = Py_GetConstantBorrowed(Py_CONSTANT_ONE);
cmp = PyObject_RichCompareBool(number, zero);
if (cmp < 0 && PyErr_Occurred()) ... handle error ...
if (cmp == 0) ... equal to zero
else if (cmp < 0) ... smaller than zero ...
else ... greater than zero ...

But I dislike this code, since it uses a borrowed reference and it requires to call PyErr_Occurred() :-(

@vstinner
Copy link
Author

@encukou @gvanrossum @iritkatriel @zooba: What's your opinion on adding int PyLong_Sign(PyObject *obj, int *sign) API to Python 3.13?

@encukou
Copy link
Collaborator

encukou commented Apr 15, 2024

+1 from me.

There was no formal vote, but most members of the WG decided to use PyObject * rather than concrete types:

All our deliberately public APIs should be PyObject * to minimise abstraction leakage, and do their type checks (raising TypeErrors, not assertions). Fast APIs that skip checks should be the special case, not the default.

For “evolution”, let's stick to that.

@zooba
Copy link

zooba commented Apr 15, 2024

Hopefully PyObject_IsTrue is a relatively quick way to compare to zero (assuming you already know you have the right type).

I'm good with adding the function, and I'm about 50/50 between Victor's proposed API and one that returns a "switch"-able result:

switch(PyLong_Sign(o)) {
case PY_LONG_POSITIVE:
case PY_LONG_NEGATIVE:
case PY_LONG_ZERO:
    break;
default:
    // error raised
}

Though honestly, none of the times when I've needed this have I needed to optimise for a sign check before conversion, especially since for compact values it'll cost about the same to assign *sign as *value. But possibly there's a use for checking without converting at all.

@gvanrossum
Copy link

In general I find APIs with output variables awkward, so I want to use them only when necessary. Since the sign function has only four possible outcomes (error, negative, zero, positive) I feel it doesn't really warrant the output variable, so I'm supportive of Steve's solution.

@iritkatriel
Copy link
Member

In general I find APIs with output variables awkward, so I want to use them only when necessary. Since the sign function has only four possible outcomes (error, negative, zero, positive) I feel it doesn't really warrant the output variable, so I'm supportive of Steve's solution.

I agree, with -1 for error and non-negative values for the three valid outcomes.

@vstinner
Copy link
Author

I'm fine with any API as soon as it doesn't require to call PyErr_Occurred().


If we go with the enum-like approach, I suggest to use more specific names, add "IS_" in the name, and use Py_ prefix rather than PY_:

#define Py_LONG_IS_ZERO 0
#define Py_LONG_IS_NEGATIVE 1
#define Py_LONG_IS_POSITIVE 2

Usage would be:

int sign = PyLong_Sign(obj);
if (sign < 0) { ... error ...; }
if (sign == Py_LONG_IS_ZERO || sign == Py_LONG_IS_POSITIVE) {
    ... obj >= 0 ...
}
if (sign == Py_LONG_IS_NEGATIVE) {
    ... obj < 0 ...
}

The advantage of int PyLong_Sign(PyObject *obj, int *sign) API is that no constant needs to be defined and sign can be used more naturally:

int sign;
if (PyLong_Sign(obj, &sign) < 0) { ... error ...; }
if (sign >= 0) {
    ... obj >= 0 ...
}
if (sign < 0) {
    ... obj < 0 ...
}

The gmpy project needs to check if a number is negative: https://github.com/aleaxit/gmpy/blob/eb8dfcbd84abcfcb36b4adcb0d5c6d050731dd75/src/gmpy2_convert_gmp.c#L41-L75

int negative = _PyLong_IsNegative(templong);
...
if (negative) {
    mpz_neg(result->z, result->z);
}

So any API would be fine.

@gvanrossum
Copy link

How about we define the result as follows?

  • -2 error
  • -1 negative
  • 0 zero
  • 1 positive

@iritkatriel
Copy link
Member

I think it will be confusing in light of the c api convention of <0 indicating error. That's probably stronger in this case than the convention on comparison result being -1,0,1.

@vstinner
Copy link
Author

PyObject_RichCompareBool() API doesn't have this issue since it takes two arguments and a 3rd is the comparison operator:

/* Perform a rich comparison with integer result.  This wraps
   PyObject_RichCompare(), returning -1 for error, 0 for false, 1 for true. */
int PyObject_RichCompareBool(PyObject *v, PyObject *w, int op)

@zooba
Copy link

zooba commented Apr 15, 2024

How about we define the result as follows?

I very nearly suggested this myself. I'm okay with it, apart from not really being convinced this function is special enough to justify the -2. A near equivalent could be:

  • -1 error
  • 0 negative
  • 1 zero
  • 2 positive

Such that PyLong_Sign(x) - 1 behaves the same. But that kinda feels like the sort of cleverness that we don't really do anymore, and any such use is one code review away from being "fixed" to use named constants.

@gvanrossum
Copy link

Okay, so how about

  • -1 error
  • 0 negative
  • 1 zero
  • 2 positive?

Then you can calculate the "classic" sign() function by first checking for error and then subtracting 1. If you've already ensured it's a PyLong you can skip the error check. After errors are ruled out, all the various tests can be expressed using comparisons to 1. The advantage of not using enums is that no switch is required.

@encukou
Copy link
Collaborator

encukou commented Apr 16, 2024

This might be too much of a detour, but:
Another option is to treat this as “compare to zero”, and start adding comparison API that uses the “slightly strange” enum based on powers of two that @markshannon introduced for the internal COMPARISON_* macros. That would be:

  • -1 error
  • (1 reserved for “unordered”, like NaN)
  • 2 negative
  • 4 positive
  • 8 zero

AFAICS, this can be shuffled a bit so that you can get a classic cmp() result by subtracting 2. (Not a classic sign() result though: it'd be -1/0/2.)

  • -1 error
  • 1 negative
  • 2 zero
  • 4 positive
  • (8 reserved for “unordered”, like NaN)

(Assuming the ergonomics are worth a few extra CPU instructions -- e.g. with #define Py_COMPARISON_BIT(x, y) 0xf & (0x2418 >> (8*((x) >= (y))) + 4*((x) <= (y))))

@markshannon
Copy link

I'd avoid the "slightly strange" return values. There is a good reason for them being the way they are, but it probably only makes sense to use them internally.
Plus, the overhead of a function call will dominate any bit shuffling, so we might as well keep the return values simple.

I still don't why we need an error value at all. All integers have a sign and this function will never need to allocate any memory.

I still think int PyLong_Sign(PyLongObject *obj) returning the "obvious" values of -1 (for negative), 0, or +1, is a lot easier to use than any of the other alternatives.

@markshannon
Copy link

There was no formal vote, but most members of the WG capi-workgroup/api-evolution#29 (comment) rather than concrete types

Using PyObject * consistently for the high level API makes sense, but adds unnecessary overhead for lower level API.
Getting the sign of an integer as a C int seems quite low level to me.

@zooba
Copy link

zooba commented Apr 16, 2024

I still don't why we need an error value at all.

Because we don't have a strongly typed public API, we have no way to know that the incoming value is an int. We need to check the type and return something if the API is misused - reading arbitrary memory isn't okay.

Unfortunately, the intended "abstract API" vs "concrete API" distinction has long given up on assuming input types. The only "lower level API" we have is anything marked private, which is what we're changing here. We can keep the private API that assumes the type, but the public one has to check and handle it safely.

@markshannon
Copy link

Because we don't have a strongly typed public API, we have no way to know that the incoming value is an int.

int PyLong_Sign(PyLongObject *obj) is strongly typed and we do know that the incoming value is an PyLongObject, because it says so.

You could argue that someone could cast a PyUnicodeObject * to PyLongObject *, but casts are a fact of life in C.
Py_DECREF((PyObject *)7) is legal C, but we don't do elaborate checks to ensure that the argument of Py_DECREF is a valid pointer to a PyObject. We don't even check that it isn't NULL.

We have to assume a basic level of competence in the users of the C API, so why not here?

@zooba
Copy link

zooba commented Apr 16, 2024

By "we don't have" I meant in general, our public API is not strongly typed like that. We don't ever expect users of the API to handle types other than PyObject * or their own - PyLongObject is basically internal.

Yes we could add it, but we can't change what's already there, and so we decided that it's not going to be a useful evolution of the language. If we decide that a strongly typed C interface is valuable in a new API design (not a given, since we're being far more considerate of non-C users), then in a revamp we may well expose more C types as part of the API.

But for now, we're not going in that direction. So that's "why not here".

@gvanrossum
Copy link

Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error? I looked for this in the API docs and found a smattering, e.g. PyCode_GetNumFree, PyFrame_GetBuiltins.

It would be a novelty for the PyLong API though, and I'm not sure that it's worth deviating from the pattern by having this one function that's strongly typed. Plus, most likely, what the user code has in its hands is typed as PyObject *, because that's what you get when you construct a long (PyLong_FromLong etc.) or when you get something from another object, e.g. through PyObject_GetAttr or PyDict_GetItem. So we'd just be encouraging users to add a cast, which they'll do blindly. That, too, is a fact of life.

So in the end I agree with Steve.

@skirpichev
Copy link

Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error?

If we count Unstable C API, then there are PyUnstable_Long_IsCompact() and PyUnstable_Long_CompactValue(), working with integer objects.

I'm not sure that it's worth deviating from the pattern by having this one function that's strongly typed.

I doubt this function has much sense alone. So, please consider other possible extensions of the public API, mentioned above (e.g. PyLong_DigitCount(), PyLong_AsDigits()).

@encukou
Copy link
Collaborator

encukou commented Apr 17, 2024

Yep. Alas, using PyLongObject* only for new functions only would make things too inconsistent.

Are there no parts of the C API where we specify a function that takes a specific object type as argument and returns a value without possibility of error? [...]

Incidentally, I'm (slowly) collecting this kind of data for my Language Summit topic, which will be about adding the kind of API Mark wants in a consistent way.
However, the info I have doesn't know about macros or functions only documented or implicitly understood to never fail, so my list is pretty small:

  • PyType_GetFlags, PyUnstable_Long_CompactValue

  • Plus there are ones that return void (PyType_Modified, PyFunction_SetVectorcall; perhaps Py_SET_SIZE counts) and ones that take a specific object in addition to PyObject* (Py_IS_TYPE, Py_SET_TYPE)

But we can also look at all functions that take a specific object. There aren't that many, and the vast majority of them deal with PyTypeObject, PyCodeObject, and PyFrameObject.

Click for a list

(From a Clang AST dump, ad-hoc filtering. Macros aren't included at all; no info about how wrongly cast arguments are handled.)

Py_IS_TYPE
   PyTypeObject * type
Py_SET_TYPE
   PyTypeObject * type
Py_SET_SIZE
   PyVarObject * ob
PyType_GetSlot
   PyTypeObject * (param 1)
PyType_GetModule
   PyTypeObject * (param 1)
PyType_GetModuleState
   PyTypeObject * (param 1)
PyType_GetName
   PyTypeObject * (param 1)
PyType_GetQualName
   PyTypeObject * (param 1)
PyType_GetFullyQualifiedName
   PyTypeObject * type
PyType_GetModuleName
   PyTypeObject * type
PyType_FromMetaclass
   PyTypeObject * (param 1)
PyObject_GetTypeData
   PyTypeObject * cls
PyType_GetTypeDataSize
   PyTypeObject * cls
PyType_IsSubtype
   PyTypeObject * (param 1)
   PyTypeObject * (param 2)
PyObject_TypeCheck
   PyTypeObject * type
PyType_GetFlags
   PyTypeObject * (param 1)
PyType_Ready
   PyTypeObject * (param 1)
PyType_GenericAlloc
   PyTypeObject * (param 1)
PyType_GenericNew
   PyTypeObject * (param 1)
PyType_Modified
   PyTypeObject * (param 1)
PyType_GetDict
   PyTypeObject * (param 1)
PyUnstable_Type_AssignVersionTag
   PyTypeObject * type
PyType_HasFeature
   PyTypeObject * type
PyType_GetModuleByDef
   PyTypeObject * (param 1)
PyObject_Init
   PyTypeObject * (param 2)
PyObject_InitVar
   PyVarObject * (param 1)
   PyTypeObject * (param 2)
PyType_SUPPORTS_WEAKREFS
   PyTypeObject * type
PyUnstable_Object_GC_NewWithExtraData
   PyTypeObject * (param 1)
PyUnstable_Long_IsCompact
   const PyLongObject * op
PyUnstable_Long_CompactValue
   const PyLongObject * op
PyCMethod_New
   PyTypeObject * (param 4)
PyFunction_SetVectorcall
   PyFunctionObject * (param 1)
PyCode_GetNumFree
   PyCodeObject * op
PyUnstable_Code_GetFirstFree
   PyCodeObject * op
PyCode_GetFirstFree
   PyCodeObject * op
PyCode_Addr2Line
   PyCodeObject * (param 1)
PyCode_Addr2Location
   PyCodeObject * (param 1)
PyCode_GetCode
   PyCodeObject * code
PyCode_GetVarnames
   PyCodeObject * code
PyCode_GetCellvars
   PyCodeObject * code
PyCode_GetFreevars
   PyCodeObject * code
PyFrame_GetLineNumber
   PyFrameObject * (param 1)
PyFrame_GetCode
   PyFrameObject * frame
PyFrame_GetBack
   PyFrameObject * frame
PyFrame_GetLocals
   PyFrameObject * frame
PyFrame_GetGlobals
   PyFrameObject * frame
PyFrame_GetBuiltins
   PyFrameObject * frame
PyFrame_GetGenerator
   PyFrameObject * frame
PyFrame_GetLasti
   PyFrameObject * frame
PyFrame_GetVar
   PyFrameObject * frame
PyFrame_GetVarString
   PyFrameObject * frame
PyTraceBack_Here
   PyFrameObject * (param 1)
PyGen_New
   PyFrameObject * (param 1)
PyGen_NewWithQualName
   PyFrameObject * (param 1)
PyGen_GetCode
   PyGenObject * gen
PyCoro_New
   PyFrameObject * (param 1)
PyAsyncGen_New
   PyFrameObject * (param 1)
PyDescr_NewMethod
   PyTypeObject * (param 1)
PyDescr_NewClassMethod
   PyTypeObject * (param 1)
PyDescr_NewMember
   PyTypeObject * (param 1)
PyDescr_NewGetSet
   PyTypeObject * (param 1)
PyDescr_NewWrapper
   PyTypeObject * (param 1)
PyStructSequence_InitType
   PyTypeObject * type
PyStructSequence_InitType2
   PyTypeObject * type
PyStructSequence_New
   PyTypeObject * type
PyModule_AddType
   PyTypeObject * type
PyEval_EvalFrame
   PyFrameObject * (param 1)
PyEval_EvalFrameEx
   PyFrameObject * f
PyUnstable_PerfTrampoline_CompileCode
   PyCodeObject * (param 1)
PyUnstable_Replace_Executor
   PyCodeObject * code
PyUnstable_GetExecutor
   PyCodeObject * code

Counts of tokens in the above argument types:

Counter({'*': 73, 'PyTypeObject': 38, 'PyFrameObject': 17, 'PyCodeObject': 12, 'PyVarObject': 2, 'const': 2, 'PyLongObject': 2, 'PyFunctionObject': 1, 'PyGenObject': 1})

These are functions starting with Py, where argument types include Py but not PyObject nor something from an ad-hoc list of non-PyObject types:

['PyCompilerFlags', 'PyConfig', 'PyGetSetDef', 'PyInterpreterConfig', 'PyInterpreterState', 'PyMemAllocatorEx', 'PyMemberDef', 'PyMethodDef', 'PyModuleDef', 'PyObject', 'PyObjectArenaAllocator', 'PyPreConfig', 'PyStructSequence_Desc', 'PyThreadState', 'PyTime_t', 'PyType_Spec', 'PyWideStringList', 'Py_UCS4', 'Py_buffer', 'Py_ssize_t', 'Py_tss_t', '_PyExecutorObject', '_PyInterpreterFrame', '_PyOptimizerObject', '_Py_CODEUNIT']

@zooba
Copy link

zooba commented Apr 17, 2024

I think we can reasonably hypothesise (and perhaps Guido can confirm) that the PyTypeObject * functions were initially likely to deal with static instantiations of PyTypeObject, either exported from CPython or defined in users' own code. So those would require a (valid) cast to PyObject * for cases where they were to be treated as first-class objects, but for most type object operations are easier to reference directly.

Anything allocated or instantiated by CPython would be object a.k.a. PyObject *, and so the "real" object type would never be referenced. Like having to cast malloc's void * result to the type you want, callers would have to cast PyObject * to PyLongObject *. But the principle here is EAFP, and so just as an instance of int can be passed to any Python function, an instance of PyLongObject can be passed to any C API function, which will then take responsibility for deciding whether the caller deserves forgiveness or not.

I suspect the other functions taking more specific object types directly were either added without considering this principle or were being consistent with an earlier API that returned non-object structs. So it could have gone either way - for me, I'd have said if they are opaque to make them PyObject * in calls, and if they are meant to be used as structs then to remove the Object from the name and define their own lifetime semantics.

I doubt this function has much sense alone. So, please consider other possible extensions of the public API, mentioned above (e.g. PyLong_DigitCount(), PyLong_AsDigits()).

If we do go this way, we should also define PyLong_DigitSize to return the number of bits in each digit. The point of using a standard interchange format (bytes) is to hide the internal implementation details - if we decide to expose those details, then they should be runtime parameterised (so compact values can return a DigitCount()=1 and DigitSize()=8*sizeof(internal repr) to avoid the recalculation work it would take to put it into "normal" digits).

@gvanrossum
Copy link

I think we can reasonably hypothesise (and perhaps Guido can confirm) that the PyTypeObject * functions were initially likely to deal with static instantiations of PyTypeObject, either exported from CPython or defined in users' own code. [...]

Yes, exactly.

I suspect the other functions taking more specific object types directly were either added without considering this principle or were being consistent with an earlier API that returned non-object structs. So it could have gone either way - for me, I'd have said if they are opaque to make them PyObject * in calls, and if they are meant to be used as structs then to remove the Object from the name and define their own lifetime semantics.

More likely, the Code and Frame were originally created primarily for internal use. The core of the interpreter uses these a lot and often accesses the members directly, so it's more convenient to have the API functions take the actual object types. We're not very consistent in this, alas, and the interpreter code is still full of casts, but since we're feeling the need for speed, repeated dynamic type checks are just wasted time on the fast path. Eventually, we have evolved a parallel set of internal APIs, so we care less about the performance of the public APIs, but it's too late to change those.

Maybe for 3.14 we can sit down with Mark and someone from (e.g.) GMP and design a brand new API for PyLong that is statically typed and provide all the types of access that libraries like GMP need while allowing the appropriate amount of API evolution/stability from the perspective of likely future improvements to the PyLong implementation. (Though I suspect that the step function there will be tagged pointers, not changes in object representation.)

@encukou
Copy link
Collaborator

encukou commented Apr 23, 2024

FWIW, I still prefer the original proposal to the 0/1/2.
I find consistency in the error indicator (-1) very important; I also don't think avoiding the output parameter is worth messing with the traditional sign value (-1/0/+1).

Looking at the guidelines proposal I've been writing, I'd generally like to go for a certain strict, even mechanical consistency, plus sometimes adding an extra variant that trades that consistency for convenience or speed.

So, why not both?

int PyLong_Sign(PyObject *obj, int *result)

  • on success, returns 0 and set *result to -1/0/+1
  • on error, returns -1 with exception set. Also, as a convenience, sets *result to -2 to make it switchable

int PyLong_SignUnchecked(PyLongObject *obj)

  • returns -1/0/+1
  • does no type checking (just a debug assert); cannot signal an error

@gvanrossum
Copy link

Why not both? Because this use case isn't important enough to have two separate APIs with different signature and behavior.

I can live with having just the two-argument version if we rename it to PyLong_GetSign.

@markshannon
Copy link

If we do provide both, PyLong_SignUnchecked is the one that will get used.
I'd be very surprised if users chose the more complex variant over the simpler one.

@zooba
Copy link

zooba commented Apr 23, 2024

If we were to do both, the latter ought to be PyUnstable_Long_Sign (or PyUnstableLong_Sign? Or PyLongUnstable_Sign?) and probably be defined as an inline function (like the existing private function).

Otherwise, 100% agree with Guido (and 90% with Mark: I think the pointer cast will sometimes be less convenient than chaining conditions (if (...Sign(x, &s) && s > 0 && ...NativeBytes(...)), but for the most part the simpler function will win).

@encukou
Copy link
Collaborator

encukou commented Apr 24, 2024

I can live with having just the two-argument version if we rename it to PyLong_GetSign.

Sounds great!

PyLong_SignUnchecked is the one that will get used.
I'd be very surprised if users chose the more complex variant over the simpler one.

That's OK -- with a name that announces that this function is unusual.
(And if we add it, I'd consider it inconsistent to not have the “regular” variant.)

PyUnstable_Long_Sign

I don't think we need to worry about API stability here. IMO, the convenient API is inconsistent with how we want new API to look. While consistency and stability are correlated, I'd rather keep them separate (Unchecked vs. Unstable).

@encukou
Copy link
Collaborator

encukou commented May 1, 2024

int PyLong_GetSign(PyObject *obj, int *sign)

On success, set *sign to sign of integer object obj (0, -1
or +1 for zero, negative or positive integer, respectively)
and return 0.

On failure, return -1 with an exception set.
This function always succeeds if obj is a
PyLongObject or its subtype.

@markshannon
Copy link

Have any of the likely users asked for this two argument form, as opposed to int PyLong_GetSign(PyLongObject *obj)?

@zooba
Copy link

zooba commented May 1, 2024

It hardly matters, they aren't getting one that accepts PyLongObject as an argument. The choice will be between putting the error code or the result in an output parameter, and previously we've agreed (in a very rare case of agreement 😄 ) that the error code should be the returned value.

@encukou
Copy link
Collaborator

encukou commented May 6, 2024

@iritkatriel, what are your thoughts?

@vstinner
Copy link
Author

@erlend-aasland: Are you ok with API proposed in #19 (comment) ? Tell me if you need more context/information. And it's ok if you need more time to decide.

@erlend-aasland
Copy link

I'm fine with the proposed API; I voted.

@vstinner
Copy link
Author

vstinner commented Jun 1, 2024

Thanks all. The SC adopted int PyLong_GetSign(PyObject *obj, int *sign) API: #19 (comment). I close the issue.

@serhiy-storchaka
Copy link

I concur with @markshannon. For such low-level function the overhead of checking the type is too high, and it makes the API less convenient.

Also, I think that PyLong_IsPositive(), PyLong_IsNegative() and PyLong_IsZero() are more useful in practice and adds less overhead. So I propose to implement these functions first, and then look if there are enough use cases are left for PyLong_Sign().

These functions are used when performance is very important. When it is not important, you can use PyObject_RichCompareBool().

@vstinner
Copy link
Author

vstinner commented Jun 3, 2024

These functions are used when performance is very important.

In which kind of code the performance of PyLong_GetSign() would matter? I don't expect this function to be called often, like not more than once per integer, and not in "hot code".

PyLong_GetSign() cannot fail if the argument is a Python int. If you know that the parameter is a Python int, you can simply ignore the error handling (maybe using an assertion, just in case).

@serhiy-storchaka
Copy link

I do not remember details, it was many years ago, but I used Py_SIZE(x) < 0 instead of _PyLong_Sign(x) < 0 more than once in the past because the difference was noticeable. You can search all occurrences of "_PyLong_Is" and try to replace them with _PyLong_Sign.

BTW, the _PyLong_Is*() functions are used much more (104 occurrences total: 26 _PyLong_IsZero, 7 _PyLong_IsPositive and 71 _PyLong_IsNegative) than _PyLong_Sign() (8). And several of these _PyLong_Sign() calls could be replaced with _PyLong_Is*().

The API proposed here is not only slower, it is less convenient. You need to introduce a variable, and even if you ignore error, you cannot use it in expression.

int sign;
(void)PyLong_Sign(obj, &sign);
if (sign < 0) {

instead of

if (PyLong_IsNegative(obj)) {

Multiply this by 112 use cases.

@vstinner
Copy link
Author

vstinner commented Jun 5, 2024

The API proposed here is not only slower, it is less convenient.

These aspects were taken in account in the decision.

This API takes PyObject*. If you consider that it's way too slow and the API is not convenient, you can propose adding a PyUnstable API which takes PyLongObject*. I'm not convinced that it's needed.

Also, this issue is not closed, so I suggest to open a new one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

9 participants