gh-106004: Add PyDict_GetItemRef() function #106005

vstinner · 2023-06-23T01:01:27Z

Add PyDict_GetItemRef() and PyDict_GetItemStringRef() functions.
Add unit tests on the PyDict C API in test_capi.

Issue: C API: Add PyDict_GetItemRef() function #106004

📚 Documentation preview 📚: https://cpython-previews--106005.org.readthedocs.build/

Doc/c-api/dict.rst

Objects/dictobject.c

markshannon · 2023-06-23T09:40:02Z

What's the rationale for not distinguishing between found and not found in the return value?
See python/devguide#1121

markshannon · 2023-06-23T10:15:08Z

Can we come up with a better name than PyDict_GetItemRef?
I see why you are adding Ref to the end, but all API functions should return new references, so it is a bit like calling the function PyDict_GetItemNotWrong.

Obviously, the ideal name is already taken.
Anyone have any suggestions for a better name?

vstinner · 2023-06-23T10:40:40Z

What's the rationale for not distinguishing between found and not found in the return value?

Sure, if you think that it's useful, I can return 1 if the key is present.

It doesn't go against the syntax if (PyDict_GetItemRef(...) < 0) ... to check for error. I dislike having to use 2 variables to check for a function variable, so I prefer to use the implicit test in an if (...) statement. Pseudo-code with a variable to store the function result and dispatch the action depending on it:

PyObject *value;
int res = PyDict_GetItemRef(dict, key, &value);
if (res < 0) ... error ...
else if (res == 0) ... missing key ...
else ... present key

Well, I'm fine with people having different taste (if it doesn't go against my taste ;-)).

Can we come up with a better name than PyDict_GetItemRef?

Nope, that's my local optimum name :-)

Anyone have any suggestions for a better name?

Maybe PyObject_GetItem()? :-D Just kidding, PyObject_GetItem() already exists and works for any type (doesn't crash if it's not a dict) :-)

Well, that would just be 2 more functions on the top of the existing 9 functions to "get an item in dict":

PyDict_GetItem()
PyDict_GetItemString()
PyDict_GetItemWithError()
_PyDict_GetItemIdWithError()
_PyDict_GetItemStringWithError()
_PyDict_GetItemWithError()
_PyDict_GetItem_KnownHash()

Ok, let me propose a name: PyDict_GetItemOkThisTimeItShouldBeCorrect() :-)

I already added the following functions with the "Ref" suffix:

PyModule_AddObjectRef()
PyImport_AddModuleRef()
PyWeakref_GetRef()

The "Ref" is here to remain that this one it returns a strong reference.

And also: Py_NewRef() and Py_XNewRef() (not really a "strong reference" variant of an existing function).

markshannon · 2023-06-23T10:48:04Z

Testing the result is always at least as efficient as doing a test on *pvalue.
You need to handle all three cases, so there needs to be two tests regardless.

We made python/devguide#1121 to avoid having to repeat this discussion for every new API function.
If you think returning 0 for both found and not-found is better, can you explain why on that issue. Thanks.

markshannon · 2023-06-23T10:51:46Z

I already added the following functions with the "Ref" suffix:

PyModule_AddObjectRef()

PyImport_AddModuleRef()

PyWeakref_GetRef()

Can we please stop adding new functions to the C-API until we have established conventions.
Otherwise, we will need to add yet another variant that conforms to the naming convention.

vstinner · 2023-06-23T11:37:02Z

I modified the API to return 1 when the key is present (and 0 when the key is missing).

vstinner · 2023-06-23T13:15:12Z

Can we please stop adding new functions to the C-API until we have established conventions. Otherwise, we will need to add yet another variant that conforms to the naming convention.

I created capi-workgroup/problems#52 to discuss C API naming convention (when adding new functions).

vstinner · 2023-06-27T12:22:29Z

@erlend-aasland @markshannon: Are you ok with the proposed API? @erlend-aasland added some guidelines related to proposed API in the Devguide.

@erlend-aasland @markshannon: About the function name, I created capi-workgroup/problems#52 but it's unclear to me what's the outcome of this naming convention discussion.

What should I do to make progress on this PR?

markshannon · 2023-06-27T12:36:33Z

No, I'm not OK with the proposed API. I've already said that.

Using PyObject * is needlessly throwing away type information, and we haven't established naming conventions yet.
Let's agree on the rules first, before adding more functions that will need to be changed to the API.

What's the rush?

markshannon · 2023-06-27T12:38:49Z

TBH, I don't think you should have opened this PR until there was some feedback on the issue.

vstinner · 2023-06-27T14:53:03Z

By the way, draft PEP 703 – Making the Global Interpreter Lock Optional in CPython proposes PyDict_FetchItem() which returns a strong reference, to replace PyDict_GetItem() (which returns a borrowed reference). The PEP has a section explaining why PyDict_GetItem() is not deprecated: Why Not Deprecate PyDict_GetItem in Favor of PyDict_FetchItem?. I understand that the proposed API is: PyObject* PyDict_FetchItem(PyObject *dict, PyObject *key).

vstinner · 2023-06-27T14:56:23Z

Using PyObject * is needlessly throwing away type information,

I replied there: #106005 (comment)

we haven't established naming conventions yet.

I was confused by @gpshead's comment: "I don't think it matters which naming scheme we pick or even if it winds up wholly consistent."

What's the rush?

There is no rush. I just wanted to understand the status of this PR.

erlend-aasland

I'm approving this. A new naming scheme makes sense for a new API; I'm not sure it makes sense to try and enforce a new scheme in the current API. For now, there is already precedence of the Ref suffix in the current API; I'm ok with that. Also, the current API uses PyObject * all over the place. If we are to change this, we practically will end up with a completely new API; AFAICS, there is no problem with sticking to the current practice.

vstinner · 2023-07-12T18:33:10Z

commenting here to tell github to remove my approval as this PR seems to be going in exploratory directions and is awaiting various feedback and decisions.

This PR is ready for review, so I remove the draft status again.

vstinner · 2023-07-12T18:56:21Z

@serhiy-storchaka @gpshead @erlend-aasland: Are you ok with the API using PyDictObject* type? Or should I revert this 3rd commit?

Oh. @serhiy-storchaka and @gpshead seem to be strongly against it, so I reverted the change to go back to the initial plan: PyObject* type.

I tried to use a revert, but sadly the Git history became too complicated, so instead of squashed commits and I rebased my PR. Sorry about that :-(

Gregory:

commenting here to tell github to remove my approval as this PR seems to be going in exploratory directions and is awaiting various feedback and decisions.

Well, I was requested to set multiple guidelines for different aspects of the API on this single function addition.

Return 1 if the key is present?
- My first version returned 0 if the key is present and if the key is missing, both considered as "success".
- A guideline was added to the devguide: Document the preferred style for API functions with three, four or five-way returns devguide#1121
- I modified my PR to return 1 if the key is present (and return 0 if the key is missing).
Function name
- My PR adds Ref suffix to the existing function
- @markshannon dislikes the name and asked to define a naming convention for new functions.
- I created: "Naming convention" Naming convention for new C API functions capi-workgroup/problems#52
- @serhiy-storchaka proposed PyDict_GetOptionalItem() but is ok with PyDict_GetItemRef().
- PEP 703 proposes PyDict_FetchItem() name.
- The majority seems to be fine with the proposed name: PyDict_GetItemRef().
Type of the first parameter
- My first version used PyObject*
- @markshannon wants PyDictObject*, he created discussion: The C API is weakly typed capi-workgroup/problems#31
- I changed the parameter type to PyDictObject* to see the consequences of the change: replace runtime check with assertion, remove tests on invalid dict type (since it would crash rather than raising a clean SystemError exception).
- @serhiy-storchaka and @gpshead want PyObject*: they are against the need for cast between PyObject* and PyDictObject* and do prefer type checking at runtime, since it's too rare that developers have access to a Python binary with assertions enabled
- After @serhiy-storchaka and @gpshead pushback: I reverted this change, to come back to my initial plan: PyObject*.
@markshannon would prefer to define some guidelines first, before proposing adding a concrete function:
- "I'm opposed because making ad-hoc changes like this is going to make the C-API worse, not better."
Since @markshannon has some disagreements, @gpshead opened a discussion at the Steering Council decision: Should we add non-borrowed-ref public C APIs, if so, is there a naming convention? steering-council#201

PR changelog:

First version
Return 1 if the key is present
Change parameter name from pvalue to result (and simplify dictitems_contains() implementation)
Change first parameter type to PyDictObject**
Move the comments from C (dictobject.c) to header file (dictobject.h)
Revert the parameter type change: come back to PyObject*

serhiy-storchaka

I did not propose PyDict_GetOptionalItem() seriously, it would look confusing taking into account that other functions also return an "optional" item. I accept PyDict_GetItemRef() as token name and will accept any name whichever you choose. No need to bakeshed it to death.

I would be fine with the absent of runtime type check, not every C API function has a runtime type check, but the existing functions PyDict_GetItem() and PyDict_GetItemWithError() have a runtime type check, so why this function should be special? And compile time type check would just not work.

I confirm my approve.

erlend-aasland

Confirming my approval.

I believe introducing more type structs in the Limited API is a very bad decision, so I'm glad that particular change was reverted; I prefer to stick with the existing API convention of PyObject * only. For the "Brand New API", whatever that will be, a generic "PyObject-like struct" might not be the best option.

Regarding naming, I have no preference. Personally, I think I would have considered the PEP 703 naming to try and create slightly less merge conflict havoc for Sam ~~when~~ if nogil is going to be merged.

encukou · 2023-07-13T08:42:02Z

FWIW, here's a possible new variant: you could set result to NULL in which case the result isn't stored/incref'd. And that would start a convention of how to turn a get operation into a membership test. (And the Lookup name would fit that better.)

markshannon · 2023-07-13T08:55:50Z

If this function is to take PyObject *, as @erlend-aasland seems to insist, then it shouldn't raise a SystemError when passed something other than a dict. It should raise a TypeError.

serhiy-storchaka · 2023-07-13T11:43:50Z

PyDict_GetItem() and PyDict_GetItemWithError() raise a SystemError. All other functions in the Concrete Objects Layer either raise SystemError or have an undefined behavior. It is not specified what the behavior of the concrete function is, but passing an object of correct type is a requirement. https://docs.python.org/3/c-api/concrete.html

vstinner · 2023-07-13T12:03:18Z

I like SystemError. I used it in the past to detect bugs in C extensions, since TypeError is too common, so it was easy to distinguish "normal" TypeError from "invalid" SystemError. For me, SystemError means "bad API usage".

If this function is to take PyObject *, as @erlend-aasland seems to insist, then it shouldn't raise a SystemError when passed something other than a dict. It should raise a TypeError.

IMO if TypeError is preferred, we should try to keep the whole C API consistent and raise TypeError in all similar situations, not only for newly added functions. Otherwise, it can be misleading.

vstinner · 2023-07-13T12:06:47Z

FWIW, here's a possible new variant: you could set result to NULL in which case the result isn't stored/incref'd. And that would start a convention of how to turn a get operation into a membership test. (And the Lookup name would fit that better.)

For me, it's very surprising that the purpose of a function change "that much" depending on a parameter. For example, in Python, getattr() and hasattr() are two different functions: you cannot pass None to getattr() to check if an attribute exists.

I prefer to have separated functions to "get" an item/attribute and to check if an object "has" an item/attribute.

I dislike having to use locale.setlocale(loc, None) to get a locale :-( I don't want to use locale.getlocale(loc) since it returns different different.

This issue goes in all directions, it's a little bit hard to follow :-(

serhiy-storchaka · 2023-07-13T13:09:04Z

FWIW, here's a possible new variant: you could set result to NULL in which case the result isn't stored/incref'd.

How does this differ from PyDict_Contains()?

vstinner · 2023-07-13T13:37:37Z

Can we now please focus the discussion on this PR about PyDict_GetItemRef() and only PyDict_GetItemRef()?

For general C API discussions, I suggest using the https://github.com/capi-workgroup/problems/issues project.

So far, the PR got 2 approvals (@gpshead removed his approval).

Now I don't know if I'm supposed for wait for the SC: @gpshead created python/steering-council#201 issue but in 2 weeks, the SC didn't have time to look into this issue.

gpshead · 2023-07-13T20:12:58Z

my 2 cents on exception types: If it is a bug in code at the C API level, SystemError makes sense. If it is a bug in code at the Python level, TypeError makes sense. Somewhat of a coin toss on some C APIs when the input in question can have come directly from the Python level (turning the question into one of who is responsible for pre-checking the type). This is also the kind of thing we can change in the future when doing a cleanup sweep of API behaviors to change some SystemErrors coming from PyErr_BadInternalCall into TypeError as no code is ever expected to catch and specifically handle SystemError.

For now, lets let this PR sit while we wait for steering council answers.

vstinner · 2023-07-17T16:16:51Z

my 2 cents on exception types: If it is a bug in code at the C API level, SystemError makes sense. If it is a bug in code at the Python level, TypeError makes sense. Somewhat of a coin toss on some C APIs when the input in question can have come directly from the Python level (turning the question into one of who is responsible for pre-checking the type). This is also the kind of thing we can change in the future when doing a cleanup sweep of API behaviors to change some SystemErrors coming from PyErr_BadInternalCall into TypeError as no code is ever expected to catch and specifically handle SystemError.

PyObject_GetItem() is the safe generic API: it raises TypeError if the API is misused, if the first argument is not a mapping (or a sequence).

PyDict_GetItemWithError() is the fast specialized API: it raises SystemError if it's misused, if the first argument is not a dict or a dict subtype.

I'm taking a break from the C API discussions; I'm removing myself from this PR for now

* Add PyDict_GetItemRef() and PyDict_GetItemStringRef() functions. Add these functions to the stable ABI version 3.13. * Add unit tests on the PyDict C API in test_capi.

vstinner · 2023-07-19T10:42:16Z

I rebased my PR on the main branch to fix conflicts.

vstinner · 2023-07-21T21:19:37Z

Thanks everyone for the great reviews. I think that the final API is better than my first version.

The road for that function was more bumpy than for other functions since PyDict_GetDict() is one of the commonly used function, and as I wrote before, we took this function as an opportunity to revisit some API design choices. There are some disagreements which have been discussed in length, especially in the https://github.com/capi-workgroup/problems/issues/ project. Overall, the majority seems to be in favor of this change (and I didn't see any concrete counter-proposal to solve the issue).

Sadly, this PR lost two approvals in the bumpy discussion. IMO it's now time to move on and see how this function can be used to avoid bugs and how to migrate users from the cursed PyDict_GetItem() to that better PyDict_GetItemRef() function.

Supporters of a new API instead of fixing the current API one function by one, I suggest continuing the discussion in https://github.com/capi-workgroup/problems/issues/ : there are a few issues related to that. So far, I didn't see any clear nor complete proposal, so for me, we are still at the same status quo: we do our best, fixing functions, one by one, when we agree that there is an issue. Same for the question of using PyDictObject* type rather than PyObject* for the first parameter: it's still being discussed and so far, no consensus was reached.

Completely getting rid of PyDict_GetItem() may take time. Maybe we need to consider capi-workgroup/problems#54 approach for users who want to start a new C API with a clean C API without known issues like borrowed references. But well, that's out of the scope of this issue. This issue does not deprecate PyDict_GetItem() on purpose.

* main: pythongh-106004: Add PyDict_GetItemRef() function (python#106005)

vstinner · 2023-07-21T22:26:03Z

I added PyDict_GetItemRef() to pythoncapi-compat: python/pythoncapi-compat@eaff3c1

vstinner · 2023-07-24T18:30:38Z

If we keep these new functions in Python, IMO we should consider replacing usage of the private _PyDict_GetItemStringWithError() with the public PyDict_GetItemStringRef(). And maybe even remove _PyDict_GetItemStringWithError() later.

vstinner requested review from methane and markshannon as code owners June 23, 2023 01:01

bedevere-bot added the awaiting core review label Jun 23, 2023

bedevere-bot mentioned this pull request Jun 23, 2023

C API: Add PyDict_GetItemRef() function #106004

Closed

vstinner requested review from a team and encukou as code owners June 23, 2023 02:48

erlend-aasland reviewed Jun 23, 2023

View reviewed changes

Doc/c-api/dict.rst Outdated Show resolved Hide resolved

erlend-aasland reviewed Jun 23, 2023

View reviewed changes

Doc/c-api/dict.rst Outdated Show resolved Hide resolved

erlend-aasland reviewed Jun 23, 2023

View reviewed changes

Objects/dictobject.c Outdated Show resolved Hide resolved

vstinner force-pushed the dict_getitemref branch from f2856c8 to bc8655c Compare June 23, 2023 10:47

vstinner mentioned this pull request Jun 23, 2023

Naming convention for new C API functions capi-workgroup/problems#52

Closed

vstinner mentioned this pull request Jun 23, 2023

Document the preferred style for API functions with three, four or five-way returns python/devguide#1121

Closed

erlend-aasland mentioned this pull request Jun 26, 2023

C API: Add guidelines for C APIs with output params python/devguide#1125

Closed

vstinner mentioned this pull request Jun 26, 2023

Design a brand new C API with new PyCAPI_ prefix where all functions respect new guidelines capi-workgroup/api-revolution#9

Closed

erlend-aasland approved these changes Jun 29, 2023

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Jun 29, 2023

gpshead approved these changes Jun 29, 2023

View reviewed changes

bedevere-bot added the awaiting core review label Jul 12, 2023

serhiy-storchaka approved these changes Jul 12, 2023

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Jul 12, 2023

erlend-aasland previously approved these changes Jul 12, 2023

View reviewed changes

vstinner mentioned this pull request Jul 17, 2023

Returning borrowed references is fundamentally unsafe capi-workgroup/problems#21

Open

pythongh-106004: Add PyDict_GetItemRef() function

01388dc

* Add PyDict_GetItemRef() and PyDict_GetItemStringRef() functions. Add these functions to the stable ABI version 3.13. * Add unit tests on the PyDict C API in test_capi.

vstinner force-pushed the dict_getitemref branch from a60ab95 to 01388dc Compare July 19, 2023 10:42

vstinner merged commit 41ca164 into python:main Jul 21, 2023

bedevere-bot removed the awaiting merge label Jul 21, 2023

vstinner deleted the dict_getitemref branch July 21, 2023 21:10

carljm added a commit to carljm/cpython that referenced this pull request Jul 21, 2023

Merge branch 'main' into superfix

b9fab74

* main: pythongh-106004: Add PyDict_GetItemRef() function (python#106005)

gvanrossum mentioned this pull request Oct 17, 2023

Define guideline for parameter types: PyObject vs specific type capi-workgroup/api-evolution#29

Open

skirpichev mentioned this pull request Mar 20, 2024

Add PyLong_Sign() public function capi-workgroup/decisions#19

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-106004: Add PyDict_GetItemRef() function #106005

gh-106004: Add PyDict_GetItemRef() function #106005

vstinner commented Jun 23, 2023 •

edited by github-actions bot

Loading

markshannon commented Jun 23, 2023

markshannon commented Jun 23, 2023

vstinner commented Jun 23, 2023

markshannon commented Jun 23, 2023

markshannon commented Jun 23, 2023

vstinner commented Jun 23, 2023

vstinner commented Jun 23, 2023

vstinner commented Jun 27, 2023

markshannon commented Jun 27, 2023

markshannon commented Jun 27, 2023

vstinner commented Jun 27, 2023

vstinner commented Jun 27, 2023

erlend-aasland left a comment

vstinner commented Jul 12, 2023

vstinner commented Jul 12, 2023

serhiy-storchaka left a comment

erlend-aasland left a comment

encukou commented Jul 13, 2023

markshannon commented Jul 13, 2023

serhiy-storchaka commented Jul 13, 2023

vstinner commented Jul 13, 2023

vstinner commented Jul 13, 2023

serhiy-storchaka commented Jul 13, 2023

vstinner commented Jul 13, 2023

gpshead commented Jul 13, 2023

vstinner commented Jul 17, 2023 •

edited

Loading

vstinner commented Jul 19, 2023

vstinner commented Jul 21, 2023

vstinner commented Jul 21, 2023

vstinner commented Jul 24, 2023

gh-106004: Add PyDict_GetItemRef() function #106005

gh-106004: Add PyDict_GetItemRef() function #106005

Conversation

vstinner commented Jun 23, 2023 • edited by github-actions bot Loading

markshannon commented Jun 23, 2023

markshannon commented Jun 23, 2023

vstinner commented Jun 23, 2023

markshannon commented Jun 23, 2023

markshannon commented Jun 23, 2023

vstinner commented Jun 23, 2023

vstinner commented Jun 23, 2023

vstinner commented Jun 27, 2023

markshannon commented Jun 27, 2023

markshannon commented Jun 27, 2023

vstinner commented Jun 27, 2023

vstinner commented Jun 27, 2023

erlend-aasland left a comment

Choose a reason for hiding this comment

vstinner commented Jul 12, 2023

vstinner commented Jul 12, 2023

serhiy-storchaka left a comment

Choose a reason for hiding this comment

erlend-aasland left a comment

Choose a reason for hiding this comment

encukou commented Jul 13, 2023

markshannon commented Jul 13, 2023

serhiy-storchaka commented Jul 13, 2023

vstinner commented Jul 13, 2023

vstinner commented Jul 13, 2023

serhiy-storchaka commented Jul 13, 2023

vstinner commented Jul 13, 2023

gpshead commented Jul 13, 2023

vstinner commented Jul 17, 2023 • edited Loading

vstinner commented Jul 19, 2023

vstinner commented Jul 21, 2023

vstinner commented Jul 21, 2023

vstinner commented Jul 24, 2023

vstinner commented Jun 23, 2023 •

edited by github-actions bot

Loading

vstinner commented Jul 17, 2023 •

edited

Loading