Non-identifier keys in ** arguments and elsewhere #142

Closed
jeff5 opened this issue Sep 18, 2022 · 3 comments
jeff5 commented Sep 18, 2022

It has been noted in [1] that in several circumstances, arbitrary string keys are accepted into a name space ostensibly intended for identifiers. Currently, CPython checks at most that the key is a str, and admits keys that would not be valid identifiers.

In these circumstances, should we

  1. document that this is a Python language feature, or
  2. note that it is a CPython implementation detail that may or may not be supported on other/later versions?

Where this comes up

The discussion [1] and related issue [2] identify these cases:

  1. When additional keyword arguments are supplied in a call using the syntax **expression, CPython checks that the keys of the mapping are str (or a subclass) but not that they are identifiers. If the function has a formal parameter **identifier, it gathers these non-identifier KV-pairs, and the function body may treat them as a dictionary.

  2. When an object allows the addition of attributes, the default implementation of __setattr__ in CPython checks the name is a str (or a subclass) but not that it is an identifier (see _PyObject_GenericSetAttrWithDict). The built-in setattr() makes the same check (or rather PyObject_SetAttr does, which it calls). By either route, we obtain an instance with an attribute that is not accessible using dot notation (only by getattr() etc.).

  3. In a variant of 2., it is possible to give a type an attribute (value or descriptor) whose name is not an identifier, by manipulating locals() during class definition. Such an attribute is accessible to getattr(). The key may even be a non-string, but a non-string key is not accessible other than directly on the __dict__ of the type.
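The three cases above can be reproduced with a short sketch. The names collect, Record and Tagged are invented for illustration, and the behaviour shown is what CPython does today; the class-body case relies on locals() returning the class namespace, which is itself CPython behaviour.

```python
def collect(**kwargs):
    """Return the keyword arguments gathered by the ** parameter."""
    return kwargs


# Case 1: str keys that are not identifiers pass through a ** call.
received = collect(**{"not an identifier": 1, "x-y": 2})

# A key that is not a str is rejected at call time.
try:
    collect(**{1: "one"})
except TypeError as exc:
    non_str_error = type(exc).__name__


# Case 2: setattr() accepts a str name that is not an identifier.
class Record:
    pass


rec = Record()
setattr(rec, "content-type", "text/plain")
value = getattr(rec, "content-type")  # dot notation cannot spell this name
stored = "content-type" in vars(rec)


# Case 3: a class attribute with a non-identifier name, created by
# writing into the class namespace during definition (CPython behaviour).
class Tagged:
    locals()["max-age"] = 60


class_value = getattr(Tagged, "max-age")
```

In each case the value is only reachable through getattr(), vars() or the __dict__, never through dot notation.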

The third is in contrast to the behaviour of __slots__, which insists on identifiers and projects names that are subclasses of str onto the base class str (using _Py_Mangle from compile.c).
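For comparison, a minimal sketch of the __slots__ check as CPython applies it when the class is created. The class names Bad and Good are invented for the example.

```python
# A slot name that is not an identifier is rejected at class creation.
try:
    class Bad:
        __slots__ = ("content-type",)
except TypeError as exc:
    slots_rejected = True
    slots_message = str(exc)
else:
    slots_rejected = False


# An identifier is accepted as usual.
class Good:
    __slots__ = ("content_type",)


good = Good()
good.content_type = "text/plain"
```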

It is possible for the SC to accept any or all cases as a language feature, but as the arguments overlap, all or none seems most defensible. Two documentation-only PRs await this SC decision: one addressing keyword arguments [3] and the other object attributes [4].

Issue [2] also raises the question of whether subclasses of str really ought to be allowed, or perhaps projected onto str as in __slots__. This appears orthogonal to the question asked here (but interesting).

Arguments for accepting non-identifier strings as Python

It is an established practice. There is an example of non-identifier keyword arguments at [5] and of an __init__ in the Azure SDK that looks for such keywords at [6].
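As a rough illustration of the pattern (not the Azure SDK's actual code), here is a hypothetical __init__ that looks up header-style keys in its keyword arguments. The class name HttpHeaders and the keys it reads are assumptions made for the example.

```python
class HttpHeaders:
    """Hypothetical model that reads header-style keys from **kwargs."""

    def __init__(self, **kwargs):
        # These keys contain a hyphen, so they can only arrive through
        # a ** mapping, never as literal keyword arguments.
        self.content_type = kwargs.get("Content-Type")
        self.cache_control = kwargs.get("Cache-Control")


response_headers = {
    "Content-Type": "application/json",
    "Cache-Control": "no-cache",
}
headers = HttpHeaders(**response_headers)
```

A caller can pass a response-header mapping straight through, which is convenient, although an explicit mapping parameter would express the same intent without relying on non-identifier keywords.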

It is "consistent, harmless and intentional" and "a language feature in good standing" [GvR].

"leaving it up to the implementation ... just invites gratuitous differences." [GvR]

"Sometimes Python object attributes are mapped to attributes from some other system that may have different naming rules."

"Calling functions in Python is slow enough as it is without the extra checks". (But is a cheap check unimaginable in any implementation with motive and ingenuity?)

The other major interpreters follow CPython behaviour.

Arguments that these are implementation details (with option to disallow)

Consistency: the **mapping is intended to supply keyword arguments, and these names must be identifiers. (Note in passing the proposal at [7], yet to attract support, for syntax that would allow any string where an identifier is expected.)

Similarly, the glossary defines an attribute as "a value referenced by a dotted expression ... o.a". (The PR at [4] includes additional words.)

Arguments against documenting it at all

"Documenting it would require all Python implementations to support it, including future versions of CPython. [Even if documented as an "implementation detail"?] ... I’m perfectly happy with it being an implementation detail of CPython that people have to discover for themselves." [1]

References

[1] Python ideas https://discuss.python.org/t/supporting-or-not-invalid-identifiers-in-kwargs/17147 Thanks to contributors there for the arguments summarised here.
[2] python/cpython#96397
[3] python/cpython#96393
[4] python/cpython#96454
[5] PyTorch-UNet https://github.com/milesial/Pytorch-UNet/blob/a96fbb05ccdbb8a140471391c8e51d159ffaa45e/train.py#L111. Arguably this stems from an API fault in tqdm.set_postfix, where the programmer's intent is to accept a mapping.
[6] Azure SDK https://github.com/Azure/azure-sdk-for-python/blob/608d038c352878c0931df3a3b5319372da8847fb/sdk/storage/azure-storage-blob/azure/storage/blob/_models.py#L385-L390
[7] https://discuss.python.org/t/backtics-to-allow-any-name/18698


encukou commented Sep 19, 2022

Thanks! I added this to the agenda.


encukou commented Sep 20, 2022

The SC agrees that allowing arbitrary strings here is a feature of Python, rather than an implementation detail.

— Petr, on behalf of the SC


encukou commented Sep 20, 2022

I started a thread on how the details might work out: https://discuss.python.org/t/19293
