From 47f9a5fac5a230e321fcd48d9731206256ede82a Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Tue, 15 Nov 2022 21:44:30 +0000 Subject: [PATCH 1/6] Syntactic formalization of f-strings Co-authored-by: Batuhan Taskaya Co-authored-by: Lysandros Nikolaou --- pep-9999.txt | 367 +++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 367 insertions(+) create mode 100644 pep-9999.txt diff --git a/pep-9999.txt b/pep-9999.txt new file mode 100644 index 00000000000..440d3871d09 --- /dev/null +++ b/pep-9999.txt @@ -0,0 +1,367 @@ +PEP: 9999 +Title: Syntactic formalization of f-strings +Version: $Revision$ +Last-Modified: $Date$ +Author: Pablo Galindo , + Batuhan Taskaya , + Lysandros Nikolaou +Discussions-To: discuss.python.org +Type: Standards Track +Status: Draft +Content-Type: text/x-rst +Python-Version: 3.12 +Created: 15-Nov-2022 + + +Abstract +======== + +This document proposes to lift some of the restrictions originally formulated in +:pep:`498` and to provide a formalized grammar for f-strings that can be +integrated into the parser directly. The proposed syntactic formalization of +f-strings will have some small side-effects over how f-strings are parsed and +interpreted, allowing for a considerable number of advantages for end users and +library developers, while also improving dramatically the maintainance cost of +the code dedicated to parse f-strings. + + +Motivation +========== + +When f-strings were originally introduced in :pep:`498` the specification was +provided without providing a formal grammar for f-strings. Additionally, the +specification contains several restrictions that are imposed so the parsing of +f-strings could be implemented into CPython without modifying the existing +lexer. These limitations have been recognized previously and previous attempts +have been made to lift them in :pep:`536` but non of this work was ever implemented +(see [2]_). Some of these limitations (collected originally by :pep:`536`) are: + +#. 
It is impossible to use the quote character delimiting the f-string + within the expression portion:: + + >>> f'Magic wand: { bag['wand'] }' + ^ + SyntaxError: invalid syntax + +#. A previously considered way around it would lead to escape sequences + in executed code and is prohibited in f-strings:: + + >>> f'Magic wand { bag[\'wand\'] } string' + SyntaxError: f-string expression portion cannot include a backslash + +#. Comments are forbidden even in multi-line f-strings:: + + >>> f'''A complex trick: { + ... bag['bag'] # recursive bags! + ... }''' + SyntaxError: f-string expression part cannot include '#' + +#. Arbitrary nesting of expressions without expansion of escape sequences is + available in every single other language employing a string interpolation + method that uses expressions instead of just variable names. [6]_ + +These limitations serve no purpose from a language user perspective and +can be lifted by giving f-literals a regular grammar without exceptions +and implementing it using dedicated parse code. + +The other problems that f-strings have is that the current implementation in +CPython relies on tokenising f-strings as `STRING` tokens and a post processing of +these tokens. This has the following problems: + +#. It adds a considerable maintainance cost to the CPython parser. This is because + the parsing code needs to be written by hand, which has historically lead to a + considerable number of inconsistencies and bugs. Writing and maintaining parsing + code by hand in C has always been considered error prone and dangerous as it needs + to deal with a lot of manual memory management over the original lexer buffers. + +#. The f-string parsing code is not able to use the new improved error message mechanisms + that the new PEG parser, originally introduced in :pep:`617`, has allowed. 
The + improvements that these error messages brought has been greatly celebrated but + unfortunately f-strings cannot benefit from them because they are parsed in a + separate piece of the parsing machinery. This is especially unfortunate, since + there are several syntactical features of f-strings that can be confusing due + to the different implicit tokenization that happens inside the expression + part (for instance ``f"{y:=3}"`` is not an assignment expression). + +#. Other Python implementations have no way to know if they have implemented + f-strings correctly because contrary to other language features, they are not + part of the official Grammar [1]_. This is important because several prominent + alternative implementations such as PyPy are using CPython's PEG parser + (see [3]_) and/or are basing their Grammars on the official PEG Grammar. The + fact that f-strings use a separate parser prevents these alternative implementations + to leverage the official grammar and to benefit to improvements in error messages derived + from the grammar. + + +A version of this proposal was originally discussed in Python dev [4]_ and +presented at the Python language summit 2022 [5]_ where it was enthusiastically +received. + +Rationale +========= + +By building on top of the new Python PEG Parser (:pep:`617`) this PEP proposes +to redefine “f-strings” especially emphasizing on the clear separation of the +string component and the expression (or replacement, `{...}`) component. :pep:`498` +summarizes the syntactical part of “f-strings” as the following: + +> In Python source code, an f-string is a literal string, prefixed with ‘f’, which +> contains expressions inside braces. The expressions are replaced with their values. + +However unlike that definition, :pep:`498` also had formal list of exclusions on what +can or cannot be contained inside the expression component (primarily due to the +limitations of the existing parser). 
By clearly establishing the formal grammar, we
now also have the ability to define the expression component of an f-string as truly "any
applicable Python expression" (in that particular context) without being bound
by the limitations imposed by the details of our implementation.

The formalization effort and the premise above also has a significant benefit from the
eye of Python programmers due to its ability to simplify and eliminate the obscure
limitations. Which reduces the "cognitive" burden and the "mental" complexity of
f-string literals (as well as the Python language in general).

#. The expression component can include any string literal that a normal Python expression
   can include. This opens up the possibility of nesting string literals (formatted or
   not) inside expression component of an f-strings with the same quote type (and length)::

    >>> f"{"hello"}"

    >>> f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"

    >>> f"{f"{f"infinite"}"}" + " " + f"{f"nesting!!!"}"

   This choice not only allows for a more consistent and predictable behavior of what can be
   placed in f-strings but provides an intuitive way to manipulate string literals in a
   more flexible way without having to fight the limitations of the implementation.

#. Another issue that has felt unintuitive to most is the lack of support for backslashes
   within the expression component of an f-string. One example that keeps coming up is including
   newline character in the expression part for joining containers. 
For example::

    >>> a = ["hello", "world"]
    >>> f"{'\n'.join(a)}"
    File "", line 1
        f"{'\n'.join(a)}"
                      ^
    SyntaxError: f-string expression part cannot include a backslash

   A common work-around for this was to eiter assign the newline to an intermediate variable or
   pre-create the whole string prior to creating the f-string::

    >>> a = ["hello", "world"]
    >>> joined = '\n'.join(a)
    >>> f"{joined}"
    'hello\nworld'

   It only feels natural to allow backslashes in the expression part now that the new PEG parser
   can easily support it::

    >>> a = ["hello", "world"]
    >>> f"{'\n'.join(a)}"
    'hello\nworld'

#. Before the changes proposed in this document, there was no explicit limit on
   how f-strings can be nested, but the fact that string quotes cannot be reused
   inside the expression component of f-strings made it impossible to nest
   f-strings arbitrarily. In fact, this is the most nested f-string that can be
   written::

    >>> f"""{f'''{f'{f"{1+1}"}'}'''}"""
    '2'

   As this PEP allows to place **any** valid Python expression inside the
   expression component of the f-strings, it is now possible to reuse quotes and
   therefore is possible nest f-strings arbitrarily::

    >>> f"{f"{f"{f"{f"{f"{1+1}"}"}"}"}"}"
    '2'

   Although this is just a consequence of allowing arbitrary expressions, the
   authors of this PEP do not believe that this is a fundamental benefit and we
   have decided that the language specification will not explicitly mandate that
   this nesting can be arbitrary. This is because allowing arbitrary deeply
   nesting imposes a lot of extra complexity to the lexer implementation
   (particularly as lexer/parser pipelines need to allow "untokenizing" to
   support the 'fstrng debugging expressions' and this is specially taxing when
   arbitrary nesting is allowed). Implementations are therefore free to impose a
   limit on the nesting depth if they need to. 
Note that this is not an uncommon + situation, as the CPython implementation already imposes several limits all + over the place, including a limit on the nesting depth of parentheses and + brackets, a limit on the nesting of the blocks, a limit in the number of + branches in 'if' statements, a limit on the number of expressions in + star-unpacking, etc. + +Specification +============= + +The formal proposed PEG grammar specification for f-strings is (see :pep:`617` +for details on the syntax):: + + fstring + | FSTRING_START fstring_middle* FSTRING_END + fstring_middle + | fstring_replacement_field + | FSTRING_MIDDLE + fstring_replacement_field + | '{' (yield_expr | star_expressions) "="? [ "!" NAME ] [ ':' fstring_format_spec* ] '}' + fstring_format_spec: + | FSTRING_MIDDLE + | fstring_replacement_field + +This PEP leaves up to the implementation the level of f-string nesting allowed. +This means that limiting nesting is **not part of the language specification** +but also the language specification **doesn't mandate arbitrary nesting**. + +Three new tokens are introduced: + +* ``FSTRING_START``: This token includes f-string character (``f``/``F``) and the open quote(s). +* ``FSTRING_MIDDLE``: This token includes everything between two expression braces (``}`` and ``{``). +* ``FSTRING_END``: This token includes everything after the last expression brace (or the whole literal part + if no expression exists) until the closing quote. + +These tokens are always string parts and they are semantically equivalent to the +STRING token with the restrictions specified. These tokens must be produced by the lexer +when lexing f-strings. This means that **the tokenizer cannot produce a single token for f-strings anymore**. How +the lexer emits this token is **not specified** as this will heavily depend on every +implementation (even the Python version of the lexer in the standard library is +implemented differently to the one used by the PEG parser). 
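To make the three-token scheme above concrete, here is a toy sketch of how a lexer might emit these tokens. It is not part of the reference implementation (whose lexing strategy the PEP deliberately leaves unspecified) and it ignores nested quotes, escaped braces and format specs:

```python
def toy_fstring_tokens(source):
    """Sketch of the FSTRING_START/MIDDLE/END scheme for a simple
    f'...' literal with no nesting and no format specifications."""
    assert source.startswith("f'") and source.endswith("'")
    yield ("FSTRING_START", "f'")
    body, pos = source[2:-1], 0
    while True:
        brace = body.find("{", pos)
        if brace == -1:
            # Everything after the last replacement field (or the whole
            # literal part, if there is none) becomes FSTRING_END.
            yield ("FSTRING_END", body[pos:])
            return
        if brace > pos:
            yield ("FSTRING_MIDDLE", body[pos:brace])
        yield ("LBRACE", "{")
        close = body.index("}", brace)
        # A real lexer would hand this slice back to the regular
        # tokenizer; we emit it as one opaque chunk for brevity.
        yield ("EXPR", body[brace + 1:close])
        yield ("RBRACE", "}")
        pos = close + 1


tokens = list(toy_fstring_tokens("f'some words {a+b} more words'"))
```

Under this sketch a brace-free literal such as ``f'some words'`` degenerates to just ``FSTRING_START`` followed by ``FSTRING_END``, mirroring the second tokenization example below.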
+ +As an example, ``f'some words {a+b} more words {c+d} final words'`` will be tokenized as:: + + FSTRING_START - "f'" + FSTRING_MIDDLE - 'some words ' + LBRACE - '{' + NAME - 'a' + PLUS - '+' + NAME - 'b' + RBRACE - '}' + FSTRING_MIDDLE - ' more words ' + LBRACE - '{' + NAME - 'c' + PLUS - '+' + NAME - 'd' + RBRACE - '}' + FSTRING_END - ' final words' (without the end quote) + +while ``f"""some words"""`` will be tokenized simply as:: + + FSTRING_START - 'f"""' + FSTRING_END - 'some words' + +All restrictions mentioned in the PEP are lifted from f-literals, as explained below: + +* Expression portions may now contain strings delimited with the same kind of + quote that is used to delimit the f-literal. +* Backslashes may now appear within expressions just like anywhere else in + Python code. In case of strings nested within f-literals, escape sequences are + expanded when the innermost string is evaluated. +* Comments, using the '#' character, are possible only in multi-line f-literals, + since comments are terminated by the end of the line (which makes closing a + single-line f-literal impossible) + +Backwards Compatibility +======================= + +This PEP is not backwards incompatible: any valid Python code will continue to +be valid if this PEP is implemented and it will not change semantically. + +How to Teach This +================= + +As the concept of f-strings is already ubiquitous in the Python community, there is +no fundamental need for users to learn anything new. However, as the formalized grammar +allows some new possibilities, it is important that the formal grammar is added to the +documentation and explained in detail, explicitly mentioning what constructs are possible +since this PEP is aiming to avoid confusion. + +It is also beneficial to provide users with a simple framework for understanding what can +be placed inside an f-string expression. 
In this case the authors think that this work will
make even more simple to explain this aspect of the language since it can be summarized as:

    "You can place any valid Python expression inside an f-string expression".

With the changes in this PEP, there is no need to clarify that string quotes are
limited to be different as the quotes of the enclosing string because this is
now allowed: as an arbitrary Python string can contain any possible choice of
quotes, so can any f-string expression. Additionally there is no need to clarify
that certain things are not allowed in the expression part because of
implementation restrictions such as comments, new line characters or
backslashes.

The only "surprising" difference is that as the f-string allows to specificy a
format, expressions that allow a ``:`` character at top level still need to be
enclosed in parenthesis. This is not new to this work, but is important to
emphasize that this restriction is still in place. This allows for an easier
modification of the summary:

    You can place any valid Python expression inside
    an f-string expression, and everything after a ``:`` character at top level will
    be identified as a format specification


Reference Implementation
========================

A reference implementation can be found in the implementation_ fork.

Rejected Ideas
==============

#. We have decided not to lift the restriction that some expression portions
   need to wrap ``':'`` and ``'!'`` in braces at top level, e.g.::

    >>> f'Useless use of lambdas: { lambda x: x*2 }'
    SyntaxError: unexpected EOF while parsing

   The reason is that this will introduce a considerable amount of
   complexity for no real benefit. This is due to the fact that the ``:`` character
   normally separates the f-string format specification. This format specification
   is currently tokenized as a string. 
As the tokenizer MUST tokenize what's on the + right of the ``:`` as either a string or a stream of tokens this won't allow the + parser to differentiate between the different semantics as that would require the + tokenizer to backtrack and produce a different set of tokens (this is, first try + as a stream of tokens and if it fails try as a string for a format specifier). + + As there is no fundamental advantage in being able to allow lambdas and similar + expressions at top level, we have decided to keep the restriction that these must + be parenthesized if needed:: + + >>> f'Useless use of lambdas: { (lambda x: x*2) }' + + +Open Issues +=========== + +[Any points that are still being decided/discussed.] + + +Footnotes +========= + + +.. [1] "Grammar for f-strings": + https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals + +.. [2] "Deferred notice on PEP 536" + https://mail.python.org/archives/list/python-dev@python.org/thread/N43O4KNLZW4U7YZC4NVPCETZIVRDUVU2/#NM2A37THVIXXEYR4J5ZPTNLXGGUNFRLZ + +.. [3] "Pypy uses now CPython's PEG parser" + https://foss.heptapod.net/pypy/pypy/-/commit/fe120f89bf07e64a41de62b224e4a3d80e0fe0d4/pipelines?ref=branch%2Fpy3.9 + +.. [4] "Python-dev discussion about f-strings in the grammar" + https://mail.python.org/archives/list/python-dev@python.org/thread/54N3MOYVBDSJQZTU6MTCPLUPIFSDN5IS/#SAYU6SMP4KT7G7AQ6WVQYUDOSZPKHJMS + +.. [5] "Language summit 2022" + https://pyfound.blogspot.com/2022/05/the-2022-python-language-summit-f.html + +.. [6] Wikipedia article on string interpolation + https://en.wikipedia.org/wiki/String_interpolation + +.. _implementation: https://github.com/we-like-parsers/cpython/tree/fstring-grammar + + +Copyright +========= + +This document is placed in the public domain or under the +CC0-1.0-Universal license, whichever is more permissive. 
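The top-level ``:`` behaviour that the Rejected Ideas section keeps in place is already observable in current Python; a quick sketch, runnable on any Python since 3.6:

```python
# A lambda's own ':' would be read as the start of a format spec, so it
# must be parenthesized inside the replacement field.
doubled = f'Useless use of lambdas: { (lambda x: x * 2)(21) }'
assert doubled == 'Useless use of lambdas: 42'

# Everything after the top-level ':' is the format specification...
assert f'{3.14159:.2f}' == '3.14'

# ...and the format spec may itself contain replacement fields.
width = 8
assert f'{3.14159:{width}.2f}' == '    3.14'
```

Nothing in this PEP changes any of these results; only the parenthesization requirement on top-level ``:`` and ``!`` is carried over unchanged.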
From 0f49045d4df7cfcd15525b4ba915567b54228b02 Mon Sep 17 00:00:00 2001 From: Pablo Galindo Date: Wed, 30 Nov 2022 10:30:31 +0000 Subject: [PATCH 2/6] Update PEP number to 701 --- pep-9999.txt => pep-0701.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) rename pep-9999.txt => pep-0701.txt (99%) diff --git a/pep-9999.txt b/pep-0701.txt similarity index 99% rename from pep-9999.txt rename to pep-0701.txt index 440d3871d09..f463c1fb12b 100644 --- a/pep-9999.txt +++ b/pep-0701.txt @@ -1,4 +1,4 @@ -PEP: 9999 +PEP: 701 Title: Syntactic formalization of f-strings Version: $Revision$ Last-Modified: $Date$ From c081b4a006529f07ba740c94bb8e804ce1d69562 Mon Sep 17 00:00:00 2001 From: Lysandros Nikolaou Date: Wed, 30 Nov 2022 16:33:51 +0100 Subject: [PATCH 3/6] Apply suggestions from code review Co-authored-by: C.A.M. Gerlach --- pep-0701.txt | 131 +++++++++++++++++++++++++-------------------------- 1 file changed, 65 insertions(+), 66 deletions(-) diff --git a/pep-0701.txt b/pep-0701.txt index f463c1fb12b..632b5892759 100644 --- a/pep-0701.txt +++ b/pep-0701.txt @@ -1,16 +1,14 @@ PEP: 701 Title: Syntactic formalization of f-strings -Version: $Revision$ -Last-Modified: $Date$ Author: Pablo Galindo , Batuhan Taskaya , Lysandros Nikolaou -Discussions-To: discuss.python.org -Type: Standards Track +Discussions-To: Status: Draft +Type: Standards Track Content-Type: text/x-rst -Python-Version: 3.12 Created: 15-Nov-2022 +Python-Version: 3.12 Abstract @@ -19,22 +17,22 @@ Abstract This document proposes to lift some of the restrictions originally formulated in :pep:`498` and to provide a formalized grammar for f-strings that can be integrated into the parser directly. 
The proposed syntactic formalization of -f-strings will have some small side-effects over how f-strings are parsed and +f-strings will have some small side-effects on how f-strings are parsed and interpreted, allowing for a considerable number of advantages for end users and -library developers, while also improving dramatically the maintainance cost of -the code dedicated to parse f-strings. +library developers, while also dramatically reducing the maintenance cost of +the code dedicated to parsing f-strings. Motivation ========== -When f-strings were originally introduced in :pep:`498` the specification was +When f-strings were originally introduced in :pep:`498`, the specification was provided without providing a formal grammar for f-strings. Additionally, the specification contains several restrictions that are imposed so the parsing of f-strings could be implemented into CPython without modifying the existing lexer. These limitations have been recognized previously and previous attempts -have been made to lift them in :pep:`536` but non of this work was ever implemented -(see [2]_). Some of these limitations (collected originally by :pep:`536`) are: +have been made to lift them in :pep:`536`, but `none of this work was ever implemented`_. +Some of these limitations (collected originally by :pep:`536`) are: #. It is impossible to use the quote character delimiting the f-string within the expression portion:: @@ -58,17 +56,17 @@ have been made to lift them in :pep:`536` but non of this work was ever implemen #. Arbitrary nesting of expressions without expansion of escape sequences is available in every single other language employing a string interpolation - method that uses expressions instead of just variable names. [6]_ + method that uses expressions instead of just variable names, `per Wikipedia`_. 
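The restrictions enumerated above are easy to reproduce on current Python; a small sketch of the status-quo workarounds (the variable names are illustrative only):

```python
bag = {"wand": "magic"}
items = ["hello", "world"]

# Quote reuse (limitation 1): switch quote types inside the
# replacement field instead of reusing the outer quote.
assert f"Magic wand: {bag['wand']}" == "Magic wand: magic"

# Backslashes (limitation 2): hoist the escape sequence into a
# variable, since the expression part may not contain one today.
newline = "\n"
assert f"{newline.join(items)}" == "hello\nworld"
```

Both workarounds keep working after this PEP; they simply stop being necessary.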
These limitations serve no purpose from a language user perspective and can be lifted by giving f-literals a regular grammar without exceptions and implementing it using dedicated parse code. -The other problems that f-strings have is that the current implementation in -CPython relies on tokenising f-strings as `STRING` tokens and a post processing of +The other issue that f-strings have is that the current implementation in +CPython relies on tokenising f-strings as ``STRING`` tokens and a post processing of these tokens. This has the following problems: -#. It adds a considerable maintainance cost to the CPython parser. This is because +#. It adds a considerable maintenance cost to the CPython parser. This is because the parsing code needs to be written by hand, which has historically lead to a considerable number of inconsistencies and bugs. Writing and maintaining parsing code by hand in C has always been considered error prone and dangerous as it needs @@ -85,28 +83,29 @@ these tokens. This has the following problems: #. Other Python implementations have no way to know if they have implemented f-strings correctly because contrary to other language features, they are not - part of the official Grammar [1]_. This is important because several prominent - alternative implementations such as PyPy are using CPython's PEG parser - (see [3]_) and/or are basing their Grammars on the official PEG Grammar. The + part of the :ref:`official Python grammar `. + This is important because several prominent + alternative implementations are using CPython's PEG parser, `such as PyPy`_, + and/or are basing their grammars on the official PEG Grammar. The fact that f-strings use a separate parser prevents these alternative implementations - to leverage the official grammar and to benefit to improvements in error messages derived + from leveraging the official grammar and benefiting from improvements in error messages derived from the grammar. 
-A version of this proposal was originally discussed in Python dev [4]_ and -presented at the Python language summit 2022 [5]_ where it was enthusiastically +A version of this proposal was originally `discussed on Python-Dev`_ and +`presented at the Python Language Summit 2022`_ where it was enthusiastically received. Rationale ========= -By building on top of the new Python PEG Parser (:pep:`617`) this PEP proposes -to redefine “f-strings” especially emphasizing on the clear separation of the -string component and the expression (or replacement, `{...}`) component. :pep:`498` +By building on top of the new Python PEG Parser (:pep:`617`), this PEP proposes +to redefine “f-strings”, especially emphasizing the clear separation of the +string component and the expression (or replacement, ``{...}``) component. :pep:`498` summarizes the syntactical part of “f-strings” as the following: -> In Python source code, an f-string is a literal string, prefixed with ‘f’, which -> contains expressions inside braces. The expressions are replaced with their values. + In Python source code, an f-string is a literal string, prefixed with ‘f’, which + contains expressions inside braces. The expressions are replaced with their values. However unlike that definition, :pep:`498` also had formal list of exclusions on what can or cannot be contained inside the expression component (primarily due to the @@ -115,14 +114,14 @@ now also have the ability to define the expression component of an f-string as t applicable Python expression" (in that particular context) without being bound by the limitations imposed by the details of our implementation. -The formalization effort and the premise above also has a significant benefit from the -eye of Python programmers due to its ability to simplify and eliminate the obscure -limitations. 
Which reduces the "cognitive" burden and the "mental" complexity of +The formalization effort and the premise above also has a significant benefit for +Python programmers due to its ability to simplify and eliminate the obscure +limitations. This reduces the mental burden and the cognitive complexity of f-string literals (as well as the Python language in general). #. The expression component can include any string literal that a normal Python expression can include. This opens up the possibility of nesting string literals (formatted or - not) inside expression component of an f-strings with the same quote type (and length):: + not) inside the expression component of an f-string with the same quote type (and length):: >>> f"{"hello"}" @@ -136,7 +135,7 @@ f-string literals (as well as the Python language in general). #. Another issue that has felt unintuitive to most is the lack of support for backslashes within the expression component of an f-string. One example that keeps coming up is including - newline character in the expression part for joining containers. For example:: + a newline character in the expression part for joining containers. For example:: >>> a = ["hello", "world"] >>> f"{'\n'.join(a)}" @@ -145,7 +144,7 @@ f-string literals (as well as the Python language in general). ^ SyntaxError: f-string expression part cannot include a backslash - A common work-around for this was to eiter assign the newline to an intermediate variable or + A common work-around for this was to either assign the newline to an intermediate variable or pre-create the whole string prior to creating the f-string:: >>> a = ["hello", "world"] @@ -169,9 +168,9 @@ f-string literals (as well as the Python language in general). 
>>> f"""{f'''{f'{f"{1+1}"}'}'''}""" '2' - As this PEP allows to place **any** valid Python expression inside the + As this PEP allows placing **any** valid Python expression inside the expression component of the f-strings, it is now possible to reuse quotes and - therefore is possible nest f-strings arbitrarily:: + therefore is possible to nest f-strings arbitrarily:: >>> f"{f"{f"{f"{f"{f"{1+1}"}"}"}"}"}" '2' @@ -179,23 +178,25 @@ f-string literals (as well as the Python language in general). Although this is just a consequence of allowing arbitrary expressions, the authors of this PEP do not believe that this is a fundamental benefit and we have decided that the language specification will not explicitly mandate that - this nesting can be arbitrary. This is because allowing arbitrary deeply + this nesting can be arbitrary. This is because allowing arbitrarily-deep nesting imposes a lot of extra complexity to the lexer implementation (particularly as lexer/parser pipelines need to allow "untokenizing" to - support the 'fstrng debugging expressions' and this is specially taxing when + support the 'f-string debugging expressions' and this is especially taxing when arbitrary nesting is allowed). Implementations are therefore free to impose a limit on the nesting depth if they need to. Note that this is not an uncommon situation, as the CPython implementation already imposes several limits all over the place, including a limit on the nesting depth of parentheses and brackets, a limit on the nesting of the blocks, a limit in the number of - branches in 'if' statements, a limit on the number of expressions in + branches in ``if`` statements, a limit on the number of expressions in star-unpacking, etc. Specification ============= The formal proposed PEG grammar specification for f-strings is (see :pep:`617` -for details on the syntax):: +for details on the syntax): + +.. 
code-block:: peg fstring | FSTRING_START fstring_middle* FSTRING_END @@ -220,13 +221,17 @@ Three new tokens are introduced: if no expression exists) until the closing quote. These tokens are always string parts and they are semantically equivalent to the -STRING token with the restrictions specified. These tokens must be produced by the lexer +``STRING`` token with the restrictions specified. These tokens must be produced by the lexer when lexing f-strings. This means that **the tokenizer cannot produce a single token for f-strings anymore**. How the lexer emits this token is **not specified** as this will heavily depend on every implementation (even the Python version of the lexer in the standard library is implemented differently to the one used by the PEG parser). -As an example, ``f'some words {a+b} more words {c+d} final words'`` will be tokenized as:: +As an example:: + + f'some words {a+b} more words {c+d} final words' + +will be tokenized as:: FSTRING_START - "f'" FSTRING_MIDDLE - 'some words ' @@ -255,7 +260,7 @@ All restrictions mentioned in the PEP are lifted from f-literals, as explained b * Backslashes may now appear within expressions just like anywhere else in Python code. In case of strings nested within f-literals, escape sequences are expanded when the innermost string is evaluated. -* Comments, using the '#' character, are possible only in multi-line f-literals, +* Comments, using the ``#`` character, are possible only in multi-line f-literals, since comments are terminated by the end of the line (which makes closing a single-line f-literal impossible) @@ -276,27 +281,27 @@ since this PEP is aiming to avoid confusion. It is also beneficial to provide users with a simple framework for understanding what can be placed inside an f-string expression. 
In this case the authors think that this work will -make even more simple to explain this aspect of the language since it can be summarized as: +make it even simpler to explain this aspect of the language, since it can be summarized as: - "You can place any valid Python expression inside an f-string expression". + You can place any valid Python expression inside an f-string expression. With the changes in this PEP, there is no need to clarify that string quotes are -limited to be different as the quotes of the enclosing string because this is +limited to be different from the quotes of the enclosing string, because this is now allowed: as an arbitrary Python string can contain any possible choice of quotes, so can any f-string expression. Additionally there is no need to clarify that certain things are not allowed in the expression part because of implementation restructions such as comments, new line characters or backslashes. -The only "surprising" difference is that as the f-string allows to specificy a -format, expressions that allow a ``:`` character at top level still need to be -enclosed in parenthesis. This is not new to this work, but is important to +The only "surprising" difference is that as f-strings allow specifying a +format, expressions that allow a ``:`` character at the top level still need to be +enclosed in parenthesis. This is not new to this work, but it is important to emphasize that this restriction is still in place. This allows for an easier modification of the summary: You can place any valid Python expression inside - an f-string expression, and everything after a ``:`` character at top level will - be identified as a format specification + an f-string expression, and everything after a ``:`` character at the top level will + be identified as a format specification. Reference Implementation @@ -308,7 +313,7 @@ Rejected Ideas ============== #. 
We have decided not to lift the restriction that some expression portions
-   need to wrap ``':'`` and ``'!'`` in braces at top level, e.g.::
+   need to wrap ``':'`` and ``'!'`` in braces at the top level, e.g.::
 
        >>> f'Useless use of lambdas: { lambda x: x*2 }'
        SyntaxError: unexpected EOF while parsing
 
@@ -317,13 +322,13 @@ Rejected Ideas
    complexity for no real benefit. This is due to the fact that the ``:`` character
    normally separates the f-string format specification. This format specification
    is currently tokenized as a string. As the tokenizer MUST tokenize what's on the
-   right of the ``:`` as either a string or a stream of tokens this won't allow the
+   right of the ``:`` as either a string or a stream of tokens, this won't allow the
    parser to differentiate between the different semantics as that would require
    the tokenizer to backtrack and produce a different set of tokens (that is, first try
-   as a stream of tokens and if it fails try as a string for a format specifier).
+   as a stream of tokens, and if it fails, try as a string for a format specifier).
 
    As there is no fundamental advantage in being able to allow lambdas and similar
-   expressions at top level, we have decided to keep the restriction that these must
+   expressions at the top level, we have decided to keep the restriction that these must
    be parenthesized if needed::
 
        >>> f'Useless use of lambdas: { (lambda x: x*2) }'
 
@@ -332,30 +337,24 @@ Rejected Ideas
 
 Open Issues
 ===========
 
-[Any points that are still being decided/discussed.]
+None yet
 
 
 Footnotes
 =========
 
-.. [1] "Grammar for f-strings":
-   https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals
+.. _official Python grammar: https://docs.python.org/3/reference/lexical_analysis.html#formatted-string-literals
 
-.. [2] "Deferred notice on PEP 536"
-   https://mail.python.org/archives/list/python-dev@python.org/thread/N43O4KNLZW4U7YZC4NVPCETZIVRDUVU2/#NM2A37THVIXXEYR4J5ZPTNLXGGUNFRLZ
+.. _none of this work was ever implemented: https://mail.python.org/archives/list/python-dev@python.org/thread/N43O4KNLZW4U7YZC4NVPCETZIVRDUVU2/#NM2A37THVIXXEYR4J5ZPTNLXGGUNFRLZ
 
-.. [3] "Pypy uses now CPython's PEG parser"
-   https://foss.heptapod.net/pypy/pypy/-/commit/fe120f89bf07e64a41de62b224e4a3d80e0fe0d4/pipelines?ref=branch%2Fpy3.9
+.. _such as PyPy: https://foss.heptapod.net/pypy/pypy/-/commit/fe120f89bf07e64a41de62b224e4a3d80e0fe0d4/pipelines?ref=branch%2Fpy3.9
 
-.. [4] "Python-dev discussion about f-strings in the grammar"
-   https://mail.python.org/archives/list/python-dev@python.org/thread/54N3MOYVBDSJQZTU6MTCPLUPIFSDN5IS/#SAYU6SMP4KT7G7AQ6WVQYUDOSZPKHJMS
+.. _discussed on Python-Dev: https://mail.python.org/archives/list/python-dev@python.org/thread/54N3MOYVBDSJQZTU6MTCPLUPIFSDN5IS/#SAYU6SMP4KT7G7AQ6WVQYUDOSZPKHJMS
 
-.. [5] "Language summit 2022"
-   https://pyfound.blogspot.com/2022/05/the-2022-python-language-summit-f.html
+.. _presented at the Python Language Summit 2022: https://pyfound.blogspot.com/2022/05/the-2022-python-language-summit-f.html
 
-.. [6] Wikipedia article on string interpolation
-   https://en.wikipedia.org/wiki/String_interpolation
+.. _per Wikipedia: https://en.wikipedia.org/wiki/String_interpolation#Examples
 
 .. _implementation: https://github.com/we-like-parsers/cpython/tree/fstring-grammar

From 485b3a2a3c61bbae59f0f839e7f6d69df43beb6a Mon Sep 17 00:00:00 2001
From: Lysandros Nikolaou
Date: Wed, 30 Nov 2022 16:38:56 +0100
Subject: [PATCH 4/6] Rename file and add CODEOWNERS

---
 .github/CODEOWNERS           | 1 +
 pep-0701.txt => pep-0701.rst | 0
 2 files changed, 1 insertion(+)
 rename pep-0701.txt => pep-0701.rst (100%)

diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index cb6318ed3d1..1269bad5d70 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -581,6 +581,7 @@ pep-0697.rst @encukou
 pep-0698.rst @jellezijlstra
 pep-0699.rst @Fidget-Spinner
 pep-0700.rst @pfmoore
+pep-0701.rst @pablogsal @isidentical @lysnikolaou
 # ...
 # pep-0754.txt
 # ...
diff --git a/pep-0701.txt b/pep-0701.rst
similarity index 100%
rename from pep-0701.txt
rename to pep-0701.rst

From c65cd986a403e89c6be6651d87a22cb44d7f2655 Mon Sep 17 00:00:00 2001
From: Pablo Galindo Salgado
Date: Thu, 1 Dec 2022 13:16:16 +0000
Subject: [PATCH 5/6] Apply suggestions from code review

Co-authored-by: Jelle Zijlstra
Co-authored-by: C.A.M. Gerlach
Co-authored-by: Jim Fasarakis-Hilliard

---
 pep-0701.rst | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/pep-0701.rst b/pep-0701.rst
index 632b5892759..a02c062bee1 100644
--- a/pep-0701.rst
+++ b/pep-0701.rst
@@ -67,7 +67,7 @@ CPython relies on tokenising f-strings as ``STRING`` tokens and a post processin
 these tokens. This has the following problems:
 
 #. It adds a considerable maintenance cost to the CPython parser. This is because
-   the parsing code needs to be written by hand, which has historically lead to a
+   the parsing code needs to be written by hand, which has historically led to a
    considerable number of inconsistencies and bugs. Writing and maintaining parsing
    code by hand in C has always been considered error prone and dangerous as it needs
    to deal with a lot of manual memory management over the original lexer buffers.
@@ -86,7 +86,7 @@ these tokens. This has the following problems:
    part of the :ref:`official Python grammar `. This is important
    because several prominent alternative implementations are using
    CPython's PEG parser, `such as PyPy`_,
-   and/or are basing their grammars on the official PEG Grammar. The
+   and/or are basing their grammars on the official PEG grammar. The
    fact that f-strings use a separate parser prevents these alternative
    implementations from leveraging the official grammar and benefiting from
    improvements in error messages derived from the grammar.
@@ -107,7 +107,7 @@ summarizes the syntactical part of “f-strings” as the following:
 
    In Python source code, an f-string is a literal string, prefixed with ‘f’,
    which contains expressions inside braces. The expressions are replaced with
    their values.
 
-However unlike that definition, :pep:`498` also had formal list of exclusions on what
+However, :pep:`498` also contained a formal list of exclusions on what
 can or cannot be contained inside the expression component (primarily due to the
 limitations of the existing parser). By clearly establishing the formal grammar, we
 now also have the ability to define the expression component of an f-string as truly "any
@@ -123,7 +123,7 @@ f-string literals (as well as the Python language in general).
    can include. This opens up the possibility of nesting string literals (formatted or
    not) inside the expression component of an f-string with the same quote type (and
    length)::
 
-    >>> f"{"hello"}"
+    >>> f"Tthese are the things: {", ".join(things)}"
     >>> f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"
 
@@ -216,7 +216,8 @@ but also the language specification **doesn't mandate arbitrary nesting**.
 Three new tokens are introduced:
 
 * ``FSTRING_START``: This token includes the f-string character (``f``/``F``) and the open quote(s).
-* ``FSTRING_MIDDLE``: This token includes everything between two expression braces (``}`` and ``{``).
+* ``FSTRING_MIDDLE``: This token includes the text between the opening quote
+  and the first expression brace (``{``) and the text between two expression braces (``}`` and ``{``).
 * ``FSTRING_END``: This token includes everything after the last expression brace
   (or the whole literal part if no expression exists) until the closing quote.
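A sketch of how the three tokens described in the hunk above surface through the standard ``tokenize`` module; this assumes a CPython that implements the PEP (3.12, as targeted), and falls back to the single ``STRING`` token that older tokenizers emit:

```python
import io
import sys
import tokenize

# Tokenize a small f-string and collect the token names. On an interpreter
# implementing this PEP, the literal is split into FSTRING_START ('f"'),
# FSTRING_MIDDLE ('hello '), the braced expression's own tokens, and
# FSTRING_END ('"'); pre-PEP tokenizers emit one opaque STRING token instead.
src = 'f"hello {name}"\n'
names = [tokenize.tok_name[tok.type]
         for tok in tokenize.generate_tokens(io.StringIO(src).readline)]

if sys.version_info >= (3, 12):
    assert "FSTRING_START" in names and "FSTRING_END" in names
else:
    assert "STRING" in names
```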
@@ -267,7 +268,7 @@ All restrictions mentioned in the PEP are lifted from f-literals, as explained b
 
 Backwards Compatibility
 =======================
 
-This PEP is not backwards incompatible: any valid Python code will continue to
+This PEP is backwards compatible: any valid Python code will continue to
 be valid if this PEP is implemented and it will not change semantically.
 
 How to Teach This

From 531ac8e46424434eb2ac1094fb6df24c620ba3b5 Mon Sep 17 00:00:00 2001
From: Lysandros Nikolaou
Date: Thu, 1 Dec 2022 14:17:57 +0100
Subject: [PATCH 6/6] Fix typo from previous suggestion

---
 pep-0701.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pep-0701.rst b/pep-0701.rst
index a02c062bee1..db1925d1e13 100644
--- a/pep-0701.rst
+++ b/pep-0701.rst
@@ -123,7 +123,7 @@ f-string literals (as well as the Python language in general).
    can include. This opens up the possibility of nesting string literals (formatted or
    not) inside the expression component of an f-string with the same quote type (and
    length)::
 
-    >>> f"Tthese are the things: {", ".join(things)}"
+    >>> f"These are the things: {", ".join(things)}"
     >>> f"{source.removesuffix(".py")}.c: $(srcdir)/{source}"
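Whether the quote-reuse shown in this example is accepted can be probed mechanically with ``compile()``, without ``things`` needing to be defined; this is a sketch assuming the PEP lands in Python 3.12 as targeted:

```python
import sys

# Reusing the delimiting quote inside the expression part is a SyntaxError
# before this PEP and valid syntax afterwards. compile() only parses the
# source, so the undefined name 'things' is never evaluated.
src = 'f"These are the things: {", ".join(things)}"'
try:
    compile(src, "<demo>", "eval")
    parses = True
except SyntaxError:
    parses = False

assert parses == (sys.version_info >= (3, 12))
```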