Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inline literals that span multiple lines in the RST file get rendered incorrectly #2511

Closed
ExplodingCabbage opened this issue Apr 10, 2022 · 1 comment · Fixed by #2512
Closed
Labels

Comments

@ExplodingCabbage
Copy link
Contributor

ExplodingCabbage commented Apr 10, 2022

PEP 8 contains a few examples of this problem. For instance, consider this paragraph:

  A bare ``except:`` clause will catch SystemExit and
  KeyboardInterrupt exceptions, making it harder to interrupt a
  program with Control-C, and can disguise other problems.  If you
  want to catch all exceptions that signal program errors, use
  ``except Exception:`` (bare except is equivalent to ``except
  BaseException:``).

The final inline literal here, namely

``except
  BaseException:``

spans two lines. When this gets rendered (at https://peps.python.org/pep-0008/, or by following the build steps in this repo), the line break gets included literally in the rendered document:

screenshot demonstrating the above

Clearly, this is a bit ugly and confusing, and not how we want the final document to appear. But why's it happening?

Q: When did this break?

A: The breakage was introduced when the new Sphinx-based build pipeline was added. I checked out 977f1fe, where docs on how to run the new build process were first added, and built the PEPs, and the same bug is present. However, the live website (which was not yet using the new build pipeline) did not show the bug at that time, as shown by the Wayback Machine: https://web.archive.org/web/20210703133618/https://www.python.org/dev/peps/pep-0008/

Q: Is this an error in the markup of PEP 8 itself that was forgiven by the old rendering pipeline but isn't forgiven by the new one, or is it a bug in the new rendering pipeline?

A: It's a bug in the new Sphinx-based rendering pipeline. I wasn't sure at first since I'm not too familiar with RST, so I checked the docs, and several things point towards the idea that newlines in inline literals should not be preserved:

  • The user quick-reference at https://docutils.sourceforge.io/docs/user/rst/quickref.html explicitly states, about inline literals, that "Spaces should be preserved, but line breaks will not be."
  • The name "inline literal" implies that they are for using inline in paragraphs and thus that line breaks should not be preserved
  • The example of paragraph markup in the RST quick syntax overview features an inline literal that spans a line break, and would render incorrectly if that line break were preserved

There admittedly is a bit where the RST docs are more equivocal about this point, seemingly suggesting that newlines in inline literals may or may not be preserved, perhaps depending upon the implementation. From the inline literals docs:

Line breaks and sequences of whitespace characters are not protected in inline literals. Although a reStructuredText parser will preserve them in its output, the final representation of the processed document depends on the output formatter, thus the preservation of whitespace cannot be guaranteed. If the preservation of line breaks and/or other whitespace is important, literal blocks should be used.

Nonetheless, I think that nuance doesn't change much and that it's reasonable to round this off to "the markup in PEP 8 is fine, and the rendering pipeline is rendering it incorrectly".

Q: So is this a bug in Sphinx?

A: Nope. If you create a new Sphinx project (I followed the instructions at https://www.sphinx-doc.org/en/master/tutorial/getting-started.html#setting-up-your-project-and-development-environment), paste the paragraph about bare except clauses from the start of this issue into a docs file, and build it, it renders just fine:

screenshot demonstrating the above

Q: So what's going on? Does something custom in this project's build pipeline generate different HTML?

A: Nope. The HTML generated by the paragraph is character-for-character identical between how it renders in a new Sphinx project and how it renders at https://peps.python.org/pep-0008/. Specifically, it looks like this:

<p>A bare <code class="docutils literal notranslate"><span class="pre">except:</span></code> clause will catch SystemExit and
KeyboardInterrupt exceptions, making it harder to interrupt a
program with Control-C, and can disguise other problems.  If you
want to catch all exceptions that signal program errors, use
<code class="docutils literal notranslate"><span class="pre">except</span> <span class="pre">Exception:</span></code> (bare except is equivalent to <code class="docutils literal notranslate"><span class="pre">except</span>
<span class="pre">BaseException:</span></code>).</p>

Thus the difference is entirely down to custom CSS included on peps.python.org.

Q: So what's the guilty CSS rule?

A: It's the white-space: pre-wrap; rule applied to all code elements by style.css. Remove that rule, and everything is fixed.

Q: So why did that rule get added in the first place? Does it have a purpose?

I'm, uh, not entirely sure about this one. @AA-Turner added it in #1933. Initially pre-wrap for pres was brought in by commit AA-Turner@db6bab6, where it looks like @AA-Turner pasted in some styling that was previously being used on python.org (I see lots of similar styles in the Wayback Machine link I pasted earlier). Then in commit AA-Turner@fbbdbef, without explanation, the rule gets applied to code blocks too.

Q: So is the rule pointless, and should we just remove it?

A: Yes, unless I'm missing something.

Since the entire impact of making this change will be to make <code> elements whose text content contains a literal newline render differently, a good way to figure out the impact is to write a quick and dirty script to search for precisely such elements. I did so as follows:

import os
import re

code_els_with_newlines = []

for filename in os.listdir('build'):
    if not filename.endswith('.html'):
        continue
    content = open(f"build/{filename}").read()
    matches = re.findall('<code.+?</code>', content, re.MULTILINE | re.DOTALL)
    for match in matches:
        if '\n' in match:
            code_els_with_newlines.append(match)

for el in code_els_with_newlines:
    print(el)
    print('\n\n--------------------------------------\n\n')

I've attached its output: affected-code-elements.txt

There are 199 such elements, and based on a skim of all 199 and a careful look at a dozen or so, they're basically all like the example I gave at the start of this issue - i.e. inline literals that span a line break in the RST markup, whose rendering is broken currently but is fixed if the pre-wrap rule is removed.

The only possible exception I found (which I looked at closely because it spans three lines) was from PEP 0368, which has markup like this:

| ``PyObject* PyImage_FromBuffer(PyObject *buffer, PyObject *mode,
                                 Py_ssize_t width,
                                 Py_ssize_t height)``

    Returns a new ``Image`` instance, initialized with the contents of
    the ``buffer`` object.

Here it seems like the author has tried to prettily format this function signature across multiple lines, but because they've not used a block literal, the correct rendering (as I understand the RST spec) is to throw away their whitespace and put everything on one line. That's possibly not what the author intended. However, it's still prettier than how we render it today, as shown by this before image...

image

... and after image:

image

I therefore think we can drop this rule with no downsides and fix 199 rendering bugs. I'll open a PR shortly to do so.

ExplodingCabbage added a commit to ExplodingCabbage/peps that referenced this issue Apr 10, 2022
As written up in detail at python#2511, this rule breaks the rendering of 199 inline literals across all the PEPs, and appears to have no reason to exist. I therefore propose to remove it.
@AA-Turner
Copy link
Member

Perhaps the most detailed bug report I've read! Thanks for the detail Mark -- the change looks good, and the pre element has its own whitespace: directive on L123 of the same file, so I think it should be safe to remove.

I've approved the PR, but won't merge as I don't have time to test properly whilst I'm away for a few weeks (I only caught this as I got a ping notification from my phone!)

A

@AA-Turner AA-Turner added the bug label Apr 10, 2022
JelleZijlstra pushed a commit that referenced this issue Apr 11, 2022
#2512)

As written up in detail at #2511, this rule breaks the rendering of 199 inline literals across all the PEPs, and appears to have no reason to exist. I therefore propose to remove it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants