Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow specific class values only #201

Closed
wants to merge 8 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
33 changes: 28 additions & 5 deletions readme_renderer/clean.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,9 +41,7 @@

# Custom Additions
"*": ["id"],
"hr": ["class"],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to find out why hr was expected to contain a class. It was added at the same time when hr was authorized as a tag itself, by @dstufft in #21 . If you happen to remember what it was about, I'm interested. But I couldn't find a relevant hr.<somthing> rule in Warehouse CSS.

"img": ["src", "width", "height", "alt", "align", "class"],
"span": ["class"],
"img": ["src", "width", "height", "alt", "align"],
"th": ["align"],
"td": ["align"],
"h1": ["align"],
Expand All @@ -52,19 +50,44 @@
"h4": ["align"],
"h5": ["align"],
"h6": ["align"],
"code": ["class"],
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have a <code class=language-python>print("hello")</code> in the output of the Markdown rendering, we expressly change it to pygmentize before the bleaching is done, so at bleaching time we should not expect that tag.

r'<pre><code class="language-(?P<lang>.+?)">(?P<code>.+?)'

"p": ["align"],
}
# Class is a specific attribute because not only do we want to allow it only
# on certain tags, but we also want to control possible values.
ALLOWED_CLASSES = {
"img": {"align-left", "align-right", "align-center"},
Copy link
Member Author

@ewjoachim ewjoachim Jun 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you look at https://github.com/pypa/warehouse/blob/master/warehouse/static/sass/blocks/_project-description.scss#L207-L209, you may find a discrepancy:

Warehouse CSS alignment rule will trigger on 3 possible conditions:

  • img.align-left which is the one we target here
  • .figure.align-left which according to Styles for image alignment (pypa/warehouse#4023) pypi/warehouse#4391 (comment) will be applied on a div element, but it's already broken as of today because div element doesn't allow class without our current bleaching rules, even before this PR. This could be fixed in a subsequent PR
  • object.align-left but object tags are not even allowed!
    So authorizing only those classes and only on img seems to not be a restriction compared to what we have today.

"span": set(
# Classes for syntax coloring
# The original source for this list is
# https://github.com/pygments/pygments/blob/cfaa45dcc4103da8cf1700fd0d3e5708d894337b/pygments/token.py
# which is a superset from the list in
# https://github.com/pypa/warehouse/blob/master/warehouse/static/sass/blocks/_project-description.scss#L256
# This means that some classes are unused and it's most probably OK.
"bp c c1 ch cm cp cpf cs dl err esc fm g gd ge gh gi go gp gr gs gt "
"gu il k kc kd kn kp kr kt l ld m mb mf mh mi mo n na nb nc nd ne nf "
"ni nl nn no nt nv nx o ow p py s s1 s2 sa sb sc sd se sh si sr ss sx "
"vc vg vi vm w x".split()
Copy link
Member Author

@ewjoachim ewjoachim Jun 5, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you wonder, if pygmetize gets new CSS classes, it's not a problem that we won't have them here unless we want to style them in Warehouse, and in this case, we'll have to add them both here and in Warehouse CSS.

)
}


ALLOWED_STYLES = [
]


def is_attributes_allowed(tag, name, value):
if name == "class":
# In our case, there's no use-case where a single element may have
# multiple classes, so we don't have to split() to compare.
return value in ALLOWED_CLASSES.get(tag, ())
return name in ALLOWED_ATTRIBUTES.get(tag, []) + ALLOWED_ATTRIBUTES["*"]


def clean(html, tags=None, attributes=None, styles=None):
if tags is None:
tags = ALLOWED_TAGS
if attributes is None:
attributes = ALLOWED_ATTRIBUTES
attributes = is_attributes_allowed
if styles is None:
styles = ALLOWED_STYLES

Expand Down
4 changes: 4 additions & 0 deletions tests/fixtures/test_GFM_024.html
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
<p><span>foo</span>
<span class="p">bar</span></p>
<img>
<img class="align-left">
5 changes: 5 additions & 0 deletions tests/fixtures/test_GFM_024.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,5 @@
<span class="forbidden">foo</span>
<span class="p">bar</span>

<img class="forbidden">
<img class="align-left">
23 changes: 21 additions & 2 deletions tests/test_clean.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,24 @@
from readme_renderer.clean import clean
import pytest

from readme_renderer import clean


def test_invalid_link():
assert clean('<a href="http://exam](ple.com">foo</a>') == "<a>foo</a>"
assert clean.clean('<a href="http://exam](ple.com">foo</a>') == "<a>foo</a>"


@pytest.mark.parametrize(
"tag, name, value, expected",
[
("form", "align", "left", False), # form doesn't allow attributes
("h1", "align", "left", True), # h1 allows align attribute
("h1", "class", "align-left", False), # h1 doesn't allow class
("img", "onerror", "alert()", False), # img doesn't onerror attribute
("img", "class", "something", False), # img allows class but not this one
("img", "class", "align-left", True), # img allows this class
("img", "id", "some-id", True), # everything allows id
("form", "id", "some-id", True), # everything allows id
]
)
def test_is_attributes_allowed(tag, name, value, expected):
assert clean.is_attributes_allowed(tag, name, value) == expected
2 changes: 1 addition & 1 deletion tox.ini
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ commands =
extras = md

[testenv:pep8]
basepython = python3.6
basepython = python3
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The CI was launched using python 3.x but this tox target wanted 3.6 specifically, so it couldn't work. Don't know why it did so far. If you want this in a different PR, let me know.

deps =
flake8
pep8-naming
Expand Down