Whitelist the allowed CSS classes #120

Closed
wants to merge 2 commits into from
58 changes: 57 additions & 1 deletion readme_renderer/clean.py
@@ -19,6 +19,8 @@
import bleach.callbacks
import bleach.linkifier
import bleach.sanitizer
import html5lib.filters.base
import pygments.token


ALLOWED_TAGS = [
@@ -54,21 +56,75 @@
    "width", "height",
]

ALLOWED_CLASSES = {
    "img": ["align-left", "align-center", "align-right"],
    "span": [c for c in pygments.token.STANDARD_TYPES.values() if c],
Member Author

Two things:

  • I'm not sure whether a class on a <span> is ever used for anything other than pygments highlighting, or whether the list of pygments classes is all we need to worry about here.
  • @theacodes set the <code> element to allow classes in the PR that added syntax highlighting, but I can't determine which classes are valid for <code>. None of the combinations I've tried generates output with a class on the <code> element. If we can't figure out how to make a README emit a <code> element with CSS classes, we should probably omit class from <code>.

Member

Are you testing only with RST? I think I added it for Markdown, though I'm not sure.

}


class _CSSClassFilter(html5lib.filters.base.Filter):
    def __init__(self, *args, allowed_classes=None, **kwargs):
        super().__init__(*args, **kwargs)

        if allowed_classes is None:
            allowed_classes = {}
        self.allowed_classes = allowed_classes

    def __iter__(self):
        for token in super().__iter__():
            token = self.sanitize_token(token)
            if token:
                yield token

    def sanitize_token(self, token):
        if token["type"] in {"StartTag", "EndTag", "EmptyTag"}:
            name = token["name"]

            if "data" in token:
                attrs = token["data"]

                if (None, "class") in attrs:
                    new_classes = self.sanitize_css_classes(
                        name,
                        attrs[(None, "class")]
                    )

                    if new_classes:
                        attrs[(None, "class")] = new_classes
                    else:
                        del attrs[(None, "class")]

                token["data"] = attrs

        return token

    def sanitize_css_classes(self, name, classes):
        classes = classes.split()
        allowed = set(self.allowed_classes.get(name, []))
        classes = sorted(set(classes) & allowed)
        return " ".join(classes)


-def clean(html, tags=None, attributes=None, styles=None):
+def clean(html, tags=None, attributes=None, styles=None, classes=None):
    if tags is None:
        tags = ALLOWED_TAGS
    if attributes is None:
        attributes = ALLOWED_ATTRIBUTES
    if styles is None:
        styles = ALLOWED_STYLES
    if classes is None:
        classes = ALLOWED_CLASSES

    # Clean the output using Bleach
    cleaner = bleach.sanitizer.Cleaner(
        tags=tags,
        attributes=attributes,
        styles=styles,
        filters=[
            # Bleach by default doesn't allow whitelisting what CSS classes
            # are available to be used, so we'll override that behavior with
            # our own filter which does.
            functools.partial(_CSSClassFilter, allowed_classes=classes),
            # Bleach Linkify makes it easy to modify links, however, we will
            # not be using it to create additional links.
            functools.partial(
5 changes: 5 additions & 0 deletions tests/test_clean.py
@@ -3,3 +3,8 @@

def test_invalid_link():
    assert clean('<a href="http://exam](ple.com">foo</a>') == "<a>foo</a>"


def test_css_sanitizer():
    r = clean("<span class='foo'><img class='align-right bar'></span>")
    assert r == '<span><img class="align-right"></span>'
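
For anyone reading the diff without bleach or html5lib installed, here is a minimal standalone sketch of the whitelist logic the filter applies. It is not the PR's code: the plain-dict tokens only imitate the token shape the filter reads, the module-level functions are hypothetical stand-ins for the filter's methods, and the `ALLOWED_CLASSES` table is trimmed to the `img` entry for illustration.

```python
# Standalone sketch: no bleach or html5lib required. The token dicts below
# are a simplified, assumed imitation of the tokens the filter iterates over.
ALLOWED_CLASSES = {
    "img": ["align-left", "align-center", "align-right"],
}


def sanitize_css_classes(name, classes, allowed_classes=ALLOWED_CLASSES):
    """Keep only the classes whitelisted for this tag, de-duplicated and sorted."""
    allowed = set(allowed_classes.get(name, []))
    return " ".join(sorted(set(classes.split()) & allowed))


def sanitize_token(token, allowed_classes=ALLOWED_CLASSES):
    """Rewrite the class attribute of a tag token, or drop it if nothing survives."""
    if token["type"] not in {"StartTag", "EndTag", "EmptyTag"}:
        return token
    attrs = token.get("data") or {}
    key = (None, "class")
    if key in attrs:
        kept = sanitize_css_classes(token["name"], attrs[key], allowed_classes)
        if kept:
            attrs[key] = kept
        else:
            del attrs[key]
    return token


# Same markup as test_css_sanitizer, written out as tokens by hand.
tokens = [
    {"type": "StartTag", "name": "span", "data": {(None, "class"): "foo"}},
    {"type": "EmptyTag", "name": "img", "data": {(None, "class"): "align-right bar"}},
    {"type": "EndTag", "name": "span"},
]
cleaned = [sanitize_token(t) for t in tokens]
```

The token list mirrors the input of `test_css_sanitizer`: in this sketch the `span` has no whitelist entry, so it loses its `class` attribute entirely, while the `img` keeps only `align-right`.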