Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix GH-121; Make attribute order deterministic #137

Merged

Conversation

msabramo
Copy link
Contributor

Always serialize attributes in alpha order by passing alphabetical_attributes=True to html5lib.serializer.HTMLSerializer.

This allows greatly simplifying tests within bleach itself and in code that makes use of bleach.

Fixes: GH-121

Cc: @metalculus84, @dstufft

@msabramo msabramo force-pushed the GH-121_make_attribute_order_deterministic branch from b34286f to bbe00c0 Compare December 12, 2014 07:27
@msabramo
Copy link
Contributor Author

Passing Travis CI build: https://travis-ci.org/jsocol/bleach/builds/43805108

@msabramo
Copy link
Contributor Author

$ tox
...
  py26: commands succeeded
  py27: commands succeeded
  py32: commands succeeded
  py33: commands succeeded
  pypy: commands succeeded
  congratulations :)

@jsocol
Copy link
Contributor

jsocol commented Dec 12, 2014

Thanks, @msabramo! Rather than making ordereddict a permanent dependency, especially since it's only for the oldest supported Python, can we do something like html5lib/html5lib-python#177? Mind doing that then squashing it all down to one commit?

@msabramo
Copy link
Contributor Author

Sure, that makes sense. Coming up shortly...

by passing `alphabetical_attributes=True` to
`html5lib.serializer.HTMLSerializer`

Fixes: mozillaGH-121
@msabramo msabramo force-pushed the GH-121_make_attribute_order_deterministic branch from bbe00c0 to 2df578f Compare December 12, 2014 14:45
@msabramo
Copy link
Contributor Author

Changed and squashed. Tested with:

[marca@marca-mac2 bleach]$ python2.6 setup.py egg_info > /dev/null && (grep ordereddict bleach.egg-info/requires.txt || echo "ordereddict not required")
ordereddict
[marca@marca-mac2 bleach]$ python2.7 setup.py egg_info > /dev/null && (grep ordereddict bleach.egg-info/requires.txt || echo "ordereddict not required")
ordereddict not required
[marca@marca-mac2 bleach]$ python3.3 setup.py egg_info > /dev/null && (grep ordereddict bleach.egg-info/requires.txt || echo "ordereddict not required")
ordereddict not required
[marca@marca-mac2 bleach]$ python3.4 setup.py egg_info > /dev/null && (grep ordereddict bleach.egg-info/requires.txt || echo "ordereddict not required")
ordereddict not required
[marca@marca-mac2 bleach]$ pypy setup.py egg_info > /dev/null && (grep ordereddict bleach.egg-info/requires.txt || echo "ordereddict not required")
ordereddict not required

@msabramo
Copy link
Contributor Author

Now I'm wondering if I should of made it a flag in case some people don't like their attribute order getting changed...

e.g.:

diff --git a/bleach/__init__.py b/bleach/__init__.py
index 08f0532..51ee396 100644
--- a/bleach/__init__.py
+++ b/bleach/__init__.py
@@ -95,7 +95,8 @@ DEFAULT_CALLBACKS = [linkify_callbacks.nofollow]


 def clean(text, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES,
-          styles=ALLOWED_STYLES, strip=False, strip_comments=True):
+          styles=ALLOWED_STYLES, strip=False, strip_comments=True,
+          alphabetical_attributes=True):
     """Clean an HTML fragment and return it"""
     if not text:
         return ''
@@ -111,7 +112,8 @@ def clean(text, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES,

     parser = html5lib.HTMLParser(tokenizer=s)

-    return _render(parser.parseFragment(text))
+    return _render(parser.parseFragment(text),
+                   alphabetical_attributes=alphabetical_attributes)


 def linkify(text, callbacks=DEFAULT_CALLBACKS, skip_pre=False,
@@ -364,15 +366,17 @@ def linkify(text, callbacks=DEFAULT_CALLBACKS, skip_pre=False,
     return _render(forest)


-def _render(tree):
+def _render(tree, alphabetical_attributes=True):
     """Try rendering as HTML, then XML, then give up."""
-    return force_unicode(_serialize(tree))
+    return force_unicode(
+        _serialize(tree,
+                   alphabetical_attributes=alphabetical_attributes))


-def _serialize(domtree):
+def _serialize(domtree, alphabetical_attributes=True):
     walker = html5lib.treewalkers.getTreeWalker('etree')
     stream = walker(domtree)
     serializer = HTMLSerializer(quote_attr_values=True,
-                                alphabetical_attributes=True,
+                                alphabetical_attributes=alphabetical_attributes,
                                 omit_optional_tags=False)
     return serializer.render(stream)
diff --git a/docs/clean.rst b/docs/clean.rst
index 2fb888b..5cd7e5b 100644
--- a/docs/clean.rst
+++ b/docs/clean.rst
@@ -8,7 +8,8 @@
 ``clean()`` is Bleach's HTML sanitization method::

     def clean(text, tags=ALLOWED_TAGS, attributes=ALLOWED_ATTRIBUTES,
-              styles=ALLOWED_STYLES, strip=False, strip_comments=True):
+              styles=ALLOWED_STYLES, strip=False, strip_comments=True,
+              alphabetical_attributes=True):
         """Clean an HTML fragment and return it."""

 Given a fragment of HTML, Bleach will parse it according to the HTML5 parsing

@jsocol: What do you think? Want me to throw that in there?

@jsocol
Copy link
Contributor

jsocol commented Dec 12, 2014

No, without alphabetical_attributes it changes the attribute order anyway and there are too many kwargs as-is. This is great, thanks @msabramo!

jsocol added a commit that referenced this pull request Dec 12, 2014
…erministic

Fix GH-121; Make attribute order deterministic
@jsocol jsocol merged commit ccccfdd into mozilla:master Dec 12, 2014
@msabramo
Copy link
Contributor Author

Great! You're welcome, @jsocol and thank you for creating bleach and sharing it!

@msabramo
Copy link
Contributor Author

Time for a new release on PyPI perhaps?

That would help folks have reliable test runs and I could avoid depending on git in my PR to pypa/readme ...

@jsocol
Copy link
Contributor

jsocol commented Dec 13, 2014

Oof, yeah it's been about a year. I'll get on that

msabramo added a commit to msabramo/readme_renderer that referenced this pull request Dec 17, 2014
The new version of bleach now sorts attributes alphabetically so that
the results are deterministic (see
mozilla/bleach#137)

so I had to reverse the order of attributes in the expected HTML.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

clean can randomize attribute order
2 participants