Releases: rgrove/sanitize

Version 3.1.0 (2014-12-22)

23 Dec 01:29
  • Added the following CSS properties to the relaxed config. @ehudc - #120
    • -moz-text-size-adjust
    • -ms-text-size-adjust
    • -webkit-text-size-adjust
    • text-size-adjust
  • Updated Nokogumbo to 1.2.0 to pick up a fix for a Gumbo bug where the entity &AElig; left its semicolon behind when it was converted to a character during parsing. #119

Version 3.0.4 (2014-12-12)

12 Dec 23:26
  • Fixed: Harmless whitespace preceding a URL protocol (such as " http://") caused the URL to be removed even when the protocol was whitelisted. @benubois - #126
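The kind of check this fix concerns can be sketched in plain Ruby. This is a minimal illustration, not Sanitize's actual implementation: the helper name protocol_allowed?, the ALLOWED_PROTOCOLS list, and the decision to let scheme-less (relative) URLs through are all assumptions made for the example.

```ruby
# Hypothetical allowlist; a real config would define its own protocols.
ALLOWED_PROTOCOLS = %w[http https mailto].freeze

# Hypothetical helper: returns true when the URL's protocol is allowed.
def protocol_allowed?(url, allowed = ALLOWED_PROTOCOLS)
  # Ignore harmless leading whitespace before looking for a protocol.
  candidate = url.lstrip
  match = candidate.match(/\A([a-z][a-z0-9+.\-]*):/i)
  # No protocol at all means a relative URL; this sketch lets those through.
  return true if match.nil?
  allowed.include?(match[1].downcase)
end

protocol_allowed?(' http://example.com/')   # leading space is tolerated
protocol_allowed?(' javascript:alert(1)')   # still rejected
```

Stripping the whitespace before matching is what keeps " http://" from being mistaken for a URL with an unknown protocol.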

Version 3.0.3 (2014-10-29)

29 Oct 22:48
  • Fixed: Some CSS selectors weren't parsed correctly inside the body of a @media block, causing them to be removed even when whitelist rules should have allowed them to remain. #121

Version 3.0.2 (2014-09-02)

03 Sep 00:38
  • Updated Nokogumbo to 1.1.12, because 1.1.11 silently reverted the change we were trying to pick up in the last release. Now issue #114 is actually fixed.

Version 3.0.1 (2014-09-02)

03 Sep 00:25
  • Updated Nokogumbo to 1.1.11 to pick up a fix for a Gumbo bug in which certain HTML character entities, such as &Ouml;, were parsed incorrectly, leaving the semicolon behind in the output. #114

Version 3.0.0 (2014-06-21)

21 Jun 23:16

As of this version, Sanitize adheres strictly to the SemVer 2.0.0 versioning standard. This release contains API and output changes that are incompatible with previous releases, as indicated by the major version increment.

Backwards-incompatible changes

  • HTML is now parsed using Google's Gumbo HTML5 parser, which adheres to the HTML5 parsing spec and behaves much more like modern browser parsers than the previous libxml2-based parser. As a result, HTML output may differ from that of previous versions of Sanitize.
  • All transformers now traverse the document from the top down, starting with the first node, then its first child, and so on. The :transformers_breadth config has been removed, and old bottom-up transformers (the previous default) may need to be rewritten.
  • Sanitize's built-in configs are now deeply frozen to prevent people from modifying them (either accidentally or maliciously). To customize a built-in config, create a new copy using Sanitize::Config.merge(), like so:
    Sanitize.fragment(html, Sanitize::Config.merge(Sanitize::Config::BASIC,
      :elements        => Sanitize::Config::BASIC[:elements] + ['div', 'table'],
      :remove_contents => true
    ))
  • The clean! and clean_document! methods were removed, since they weren't useful and tended to confuse people.
  • The clean method was renamed to fragment to more clearly indicate that its intended use is to sanitize an HTML fragment.
  • The clean_document method was renamed to document.
  • The clean_node! method was renamed to node!.
  • The document method now raises a Sanitize::Error if the <html> element isn't whitelisted, rather than a RuntimeError. This error is also now raised regardless of the :remove_contents config setting.
  • The :output config has been removed. Output is now always HTML, not XHTML.
  • The :output_encoding config has been removed. Output is now always UTF-8.
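For anyone rewriting an old bottom-up transformer, the top-down order described in the list above can be pictured with a small plain-Ruby sketch. It does not use Sanitize or Nokogiri: the Node struct and the traverse_top_down helper are hypothetical stand-ins that only illustrate the visiting order (a node first, then its first child, and so on).

```ruby
# Hypothetical stand-in for a parsed HTML node; Sanitize itself works on
# Nokogiri nodes, not on this struct.
Node = Struct.new(:name, :children)

# Visit the node itself first, then each child in document order.
def traverse_top_down(node, &visitor)
  visitor.call(node)
  node.children.each { |child| traverse_top_down(child, &visitor) }
end

tree = Node.new('div', [
  Node.new('p', [Node.new('b', []), Node.new('i', [])]),
  Node.new('span', [])
])

visited = []
traverse_top_down(tree) { |node| visited << node.name }
# visited is now ["div", "p", "b", "i", "span"]
```

A transformer written for the old bottom-up order would have seen b and i before p, so logic that relies on children already having been processed is the part most likely to need rewriting.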

Other changes

  • Added advanced CSS sanitization support using Crass, which is fully compliant with the CSS Syntax Module Level 3 parsing spec. The contents of whitelisted <style> elements and style attributes in HTML will be sanitized as CSS, or you can use the Sanitize::CSS class to manually sanitize CSS stylesheets or properties.
  • Added an :allow_doctype setting. When true, well-formed doctype definitions will be allowed in documents. When false (the default), doctype definitions will be removed from documents. Doctype definitions are never allowed in fragments, regardless of this setting.
  • Added the following elements to the relaxed config, in addition to various attributes: article, aside, body, data, div, footer, head, header, html, main, nav, section, span, style, title.
  • The :whitespace_elements config is now a Hash, and allows you to specify the text that should be inserted before and after these elements when they're removed. The old-style Array-based config value is still supported for backwards compatibility. @alperkokmen - #94
  • Unsuitable Unicode characters are now removed from HTML before it's parsed. #106
  • Fixed: Non-tag brackets in input like "1 > 2 and 2 < 1" are now parsed and escaped correctly in accordance with the HTML5 spec, becoming "1 &gt; 2 and 2 &lt; 1". #83
  • Fixed: Siblings added after the current node during traversal are now also traversed. In previous versions they were simply skipped. #91
  • Fixed: Nokogiri has been smacked and instructed to stop adding newlines after certain elements, because if people wanted newlines there they'd have put them there, dammit. #103
  • Fixed: Added a workaround for a libxml2 bug that caused an undesired content-type meta tag to be added to all documents with <head> elements. Nokogiri #1008
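As a rough illustration of the two :whitespace_elements forms mentioned above, the sketch below shows a Hash value next to the older Array value. The :before and :after key names, the element names, and the replacement strings are assumptions made for the example; check the Sanitize documentation for your version before relying on them.

```ruby
# Hash form: element name => text inserted before and after the removed
# element. Key names and values here are assumptions for illustration.
WHITESPACE_ELEMENTS_HASH = {
  'br'  => { :before => "\n", :after => "" },
  'div' => { :before => "\n", :after => "\n" },
  'p'   => { :before => "\n", :after => "\n" }
}.freeze

# Old Array form, which the note above says is still supported.
WHITESPACE_ELEMENTS_ARRAY = %w[br div p].freeze

# Either value would go under the :whitespace_elements key of a config.
custom_config = { :whitespace_elements => WHITESPACE_ELEMENTS_HASH }
```

The Hash form is the one to reach for when different elements need different surrounding text, such as a single newline for br but a newline on both sides of a block element.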