This repository has been archived by the owner on Aug 26, 2023. It is now read-only.

feat: Nokogumbo detects Nokogiri's HTML5 API
Closes #170

A future version of Nokogiri will provide Nokogumbo's API (see
sparklemotion/nokogiri#2204). This change
will allow Nokogumbo to detect whether Nokogiri provides the HTML5 API
and whether to use Nokogiri's implementation or Nokogumbo's
implementation.

Some contractual assumptions I'm making about Nokogiri:

- Nokogiri will faithfully reproduce the `::Nokogiri::HTML5` singleton
  method, module, and namespace (including classes
  `Nokogiri::HTML5::Node`, `Nokogiri::HTML5::Document`, and
  `Nokogiri::HTML5::DocumentFragment`)

- Nokogiri will not provide a `::Nokogumbo` module/namespace, but will
  provide a similar `::Nokogiri::Gumbo` module which will provide the
  same public API as `::Nokogumbo`.

This change checks for the existence of `Nokogiri::HTML5`,
`Nokogiri::Gumbo`, and an expected singleton method on each. We could
do a more- or less-thorough check here.
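
As a rough illustration of that kind of check, the sketch below tests whether a constant path resolves and whether the resulting object responds to a singleton method. The helper name `provides_singleton_method?` and the `FakeNokogiri` stand-in modules are hypothetical and exist only so the example runs without Nokogiri installed; the actual change uses `defined?` and `respond_to?` inline.

```ruby
# Hypothetical helper (not part of Nokogumbo): true when the constant path
# resolves and the resulting object responds to the given singleton method.
def provides_singleton_method?(const_path, method_name)
  return false unless Object.const_defined?(const_path)

  Object.const_get(const_path).respond_to?(method_name)
rescue NameError
  # Raised for malformed constant names; treat as "not provided".
  false
end

# Stand-in modules (assumptions for this sketch only).
module FakeNokogiri
  module HTML5
    def self.parse(_html); end
  end

  module Gumbo; end # deliberately lacks a .parse singleton method
end

html5_ok = provides_singleton_method?("FakeNokogiri::HTML5", :parse)
gumbo_ok = provides_singleton_method?("FakeNokogiri::Gumbo", :parse)

# Both checks must pass before delegating to the other implementation.
puts(html5_ok && gumbo_ok ? "delegate" : "use own implementation")
```

A more thorough variant could check several methods or the nested classes listed above; a less thorough one could test only for the constants.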

This change also provides an "escape hatch" using an environment
variable `NOKOGUMBO_IGNORE_NOKOGIRI_HTML5` which can be set to force
Nokogumbo to use its own implementation. This escape hatch might be
unnecessary, but this change is invasive enough to make me want to be
cautious.
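
A minimal sketch of the escape-hatch semantics, based on the condition in this commit's `lib/nokogumbo.rb` diff: the variable counts as "set" when it is present with any value other than the string `"false"`. The helper name `ignore_nokogiri_html5?` and its `env` parameter are hypothetical, added only to make the behavior easy to try with plain hashes.

```ruby
# Hypothetical helper mirroring the environment check in this change.
# `env` defaults to ENV; a Hash can be passed in for experimentation.
def ignore_nokogiri_html5?(env = ENV)
  env.key?("NOKOGUMBO_IGNORE_NOKOGIRI_HTML5") &&
    env["NOKOGUMBO_IGNORE_NOKOGIRI_HTML5"] != "false"
end

# Unset: Nokogiri's implementation may be used.
puts ignore_nokogiri_html5?({})

# Any value other than "false" forces Nokogumbo's own implementation.
puts ignore_nokogiri_html5?({ "NOKOGUMBO_IGNORE_NOKOGIRI_HTML5" => "1" })

# The literal string "false" behaves like unset.
puts ignore_nokogiri_html5?({ "NOKOGUMBO_IGNORE_NOKOGIRI_HTML5" => "false" })
```

Note that under this condition an empty string also counts as set, since only the exact value `"false"` is excluded.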

Nokogumbo will emit a single warning message at `require`-time when it
uses Nokogiri's implementation. This message points users to
sparklemotion/nokogiri#2205 which will
explain what's going on and help people migrate their
applications (but is an empty placeholder right now).
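
The "single warning" behavior follows from how `require` works: a file's top-level code runs only the first time it is loaded, so a top-level `warn` fires once per process. The sketch below demonstrates this with a throwaway file; the file name `noisy_feature.rb`, the message text, and both helper methods are assumptions made for illustration, not part of Nokogumbo.

```ruby
require "stringio"
require "tmpdir"

# Run the block with $stderr redirected and return what was written to it.
def capture_stderr
  original = $stderr
  $stderr = StringIO.new
  yield
  $stderr.string
ensure
  $stderr = original
end

# Write a file whose top-level code warns, then require it twice.
def warnings_from_requiring_twice
  Dir.mktmpdir do |dir|
    path = File.join(dir, "noisy_feature.rb")
    File.write(path, %(warn "NOTE: example: using the other implementation"\n))

    capture_stderr do
      require path # loads the file and emits the warning
      require path # already loaded: nothing happens
    end
  end
end

puts warnings_from_requiring_twice.lines.length
```

Because `Kernel#warn` does nothing when warnings are disabled (for example with `ruby -W0`), users can silence the note that way if needed.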
flavorjones committed Mar 15, 2021
1 parent 80485db commit dc68896
Showing 2 changed files with 24 additions and 11 deletions.
3 changes: 3 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,9 @@ and this project adheres to [Semantic Versioning](http://semver.org/spec/v2.0.0.
 - Support Mageia distros when libxml2/libxslt system libraries are installed. #165 (Thank you,
 @pterjan!)

+### Added
+- Forward-looking support for a version of Nokogiri that will provide HTML5 parsing. #171
+
 ### Improved
 - Update extconf.rb to use Nokogiri v1.11's CPPFLAGS for more reliable installation.

32 changes: 21 additions & 11 deletions lib/nokogumbo.rb
@@ -1,17 +1,27 @@
 require 'nokogiri'
-require 'nokogumbo/version'
-require 'nokogumbo/html5'

-require 'nokogumbo/nokogumbo'
+if ((defined?(Nokogiri::HTML5) && Nokogiri::HTML5.respond_to?(:parse)) &&
+    (defined?(Nokogiri::Gumbo) && Nokogiri::Gumbo.respond_to?(:parse)) &&
+    !(ENV.key?("NOKOGUMBO_IGNORE_NOKOGIRI_HTML5") && ENV["NOKOGUMBO_IGNORE_NOKOGIRI_HTML5"] != "false"))
+
+  warn "NOTE: nokogumbo: Using Nokogiri::HTML5 provided by Nokogiri. See https://github.com/sparklemotion/nokogiri/issues/2205 for more information."
+
+  ::Nokogumbo = ::Nokogiri::Gumbo
+else
+  require 'nokogumbo/html5'
+  require 'nokogumbo/nokogumbo'

-module Nokogumbo
-  # The default maximum number of attributes per element.
-  DEFAULT_MAX_ATTRIBUTES = 400
+  module Nokogumbo
+    # The default maximum number of attributes per element.
+    DEFAULT_MAX_ATTRIBUTES = 400

-  # The default maximum number of errors for parsing a document or a fragment.
-  DEFAULT_MAX_ERRORS = 0
+    # The default maximum number of errors for parsing a document or a fragment.
+    DEFAULT_MAX_ERRORS = 0

-  # The default maximum depth of the DOM tree produced by parsing a document
-  # or fragment.
-  DEFAULT_MAX_TREE_DEPTH = 400
+    # The default maximum depth of the DOM tree produced by parsing a document
+    # or fragment.
+    DEFAULT_MAX_TREE_DEPTH = 400
+  end
 end
+
+require 'nokogumbo/version'
