feat: experimental implementation of pattern matching
supporting classes:

- XML::Attr
- XML::Document
- XML::DocumentFragment
- XML::Namespace
- XML::Node
- XML::NodeSet

and their subclasses.

See #2360 for discussion and to provide feedback.
flavorjones committed Nov 22, 2022
1 parent 2b56727 commit 40ad854
Showing 11 changed files with 611 additions and 6 deletions.
1 change: 1 addition & 0 deletions .rubocop.yml
@@ -12,6 +12,7 @@ AllCops:
- 'lib/nokogiri/css/parser.rb' # generated by racc
- 'lib/nokogiri/css/tokenizer.rb' # generated by rex
- 'lib/nokogiri/jruby/nokogiri_jars.rb' # generated by jar-dependencies
- 'test/_test_pattern_matching.rb' # until TargetRubyVersion >= 3.0
TargetRubyVersion: "2.6"
Naming/MethodName:
Enabled: false
16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -18,6 +18,22 @@ This version of Nokogiri ships official native gem support for the `aarch64-linu
This version of Nokogiri ships experimental native gem support for the `arm-linux` platform. Please note that glibc >= 2.29 is required for arm-linux systems, see [Supported Platforms](https://nokogiri.org/#supported-platforms) for more information.


#### Experimental pattern matching support

This version introduces an experimental pattern matching API for `XML::Attr`, `XML::Document`, `XML::DocumentFragment`, `XML::Namespace`, `XML::Node`, and `XML::NodeSet` (and their subclasses).

Some documentation on what can be matched:

- [`XML::Attr#deconstruct_keys`](https://nokogiri.org/rdoc/Nokogiri/XML/Attr.html?h=deconstruct#method-i-deconstruct_keys)
- [`XML::Document#deconstruct_keys`](https://nokogiri.org/rdoc/Nokogiri/XML/Document.html?h=deconstruct#method-i-deconstruct_keys)
- [`XML::Namespace#deconstruct_keys`](https://nokogiri.org/rdoc/Nokogiri/XML/Namespace.html?h=deconstruct+namespace#method-i-deconstruct_keys)
- [`XML::Node#deconstruct_keys`](https://nokogiri.org/rdoc/Nokogiri/XML/Node.html?h=deconstruct#method-i-deconstruct_keys)
- [`XML::DocumentFragment#deconstruct`](https://nokogiri.org/rdoc/Nokogiri/XML/DocumentFragment.html?h=deconstruct#method-i-deconstruct)
- [`XML::NodeSet#deconstruct`](https://nokogiri.org/rdoc/Nokogiri/XML/NodeSet.html?h=deconstruct#method-i-deconstruct)

We welcome feedback on this API at [#2360](https://github.com/sparklemotion/nokogiri/issues/2360).
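
As a rough illustration of how `case`/`in` uses these two hooks, the sketch below defines stand-in classes (`SketchAttr`, `SketchNode`, `SketchNodeSet`, and the `describe` helper are all hypothetical and not part of Nokogiri) so that it runs without the gem. Hash patterns call `deconstruct_keys` and array patterns call `deconstruct`; the real classes are expected to behave along the same lines, but the linked documentation is the reference for the keys they actually expose.

```ruby
# Hypothetical stand-ins (NOT Nokogiri classes) exposing the same two hooks.
class SketchAttr
  attr_reader :name, :value

  def initialize(name, value)
    @name = name
    @value = value
  end

  # Hash patterns such as `in { name:, value: }` call this method.
  def deconstruct_keys(_keys)
    { name: name, value: value }
  end
end

class SketchNode
  attr_reader :name, :attributes, :children

  def initialize(name, attributes: [], children: [])
    @name = name
    @attributes = attributes
    @children = children
  end

  def deconstruct_keys(_keys)
    { name: name, attributes: attributes, children: children }
  end
end

class SketchNodeSet
  def initialize(nodes)
    @nodes = nodes
  end

  # Array patterns such as `in [first, *rest]` call this method.
  def deconstruct
    @nodes.dup
  end
end

# Classify a value with hash patterns (nodes) and an array pattern (sets).
def describe(value)
  case value
  in { name: "a", attributes: [{ name: "href", value: href }, *] }
    "link to #{href}"
  in { name: "div", children: [] }
    "empty div"
  in [{ name: first_name }, *]
    "set starting with <#{first_name}>"
  else
    "no match"
  end
end

link = SketchNode.new("a", attributes: [SketchAttr.new("href", "#jump")])

puts describe(link)                       # prints "link to #jump"
puts describe(SketchNode.new("div"))      # prints "empty div"
puts describe(SketchNodeSet.new([link]))  # prints "set starting with <a>"
```

Patterns nest: the `attributes:` value is a plain Array, and each element is matched in turn through its own `deconstruct_keys`.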


#### Maven-managed JRuby dependencies

This version of Nokogiri uses [`jar-dependencies`](https://github.com/mkristian/jar-dependencies) to manage most of the vendored Java dependencies. `nokogiri -v` now outputs maven metadata for all Java dependencies, and `Nokogiri::VERSION_INFO` also contains this metadata. [[#2432](https://github.com/sparklemotion/nokogiri/issues/2432)]
44 changes: 38 additions & 6 deletions ext/nokogiri/xml_namespace.c
@@ -74,10 +74,26 @@ static const rb_data_type_t nokogiri_xml_namespace_type_without_dealloc = {
};

/*
* call-seq:
* prefix
* :call-seq:
* prefix() → String or nil
*
* Get the prefix for this namespace. Returns +nil+ if there is no prefix.
* Return the prefix for this Namespace, or +nil+ if there is no prefix (e.g., default namespace).
*
* *Example*
*
* doc = Nokogiri::XML.parse(<<~XML)
* <?xml version="1.0"?>
* <root xmlns="http://nokogiri.org/ns/default" xmlns:noko="http://nokogiri.org/ns/noko">
* <child1 foo="abc" noko:bar="def"/>
* <noko:child2 foo="qwe" noko:bar="rty"/>
* </root>
* XML
*
* doc.root.elements.first.namespace.prefix
* # => nil
*
* doc.root.elements.last.namespace.prefix
* # => "noko"
*/
static VALUE
prefix(VALUE self)
@@ -91,10 +107,26 @@ prefix(VALUE self)
}

/*
* call-seq:
* href
* :call-seq:
* href() → String
*
* Returns the URI reference for this Namespace.
*
* *Example*
*
* doc = Nokogiri::XML.parse(<<~XML)
* <?xml version="1.0"?>
* <root xmlns="http://nokogiri.org/ns/default" xmlns:noko="http://nokogiri.org/ns/noko">
* <child1 foo="abc" noko:bar="def"/>
* <noko:child2 foo="qwe" noko:bar="rty"/>
* </root>
* XML
*
* doc.root.elements.first.namespace.href
* # => "http://nokogiri.org/ns/default"
*
* Get the href for this namespace
* doc.root.elements.last.namespace.href
* # => "http://nokogiri.org/ns/noko"
*/
static VALUE
href(VALUE self)
49 changes: 49 additions & 0 deletions lib/nokogiri/xml/attr.rb
@@ -1,3 +1,4 @@
# coding: utf-8
# frozen_string_literal: true

module Nokogiri
@@ -7,6 +8,54 @@ class Attr < Node
alias_method :to_s, :content
alias_method :content=, :value=

#
# :call-seq: deconstruct_keys(array_of_names) → Hash
#
# Returns a hash describing the Attr, to use in pattern matching.
#
# Valid keys and their values:
# - +name+ → (String) The name of the attribute.
# - +value+ → (String) The value of the attribute.
# - +namespace+ → (Namespace, nil) The Namespace of the attribute, or +nil+ if there is no namespace.
#
# ⚡ This is an experimental feature, available since v1.14.0
#
# *Example*
#
# doc = Nokogiri::XML.parse(<<~XML)
# <?xml version="1.0"?>
# <root xmlns="http://nokogiri.org/ns/default" xmlns:noko="http://nokogiri.org/ns/noko">
# <child1 foo="abc" noko:bar="def"/>
# </root>
# XML
#
# attributes = doc.root.elements.first.attribute_nodes
# # => [#(Attr:0x35c { name = "foo", value = "abc" }),
# # #(Attr:0x370 {
# # name = "bar",
# # namespace = #(Namespace:0x384 {
# # prefix = "noko",
# # href = "http://nokogiri.org/ns/noko"
# # }),
# # value = "def"
# # })]
#
# attributes.first.deconstruct_keys([:name, :value, :namespace])
# # => {:name=>"foo", :value=>"abc", :namespace=>nil}
#
# attributes.last.deconstruct_keys([:name, :value, :namespace])
# # => {:name=>"bar",
# # :value=>"def",
# # :namespace=>
# # #(Namespace:0x384 {
# # prefix = "noko",
# # href = "http://nokogiri.org/ns/noko"
# # })}
#
def deconstruct_keys(keys)
{ name: name, value: value, namespace: namespace }
end

private

def inspect_attributes
44 changes: 44 additions & 0 deletions lib/nokogiri/xml/document.rb
@@ -415,6 +415,50 @@ def xpath_doctype
Nokogiri::CSS::XPathVisitor::DoctypeConfig::XML
end

#
# :call-seq: deconstruct_keys(array_of_names) → Hash
#
# Returns a hash describing the Document, to use in pattern matching.
#
# Valid keys and their values:
# - +root+ → (Node, nil) The root node of the Document, or +nil+ if the document is empty.
#
# In the future, other keys may allow accessing things like doctype and processing
# instructions. If you have a use case and would like this functionality, please let us know
# by opening an issue or a discussion on the github project.
#
# ⚡ This is an experimental feature, available since v1.14.0
#
# *Example*
#
# doc = Nokogiri::XML.parse(<<~XML)
# <?xml version="1.0"?>
# <root>
# <child>
# </root>
# XML
#
# doc.deconstruct_keys([:root])
# # => {:root=>
# # #(Element:0x35c {
# # name = "root",
# # children = [
# # #(Text "\n" + " "),
# # #(Element:0x370 { name = "child", children = [ #(Text "\n")] }),
# # #(Text "\n")]
# # })}
#
# *Example* of an empty document
#
# doc = Nokogiri::XML::Document.new
#
# doc.deconstruct_keys([:root])
# # => {:root=>nil}
#
def deconstruct_keys(keys)
{ root: root }
end

private

IMPLIED_XPATH_CONTEXTS = ["//"].freeze # :nodoc:
47 changes: 47 additions & 0 deletions lib/nokogiri/xml/document_fragment.rb
@@ -1,3 +1,4 @@
# coding: utf-8
# frozen_string_literal: true

module Nokogiri
@@ -144,6 +145,52 @@ def fragment(data)
document.fragment(data)
end

#
# :call-seq: deconstruct() → Array
#
# Returns the root nodes of this document fragment as an array, to use in pattern matching.
#
# 💡 Note that text nodes are returned as well as elements. If you wish to operate only on
# root elements, you should deconstruct the array returned by
# <tt>DocumentFragment#elements</tt>.
#
# ⚡ This is an experimental feature, available since v1.14.0
#
# *Example*
#
# frag = Nokogiri::HTML5.fragment(<<~HTML)
# <div>Start</div>
# This is a <a href="#jump">shortcut</a> for you.
# <div>End</div>
# HTML
#
# frag.deconstruct
# # => [#(Element:0x35c { name = "div", children = [ #(Text "Start")] }),
# # #(Text "\n" + "This is a "),
# # #(Element:0x370 {
# # name = "a",
# # attributes = [ #(Attr:0x384 { name = "href", value = "#jump" })],
# # children = [ #(Text "shortcut")]
# # }),
# # #(Text " for you.\n"),
# # #(Element:0x398 { name = "div", children = [ #(Text "End")] }),
# # #(Text "\n")]
#
# *Example* only the elements, not the text nodes.
#
# frag.elements.deconstruct
# # => [#(Element:0x35c { name = "div", children = [ #(Text "Start")] }),
# # #(Element:0x370 {
# # name = "a",
# # attributes = [ #(Attr:0x384 { name = "href", value = "#jump" })],
# # children = [ #(Text "shortcut")]
# # }),
# # #(Element:0x398 { name = "div", children = [ #(Text "End")] })]
#
def deconstruct
children.to_a
end

private

# fix for issue 770
42 changes: 42 additions & 0 deletions lib/nokogiri/xml/namespace.rb
@@ -1,3 +1,4 @@
# coding: utf-8
# frozen_string_literal: true

module Nokogiri
@@ -6,6 +7,47 @@ class Namespace
include Nokogiri::XML::PP::Node
attr_reader :document

#
# :call-seq: deconstruct_keys(array_of_names) → Hash
#
# Returns a hash describing the Namespace, to use in pattern matching.
#
# Valid keys and their values:
# - +prefix+ → (String, nil) The namespace's prefix, or +nil+ if there is no prefix (e.g., default namespace).
# - +href+ → (String) The namespace's URI
#
# ⚡ This is an experimental feature, available since v1.14.0
#
# *Example*
#
# doc = Nokogiri::XML.parse(<<~XML)
# <?xml version="1.0"?>
# <root xmlns="http://nokogiri.org/ns/default" xmlns:noko="http://nokogiri.org/ns/noko">
# <child1 foo="abc" noko:bar="def"/>
# <noko:child2 foo="qwe" noko:bar="rty"/>
# </root>
# XML
#
# doc.root.elements.first.namespace
# # => #(Namespace:0x35c { href = "http://nokogiri.org/ns/default" })
#
# doc.root.elements.first.namespace.deconstruct_keys([:prefix, :href])
# # => {:prefix=>nil, :href=>"http://nokogiri.org/ns/default"}
#
# doc.root.elements.last.namespace
# # => #(Namespace:0x370 {
# # prefix = "noko",
# # href = "http://nokogiri.org/ns/noko"
# # })
#
# doc.root.elements.last.namespace.deconstruct_keys([:prefix, :href])
# # => {:prefix=>"noko", :href=>"http://nokogiri.org/ns/noko"}
#
#
def deconstruct_keys(keys)
{ prefix: prefix, href: href }
end

private

def inspect_attributes
63 changes: 63 additions & 0 deletions lib/nokogiri/xml/node.rb
@@ -1403,6 +1403,69 @@ def canonicalize(mode = XML::XML_C14N_1_0, inclusive_namespaces = nil, with_comm
end
end

DECONSTRUCT_KEYS = [:name, :attributes, :children, :namespace, :content, :elements, :inner_html].freeze # :nodoc:
DECONSTRUCT_METHODS = { attributes: :attribute_nodes }.freeze # :nodoc:

#
# :call-seq: deconstruct_keys(array_of_names) → Hash
#
# Returns a hash describing the Node, to use in pattern matching.
#
# Valid keys and their values:
# - +name+ → (String) The name of this node, or "text" if it is a Text node.
# - +namespace+ → (Namespace, nil) The namespace of this node, or nil if there is no namespace.
# - +attributes+ → (Array<Attr>) The attributes of this node.
# - +children+ → (Array<Node>) The children of this node. 💡 Note this includes text nodes.
# - +elements+ → (Array<Node>) The child elements of this node. 💡 Note this does not include text nodes.
# - +content+ → (String) The contents of all the text nodes in this node's subtree. See #content.
# - +inner_html+ → (String) The inner markup for the children of this node. See #inner_html.
#
# ⚡ This is an experimental feature, available since v1.14.0
#
# *Example*
#
# doc = Nokogiri::XML.parse(<<~XML)
# <?xml version="1.0"?>
# <parent xmlns="http://nokogiri.org/ns/default" xmlns:noko="http://nokogiri.org/ns/noko">
# <child1 foo="abc" noko:bar="def">First</child1>
# <noko:child2 foo="qwe" noko:bar="rty">Second</noko:child2>
# </parent>
# XML
#
# doc.root.deconstruct_keys([:name, :namespace])
# # => {:name=>"parent",
# # :namespace=>
# # #(Namespace:0x35c { href = "http://nokogiri.org/ns/default" })}
#
# doc.root.deconstruct_keys([:inner_html, :content])
# # => {:content=>"\n" + " First\n" + " Second\n",
# # :inner_html=>
# # "\n" +
# # " <child1 foo=\"abc\" noko:bar=\"def\">First</child1>\n" +
# # " <noko:child2 foo=\"qwe\" noko:bar=\"rty\">Second</noko:child2>\n"}
#
# doc.root.elements.first.deconstruct_keys([:attributes])
# # => {:attributes=>
# # [#(Attr:0x370 { name = "foo", value = "abc" }),
# # #(Attr:0x384 {
# # name = "bar",
# # namespace = #(Namespace:0x398 {
# # prefix = "noko",
# # href = "http://nokogiri.org/ns/noko"
# # }),
# # value = "def"
# # })]}
#
def deconstruct_keys(keys)
requested_keys = DECONSTRUCT_KEYS & keys
{}.tap do |values|
requested_keys.each do |key|
method = DECONSTRUCT_METHODS[key] || key
values[key] = send(method)
end
end
end

# :section:

protected
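
The key filtering in `Node#deconstruct_keys` can be tried in isolation. The following is a minimal sketch under stated assumptions: `SketchElement`, its constants, and its `calls` log are hypothetical names invented for illustration, and only the shape of the method (intersect the allowed keys with the requested ones, then `send` the mapped method) is taken from the hunk above. It suggests one likely benefit of the approach, namely that values such as `inner_html` are only computed when a pattern actually names them.

```ruby
# Hypothetical stand-in mirroring the key-filtering shape of the method in
# the diff; this is not Nokogiri code.
class SketchElement
  KEYS = [:name, :content, :inner_html].freeze
  METHODS = { content: :text_content }.freeze # key served by another method

  attr_reader :name, :calls

  def initialize(name, text)
    @name = name
    @text = text
    @calls = [] # records which of the costlier accessors were invoked
  end

  def text_content
    @calls << :text_content
    @text
  end

  def inner_html
    @calls << :inner_html
    "<b>#{@text}</b>"
  end

  def deconstruct_keys(keys)
    requested = KEYS & keys # keep only known keys that the pattern named
    requested.each_with_object({}) do |key, values|
      values[key] = send(METHODS[key] || key)
    end
  end
end

element = SketchElement.new("p", "hello")

# Ruby passes [:name, :content] to deconstruct_keys for this pattern.
case element
in { name: "p", content: }
  puts content # prints "hello"
end

p element.calls # prints [:text_content]; inner_html was never computed
```

Unknown keys are silently dropped by the intersection, so a pattern that names one simply fails to match rather than raising.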