Add ability to select path segments using Python re regexprs; follow on to PR#146 & #177 #186

Open
wants to merge 17 commits into
base: master
7 changes: 7 additions & 0 deletions .github/workflows/tests.yml
@@ -87,3 +87,10 @@ jobs:
         uses: ymyzk/run-tox-gh-actions@main
         with:
           tox-args: -vv --hashseed=${{ needs.generate-hashseed.outputs.hashseed }}
+
+      - name: Run tox with tox-gh-actions (Regex feature flag)
+        uses: ymyzk/run-tox-gh-actions@main
+        env:
+          DPATH_ALLOW_REGEX: True
+        with:
+          tox-args: -vv --hashseed=${{ needs.generate-hashseed.outputs.hashseed }}
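The step above sets `DPATH_ALLOW_REGEX`, and the `dpath/options.py` change in this pull request enables the feature with the membership test `"DPATH_ALLOW_REGEX" in environ`. A minimal sketch of what that test implies, using a plain dict in place of `os.environ` so it does not depend on the real environment (`regex_enabled` is a hypothetical helper, not part of dpath):

```python
def regex_enabled(env: dict) -> bool:
    # Same membership test as in the diff: only the variable's presence matters,
    # its value is never inspected.
    return "DPATH_ALLOW_REGEX" in env


results = {
    "unset": regex_enabled({}),
    "True": regex_enabled({"DPATH_ALLOW_REGEX": "True"}),
    "False": regex_enabled({"DPATH_ALLOW_REGEX": "False"}),
    "empty": regex_enabled({"DPATH_ALLOW_REGEX": ""}),
}
print(results)
```

Under this reading, exporting the variable with a value such as `False` or an empty string would still turn the feature on; only leaving it unset keeps it off.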
81 changes: 81 additions & 0 deletions README.rst
@@ -111,6 +111,9 @@ elements in ``x['a']['b']`` where the key is equal to the glob ``'[cd]'``. Okay.
}
}

**Note**: Using Python's ``re`` regular expressions instead of globs is explained
below, in re_regexp_.

... Wow that was easy. What if I want to iterate over the results, and
not get a merged view?

@@ -438,6 +441,84 @@ To get around this, you can sidestep the whole "filesystem path" style, and aban
>>> dpath.get(['a', 'b/c'])
0

.. _re_regexp:

Globs too imprecise? Use Python's ``re`` Regular Expressions
============================================================

Python's ``re`` regular expressions (PythonRe_) may be used as follows:

.. _PythonRe: https://docs.python.org/3/library/re.html

- Recognition of such regular expressions in string paths is disabled by default, but can easily be
enabled as shown below. It is disabled by default for backwards compatibility, since an existing path
component may already start with '{' and end with '}'.
- Irrespective of this setting, compiled ``re`` patterns can be used in the list form of
paths (see below).

.. code-block:: python

>>> import dpath
>>> # enable
>>> dpath.options.ALLOW_REGEX = True
>>> # disable
>>> dpath.options.ALLOW_REGEX = False

- A path component may now also be specified:

- in a path expression, as ``{<regex>}``, where ``<regex>`` is a regular expression
accepted by the standard Python module ``re``. For example:

.. code-block:: python

>>> selPath = 'Config/{(Env|Cmd)}'
>>> x = dpath.search(js.lod, selPath)

.. code-block:: python

>>> selPath = '{(Config|Graph)}/{(Env|Cmd|Data)}'
>>> x = dpath.search(js.lod, selPath)

- When using the list form for a path, a list element can also
be expressed as

- a string as above
- the output of ``re.compile(args)``

An example:

.. code-block:: python

>>> import re
>>> selPath = [re.compile('(Config|Graph)'), re.compile('(Env|Cmd|Data)')]
>>> x = dpath.search(js.lod, selPath)

More examples from a realistic JSON context:

+-----------------------------------------+----------------------------------------+
| **Extended path glob**                  | **Designates**                         |
+-----------------------------------------+----------------------------------------+
| ``"**/{^[A-Za-z]{2}$}"``                | "Id"                                   |
+-----------------------------------------+----------------------------------------+
| ``r"*/{[A-Z][A-Za-z\d]*$}"``            | "Name", "Id", "Created", "Scope", ...  |
+-----------------------------------------+----------------------------------------+
| ``r"**/{[A-Z][A-Za-z\d]*\d$}"``         | "EnableIPv6"                           |
+-----------------------------------------+----------------------------------------+
| ``r"**/{[A-Z][A-Za-z\d]*Address$}"``    | "Containers/199c5/MacAddress"          |
+-----------------------------------------+----------------------------------------+

In Python string literals, the backslashes required by the ``re`` syntax
can be written either in raw strings or as doubled backslashes, so
the following are equivalent:

+-----------------------------------------+----------------------------------------+
| *with raw strings*                      | *equivalent with double backslashes*   |
+-----------------------------------------+----------------------------------------+
| ``r"**/{[A-Z][A-Za-z\d]*\d$}"``         | ``"**/{[A-Z][A-Za-z\\d]*\\d$}"``       |
+-----------------------------------------+----------------------------------------+
| ``r"**/{[A-Z][A-Za-z\d]*Address$}"``    | ``"**/{[A-Z][A-Za-z\\d]*Address$}"``   |
+-----------------------------------------+----------------------------------------+
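
The equivalence of the two spellings can be checked with the standard library alone. The following is a minimal sketch, not part of dpath; the sample key names are illustrative and only the regular expression inside the braces is tested:

```python
import re

raw = r"[A-Z][A-Za-z\d]*\d$"
escaped = "[A-Z][A-Za-z\\d]*\\d$"

# Both literals denote the same string, so they compile to the same pattern.
same_text = raw == escaped

pattern = re.compile(raw)
keys = ["EnableIPv6", "Name", "MacAddress"]
matching = [key for key in keys if pattern.match(key)]

print(same_text)  # True
print(matching)   # ['EnableIPv6']
```

Raw strings are usually the easier form to read, because the pattern appears exactly as ``re`` receives it.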


dpath.segments : The Low-Level Backend
======================================

34 changes: 29 additions & 5 deletions dpath/__init__.py
@@ -20,31 +20,55 @@
     "Creator",
 ]

+import re
 from collections.abc import MutableMapping, MutableSequence
 from typing import Union, List, Any, Callable, Optional

 from dpath import segments, options
-from dpath.exceptions import InvalidKeyName, PathNotFound
+from dpath.exceptions import InvalidKeyName, PathNotFound, InvalidRegex
 from dpath.types import MergeType, PathSegment, Creator, Filter, Glob, Path, Hints

 _DEFAULT_SENTINEL = object()


-def _split_path(path: Path, separator: Optional[str] = "/") -> Union[List[PathSegment], PathSegment]:
+def _split_path(path: Glob, separator: Optional[str] = "/") -> Union[List[PathSegment], PathSegment]:
     """
-    Given a path and separator, return a tuple of segments. If path is
-    already a non-leaf thing, return it.
+    Given a path and separator, return a tuple of segments.
+
+    If path is already a non-leaf thing, return it: this covers sequences of strings
+    and re.Patterns.

     Note that a string path with the separator at index[0] will have the
     separator stripped off. If you pass a list path, the separator is
     ignored, and is assumed to be part of each key glob. It will not be
-    stripped.
+    stripped (i.e. a first list element can be an empty string).
+
+    If RegEx support is enabled then str segments which are wrapped with curly braces will be handled as regular
+    expressions. These segments will be compiled using re.compile.
+    Errors during RegEx compilation will raise an InvalidRegex exception.
     """
     if not segments.leaf(path):
         split_segments = path
+    elif isinstance(path, re.Pattern):
+        # Handle paths which are comprised of a single re.Pattern
+        split_segments = (path,)
     else:
         split_segments = path.lstrip(separator).split(separator)

+    if options.ALLOW_REGEX:
+        # Handle RegEx segments
+
+        def compile_regex_segment(segment: PathSegment):
+            if isinstance(segment, str) and segment.startswith("{") and segment.endswith("}"):
+                try:
+                    return re.compile(segment[1:-1])
+                except re.error as re_err:
+                    raise InvalidRegex(f"Could not compile RegEx in path segment '{segment}' ({re_err})")
+
+            return segment
+
+        split_segments = list(map(compile_regex_segment, split_segments))
+
     return split_segments
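
The split-then-compile order above can be tried in isolation. The following is a simplified standalone sketch, not the dpath implementation: `split_with_regex` is a hypothetical helper, it handles string paths only, and it raises `ValueError` where the diff raises `InvalidRegex`.

```python
import re
from typing import List, Union

Segment = Union[str, re.Pattern]


def split_with_regex(path: str, separator: str = "/") -> List[Segment]:
    """Hypothetical helper: split a string path, then compile '{...}' segments."""
    result: List[Segment] = []
    for part in path.lstrip(separator).split(separator):
        if part.startswith("{") and part.endswith("}"):
            try:
                # Strip the surrounding braces before compiling.
                result.append(re.compile(part[1:-1]))
            except re.error as err:
                raise ValueError(
                    f"Could not compile RegEx in path segment '{part}' ({err})"
                ) from err
        else:
            result.append(part)
    return result


parts = split_with_regex("/Config/{(Env|Cmd)}")
print(parts[0])          # Config
print(parts[1].pattern)  # (Env|Cmd)

# Splitting happens first, so a pattern containing the separator is cut apart.
broken = split_with_regex("{a/b}")
print(broken)            # ['{a', 'b}']
```

Because the path is split on the separator before any segment is compiled, a brace-wrapped expression that itself contains the separator ends up as two plain string segments in this sketch. The list form of a path with pre-compiled patterns avoids that.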


11 changes: 8 additions & 3 deletions dpath/exceptions.py
@@ -3,16 +3,21 @@ class InvalidGlob(Exception):
     pass


+class InvalidRegex(Exception):
+    """Invalid regular expression in path segment."""
+    pass
+
+
 class PathNotFound(Exception):
-    """One or more elements of the requested path did not exist in the object"""
+    """One or more elements of the requested path did not exist in the object."""
     pass


 class InvalidKeyName(Exception):
-    """This key contains the separator character or another invalid character"""
+    """This key contains the separator character or another invalid character."""
     pass


 class FilteredValue(Exception):
-    """Unable to return a value, since the filter rejected it"""
+    """Unable to return a value, since the filter rejected it."""
     pass
10 changes: 10 additions & 0 deletions dpath/options.py
@@ -1 +1,11 @@
+from os import environ
+
 ALLOW_EMPTY_STRING_KEYS = False
+
+ALLOW_REGEX = "DPATH_ALLOW_REGEX" in environ
+"""Enables regular expression support.
+
+Enabling this feature will allow usage of regular expressions as part of paths.
+Regular expressions must be wrapped in curly brackets. For example: "a/b/{[cd]}".
+Expressions will be compiled using the standard library re.compile function.
+"""
22 changes: 14 additions & 8 deletions dpath/segments.py
@@ -1,5 +1,6 @@
 from copy import deepcopy
 from fnmatch import fnmatchcase
+from re import Pattern
 from typing import Sequence, Tuple, Iterator, Any, Union, Optional, MutableMapping, MutableSequence

 from dpath import options
@@ -36,7 +37,7 @@ def leaf(thing):
     """
     Return True if thing is a leaf, otherwise False.
     """
-    leaves = (bytes, str, int, float, bool, type(None))
+    leaves = (bytes, str, int, float, bool, type(None), Pattern)

     return isinstance(thing, leaves)

@@ -182,9 +183,11 @@ def match(segments: Path, glob: Glob):
     or more star segments and the type will be coerced to match that of
     the segment.

-    A segment is considered to match a glob if the function
-    fnmatch.fnmatchcase returns True. If fnmatchcase returns False or
-    throws an exception the result will be False.
+    A segment is considered to match a glob when either:
+    - the segment is a String : the function fnmatch.fnmatchcase returns True.
+      If fnmatchcase returns False or throws an exception the result will be False.
+    - or, the segment is a re.Pattern (result of re.compile) and re.Pattern.match returns
+      a match

     match(segments, glob) -> bool
     """
@@ -241,10 +244,13 @@ def match(segments: Path, glob: Glob):
                 s = str(s)

             try:
-                # Let's see if the glob matches. We will turn any kind of
-                # exception while attempting to match into a False for the
-                # match.
-                if not fnmatchcase(s, g):
+                # Let's see if the glob or the regular expression matches. We will turn any kind of
+                # exception while attempting to match into a False for the match.
+                if isinstance(g, Pattern):
+                    mobj = g.match(s)
+                    if mobj is None:
+                        return False
+                elif not fnmatchcase(s, g):
                     return False
             except:
                 return False
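
The new branch can be exercised outside dpath. This is a standalone sketch of the per-segment check only: `segment_matches` is a hypothetical name, and the surrounding star expansion, type coercion and exception handling are omitted. It also illustrates that `re.Pattern.match` anchors at the start of the string but not at the end.

```python
import re
from fnmatch import fnmatchcase
from typing import Union


def segment_matches(segment: str, glob: Union[str, re.Pattern]) -> bool:
    """Hypothetical helper mirroring the per-segment check in the diff."""
    if isinstance(glob, re.Pattern):
        # Compiled patterns are matched from the start of the segment.
        return glob.match(segment) is not None
    # Plain strings keep the existing glob semantics.
    return fnmatchcase(segment, glob)


checks = [
    segment_matches("Env", "[DE]nv"),                          # glob
    segment_matches("Env", re.compile("(Env|Cmd)")),           # exact key
    segment_matches("Environment", re.compile("(Env|Cmd)")),   # prefix only
    segment_matches("Environment", re.compile("(Env|Cmd)$")),  # anchored at end
    segment_matches("MyEnv", re.compile("(Env|Cmd)")),         # not at start
]
print(checks)  # [True, True, True, False, False]
```

With these semantics, a pattern such as `(Env|Cmd)` also selects keys that merely begin with `Env`. Ending the expression with `$` restricts it to whole keys.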
8 changes: 5 additions & 3 deletions dpath/types.py
@@ -1,5 +1,6 @@
 from enum import IntFlag, auto
 from typing import Union, Any, Callable, Sequence, Tuple, List, Optional, MutableMapping
+from re import Pattern


 class SymmetricInt(int):
@@ -46,18 +47,19 @@ class MergeType(IntFlag):
     replaces the destination in this situation."""


-PathSegment = Union[int, str, bytes]
+PathSegment = Union[int, str, bytes, Pattern]
 """Type alias for dict path segments where integers are explicitly casted."""

 Filter = Callable[[Any], bool]
 """Type alias for filter functions.

 (Any) -> bool"""

-Glob = Union[str, Sequence[str]]
+Glob = Union[str, Pattern, Sequence[Union[str, Pattern]]]
 """Type alias for glob parameters."""

-Path = Union[str, Sequence[PathSegment]]
+
+Path = Union[str, Pattern, Sequence[PathSegment]]
 """Type alias for path parameters."""

 Hints = Sequence[Tuple[PathSegment, type]]