Add ability to select path segments with user defined match; follow on to PRs #146 & #177 #182

Open
wants to merge 23 commits into
base: master
2ba5326
Preparing for integration of PR
AlainLich Dec 9, 2022
0a9992d
After integration of regexp extention to paths
AlainLich Dec 12, 2022
7924d5f
Merge branch 'AL-PR-devel' into AL-master4merge
AlainLich Dec 12, 2022
23f491f
After integration into with update of master on Dec 5, 2022
AlainLich Dec 12, 2022
98cf08a
Modifications in python3Test.yml for current Python version and runni…
AlainLich Dec 12, 2022
18fc279
Corrects issues found running Github Action test-python3, documentati…
AlainLich Dec 13, 2022
1a36acc
Correction in README.rst, bumped Pypy version in tox.ini
AlainLich Dec 13, 2022
afc04ec
Merge branch 'master' into AL-master4merge
moomoohk Dec 17, 2022
932e9e6
Resolved Flake8 issues (in full module), added tool to help with sys.…
AlainLich Dec 19, 2022
e6a0d0d
Used yamllint on 2 action workflows, corrected errors
AlainLich Dec 20, 2022
ded5d26
Extends path search capabilities beyond handling re.regex by abstract…
AlainLich Dec 21, 2022
6b4cabd
Corrected tox.ini for testing, rapidfuzz is required for testing Stri…
AlainLich Dec 22, 2022
1c87332
Developped towards generic path component matching, beyond use of pyt…
AlainLich Jan 11, 2023
aac95f1
Merge branch 'AL-PR-devel' into AL-PR-4merge
AlainLich Jan 12, 2023
b260955
Adds support for re regular expressions in paths
AlainLich Apr 8, 2023
8a468c1
Add a workflow which is launched manually within selectable branch
AlainLich Apr 9, 2023
b5827d3
Corrected tests/test_various_exts.py pypy-3.7 incompat.
AlainLich Apr 9, 2023
bccb6d0
Adds support for re regular expressions in paths
AlainLich Apr 9, 2023
d401b90
Add support for generalized segment matching in paths
AlainLich Apr 11, 2023
0257999
Merge branch 'AL-full-restruct' into AL-PR-devel
AlainLich Apr 11, 2023
41e1115
Cleanup preparing for remote update
AlainLich Apr 11, 2023
12cf369
Documentation correction
AlainLich Apr 12, 2023
b568d51
(Re-)Integrated on top of changes in re.regexp only PR
AlainLich May 3, 2023
92 changes: 92 additions & 0 deletions .github/workflows/python3Test.yml
@@ -0,0 +1,92 @@
name: Test python package dpath-python with regexp extension
# ------------------------------------------------------------
# (C) Alain Lichnewsky, 2021, 2022, 2023
#
# For running under Github's Actions
#
# ------------------------------------------------------------

# *****************************
# ADDED FOR TESTING PRIOR TO PR
# REMOVE FROM PR submission
# *****************************

on:
  workflow_dispatch:
    # Allows manual dispatch from the Actions tab

jobs:
  test-python3:

    timeout-minutes: 60

    runs-on: ubuntu-latest

    strategy:
      matrix:
        # Match versions specified in tox.ini and tox-set-rex.ini
        python-version: ['3.8', '3.11', 'pypy3.7', 'pypy3.9']

    steps:
      - name: Checkout code
        uses: actions/checkout@main

      - name: Set up Python ${{ matrix.python-version }}
        uses: actions/setup-python@main
        with:
          python-version: ${{ matrix.python-version }}
          architecture: 'x64'

      - name: Ascertain configuration
        #
        # Collect information concerning $HOME and the location of
        # file(s) loaded from Github/
        run: |
          echo Working dir: $(pwd)
          echo Files at this location:
          ls -ltha
          echo HOME: ${HOME}
          echo LANG: ${LANG} SHELL: ${SHELL}
          which python
          echo LD_LIBRARY_PATH: ${LD_LIBRARY_PATH}
          echo PYTHONPATH: \'${PYTHONPATH}\'

      - name: Install dependencies
        shell: bash
        if: always()
        # ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        # requirements install the test framework, which is not
        # required by the package in setup.py
        # ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        run: |
          python -m pip install --upgrade pip setuptools wheel \
            nose2 hypothesis
          if [ -f requirements.txt ]; then
            pip install -r requirements.txt;
          fi
          python setup.py install
          pip install tox
          echo "Installed tox"

      - name: Tox test with default DPATH_ALLOW_REGEX not set
        shell: bash
        if: always()
        # ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        # tox testing, here tox.ini is used
        # DPATH_ALLOW_REGEX not set
        # ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        run: |
          tox
          echo "Ran tox"

      - name: Tox test with DPATH_ALLOW_REGEX = TRUE
        shell: bash
        if: always()
        # ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        # tox testing, here tox-set-rex.ini is used
        # DPATH_ALLOW_REGEX = TRUE
        # ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
        run: |
          tox -c tox-set-rex.ini
          echo "Ran tox -c tox-set-rex.ini"

12 changes: 0 additions & 12 deletions .gitignore

This file was deleted.

166 changes: 164 additions & 2 deletions README.rst
@@ -11,8 +11,8 @@ A python library for accessing and searching dictionaries via

Basically it lets you glob over a dictionary as if it were a filesystem.
It allows you to specify globs (ala the bash eglob syntax, through some
advanced fnmatch.fnmatch magic) to access dictionary elements, and
provides some facility for filtering those results.
advanced fnmatch.fnmatch magic, or using Python's `re` regular expressions)
to access dictionary elements, and provides some facility for filtering those results.

sdists are available on pypi: http://pypi.python.org/pypi/dpath

@@ -111,6 +111,9 @@ elements in ``x['a']['b']`` where the key is equal to the glob ``'[cd]'``. Okay.
}
}

**Note**: Using Python's `re` regular expressions instead of globs is explained
in re_regexp_ below; defining your own string matcher objects is shown in generalized_string_match_ below.

... Wow that was easy. What if I want to iterate over the results, and
not get a merged view?

@@ -438,6 +441,165 @@ To get around this, you can sidestep the whole "filesystem path" style, and aban
>>> dpath.get(['a', 'b/c'])
0

.. _re_regexp:

Globs too imprecise? Use Python's `re` Regular Expressions
==========================================================

Python's `re` regular expressions (see PythonRe_) may be used as follows:

.. _PythonRe: https://docs.python.org/3/library/re.html

- Recognition of such regular expressions in string paths is disabled by default, but is easily
enabled. (It is off by default for backwards compatibility, since an existing path
expression component could start with '{' and end with '}'.)
- Irrespective of this setting, `re` regular expressions can always be used in the list form of
paths (see below).

.. code-block:: python

>>> import dpath
>>> # enable
>>> dpath.options.ALLOW_REGEX = True
>>> # disable
>>> dpath.options.ALLOW_REGEX = False

- Once recognition is enabled, a path component may also be specified:

- in a path expression, as ``{<re.regexpr>}``, where `<re.regexpr>` is a regular expression
accepted by the standard Python module `re`. For example:

.. code-block:: python

>>> selPath = 'Config/{(Env|Cmd)}'
>>> x = dpath.search(js.lod, selPath)

.. code-block:: python

>>> selPath = '{(Config|Graph)}/{(Env|Cmd|Data)}'
>>> x = dpath.search(js.lod, selPath)

- When using the list form for a path, a list element can be either:

- a string, as above
- the output of ``re.compile(args)``

An example:

.. code-block:: python

>>> selPath = [ re.compile('(Config|Graph)') , re.compile('(Env|Cmd|Data)') ]
>>> x = dpath.search(js.lod, selPath)

More examples from a realistic json context:

+-----------------------------------------+--------------------------------------+
| **Extended path glob**                  | **Designates**                       |
+-----------------------------------------+--------------------------------------+
| "\*\*/{^[A-Za-z]{2}$}"                  | "Id"                                 |
+-----------------------------------------+--------------------------------------+
| r"\*/{[A-Z][A-Za-z\\d]*$}"              | "Name", "Id", "Created", "Scope",... |
+-----------------------------------------+--------------------------------------+
| r"\*\*/{[A-Z][A-Za-z\\d]*\\d$}"         | "EnableIPv6"                         |
+-----------------------------------------+--------------------------------------+
| r"\*\*/{[A-Z][A-Za-z\\d]*Address$}"     | "Containers/199c5/MacAddress"        |
+-----------------------------------------+--------------------------------------+
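
As a rough check of how such patterns behave, the sketch below applies three of the regular expressions to a handful of key names using only the standard `re` module. The key list is hypothetical, and the use of ``re.match`` (anchored at the start of the key) is an assumption about how segments are tested, not something this README states.

```python
import re

# Hypothetical key names, loosely modelled on the table above.
keys = ["Id", "Name", "Created", "Scope", "EnableIPv6", "MacAddress", "IPv4Address"]


def matching(pattern, names):
    """Return the names accepted by re.match for the given pattern (assumed semantics)."""
    rex = re.compile(pattern)
    return [name for name in names if rex.match(name)]


# Capitalized alphanumeric identifiers: every sample key qualifies.
capitalized = matching(r"[A-Z][A-Za-z\d]*$", keys)

# Same, but the key must end with a digit.
ends_with_digit = matching(r"[A-Z][A-Za-z\d]*\d$", keys)

# Same, but the key must end with the word "Address".
ends_with_address = matching(r"[A-Z][A-Za-z\d]*Address$", keys)

print(capitalized)
print(ends_with_digit)
print(ends_with_address)
```

Only the part between the braces is a regular expression; the ``**`` and ``*`` components of the path remain ordinary globs.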

With Python's character string conventions, required backslashes in the `re` syntax
can be entered either in raw strings or using double backslashes, thus
the following are equivalent:

+-----------------------------------------+----------------------------------------+
| *with raw strings*                      | *equivalent* with double backslash     |
+-----------------------------------------+----------------------------------------+
| r"\*\*/{[A-Z][A-Za-z\\d]*\\d$}"         | "\*\*/{[A-Z][A-Za-z\\\\d]*\\\\d$}"     |
+-----------------------------------------+----------------------------------------+
| r"\*\*/{[A-Z][A-Za-z\\d]*Address$}"     | "\*\*/{[A-Z][A-Za-z\\\\d]*Address$}"   |
+-----------------------------------------+----------------------------------------+
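
The equivalence can be verified directly in Python. This short sketch compares the two spellings of one pattern and uses only the standard library; the sample key is illustrative.

```python
import re

# The same regular expression, written as a raw string and with doubled backslashes.
raw = r"[A-Z][A-Za-z\d]*\d$"
escaped = "[A-Z][A-Za-z\\d]*\\d$"

# Both literals produce the identical string object contents ...
same_text = raw == escaped

# ... and therefore compile to patterns that accept the same keys.
raw_hit = re.match(raw, "EnableIPv6") is not None
escaped_hit = re.match(escaped, "EnableIPv6") is not None

print(same_text, raw_hit, escaped_hit)
```

Raw strings are usually easier to read, since each backslash in the source corresponds to exactly one backslash seen by `re`.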

.. _generalized_string_match:

Need still more customization? Roll your own match method!
===========================================================

We provide the following abstract types (defined in `dpath.types`); a `StringMatcher` may be
used as a component of a Glob given in sequence form:

- `StringMatcher`: a descriptive Union type.

- `Duck_StringMatcher`: accepts any class as a **subtype**, provided it offers a `match` method. Instances may then be used as components in the list form of paths. This kind of structural subtyping is explained in PEP 544 (https://peps.python.org/pep-0544/).

- `Basic_StringMatcher`: an abstract base class; deriving from it lets your class be recognized and participate in a match.

**Notes:**

- The `match` method, `match(self, str) -> Optional[object]`, must
return `None` to reject the match.
- Using `Duck_StringMatcher` requires Python or PyPy 3.8 or later;
otherwise, derive from the base class `Basic_StringMatcher`. The
variable `dpath.options.PEP544_PROTOCOL_AVAILABLE` indicates whether duck typing is possible.
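
To illustrate what structural subtyping means here, the following sketch defines a protocol with a single `match` method and checks objects against it at run time. `MatcherProtocol` and `Prefix` are hypothetical names invented for this example; they stand in for `Duck_StringMatcher` and a user class, and are not taken from `dpath.types`.

```python
import re
from typing import Optional, Protocol, runtime_checkable


@runtime_checkable
class MatcherProtocol(Protocol):
    """Hypothetical stand-in for Duck_StringMatcher: anything with a match() method."""

    def match(self, string: str) -> Optional[object]:
        ...


class Prefix:
    """Accepts strings starting with a fixed prefix; note there is no base class."""

    def __init__(self, prefix: str):
        self.prefix = prefix

    def match(self, string: str) -> Optional[object]:
        # Return None to reject, anything else to accept.
        return True if string.startswith(self.prefix) else None


# The class qualifies purely because it offers a match() method.
prefix_is_matcher = isinstance(Prefix("Mac"), MatcherProtocol)

# Compiled regular expressions qualify for the same reason.
pattern_is_matcher = isinstance(re.compile("Mac.*"), MatcherProtocol)

# Plain strings have no match() method, so they do not.
string_is_matcher = isinstance("MacAddress", MatcherProtocol)

print(prefix_is_matcher, pattern_is_matcher, string_is_matcher)
```

`typing.Protocol` is available from Python 3.8, which is consistent with the version requirement stated in the notes.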

Then it is up to you... Examples are provided in `tests/test_duck_typing.py`,
including:

- *match anagrams*:

.. code-block:: python

    class Anagram():
        def __init__(self, s):
            self.ref = "".join(sorted(s))

        def match(self, st):
            retval = True if "".join(sorted(st)) == self.ref else None
            return retval

    mydict = TestBasics.mydict

    r1 = dpath.search(mydict, "**/label")
    r2 = dpath.search(mydict, ['**', Anagram("bella")])

    assert r1 == r2

- and *approximate match* (requires `rapidfuzz` https://maxbachmann.github.io/RapidFuzz/):

.. code-block:: python

    class Approx():
        def __init__(self, s, quality=90):
            self.ref = s
            self.quality = quality

        def match(self, st):
            fratio = rapidfuzz.fuzz.ratio(st, self.ref)
            retval = True if fratio > self.quality else None
            return retval

    mydict = TestBasics.mydict

    r1 = dpath.search(mydict, "**/placeholder")
    r2 = dpath.search(mydict, ['**', Approx("placecolder")])
    r3 = dpath.search(mydict, ['**', Approx("acecolder", 75)])
    assert r1 == r2
    assert r1 == r3

For comparison, here is the first example reimplemented without duck typing:

.. code-block:: python

    if not dpath.options.PEP544_PROTOCOL_AVAILABLE:
        class Anagram(dpath.types.Basic_StringMatcher):
            def __init__(self, s):
                self.ref = "".join(sorted(s))

            def match(self, st):
                retval = True if "".join(sorted(st)) == self.ref else None
                return retval

    dpath.search(mydict, ['**', Anagram("bella")])
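
The idea of mixing strings, compiled patterns, and user-defined matchers in one list-form path can be sketched without dpath at all. The walker below is a simplified illustration, not dpath's implementation: `segment_accepts` and `walk` are hypothetical helpers, glob wildcards such as ``**`` are not handled, and dictionary keys are assumed to be strings.

```python
import re
from typing import Any, Iterator, List, Tuple


def segment_accepts(segment: Any, key: str) -> bool:
    """Decide whether one path segment accepts one dictionary key."""
    if isinstance(segment, str):
        return segment == key  # plain string: exact comparison only in this sketch
    # re.Pattern objects and user-defined matchers both expose match();
    # a None result rejects the key.
    return segment.match(key) is not None


def walk(obj: Any, segments: List[Any],
         prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[Tuple[str, ...], Any]]:
    """Yield (path, value) pairs for values reached by matching every segment in turn."""
    if not segments:
        yield prefix, obj
        return
    if not isinstance(obj, dict):
        return
    head, rest = segments[0], segments[1:]
    for key, value in obj.items():
        if segment_accepts(head, key):
            yield from walk(value, rest, prefix + (key,))


class Anagram:
    """Same matcher as in the README example: accepts anagrams of a reference word."""

    def __init__(self, s):
        self.ref = "".join(sorted(s))

    def match(self, st):
        return True if "".join(sorted(st)) == self.ref else None


# Hypothetical nested data.
data = {"Config": {"label": 1, "other": 2}, "Graph": {"bella": 3}}

hits = dict(walk(data, [re.compile("(Config|Graph)"), Anagram("bella")]))
print(hits)
```

Because both kinds of object are consulted through the same `match` call, the walker needs no special case for regular expressions.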

dpath.segments : The Low-Level Backend
======================================

34 changes: 29 additions & 5 deletions dpath/__init__.py
@@ -20,31 +20,55 @@
"Creator",
]

import re
from collections.abc import MutableMapping, MutableSequence
from typing import Union, List, Any, Callable, Optional

from dpath import segments, options
from dpath.exceptions import InvalidKeyName, PathNotFound
from dpath.exceptions import InvalidKeyName, PathNotFound, InvalidRegex
from dpath.types import MergeType, PathSegment, Creator, Filter, Glob, Path, Hints

_DEFAULT_SENTINEL = object()


def _split_path(path: Path, separator: Optional[str] = "/") -> Union[List[PathSegment], PathSegment]:
def _split_path(path: Glob, separator: Optional[str] = "/") -> Union[List[PathSegment], PathSegment]:
    """
    Given a path and separator, return a tuple of segments. If path is
    already a non-leaf thing, return it.
    Given a path and separator, return a tuple of segments.

    If path is already a non-leaf thing, return it: this covers sequences of strings
    and re.Patterns.

    Note that a string path with the separator at index[0] will have the
    separator stripped off. If you pass a list path, the separator is
    ignored, and is assumed to be part of each key glob. It will not be
    stripped.
    stripped (i.e. a first list element can be an empty string).

    If RegEx support is enabled then str segments which are wrapped with curly braces will be handled as regular
    expressions. These segments will be compiled using re.compile.
    Errors during RegEx compilation will raise an InvalidRegex exception.
    """
    if not segments.leaf(path):
        split_segments = path
    elif isinstance(path, re.Pattern):
        # Handle paths which are comprised of a single re.Pattern
        split_segments = (path,)
    else:
        split_segments = path.lstrip(separator).split(separator)

    if options.ALLOW_REGEX:
        # Handle RegEx segments

        def compile_regex_segment(segment: PathSegment):
            if isinstance(segment, str) and segment.startswith("{") and segment.endswith("}"):
                try:
                    return re.compile(segment[1:-1])
                except re.error as re_err:
                    raise InvalidRegex(f"Could not compile RegEx in path segment '{segment}' ({re_err})")

            return segment

        split_segments = list(map(compile_regex_segment, split_segments))

    return split_segments
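
For readers who want to try the brace convention in isolation, here is a standalone sketch of the same idea for string paths only. `split_path_sketch`, its `allow_regex` parameter, and `InvalidRegexSketch` are hypothetical stand-ins for `_split_path`, `options.ALLOW_REGEX`, and `InvalidRegex`; the sketch does not import dpath and does not cover list-form paths.

```python
import re
from typing import List, Union


class InvalidRegexSketch(Exception):
    """Hypothetical stand-in for dpath.exceptions.InvalidRegex."""


def split_path_sketch(path: str, separator: str = "/",
                      allow_regex: bool = False) -> List[Union[str, re.Pattern]]:
    """Split a string path; optionally compile segments wrapped in curly braces."""
    # Leading separators are stripped before splitting, as in the diff above.
    parts = path.lstrip(separator).split(separator)
    if not allow_regex:
        return parts

    compiled: List[Union[str, re.Pattern]] = []
    for part in parts:
        if part.startswith("{") and part.endswith("}"):
            try:
                # Drop the surrounding braces and compile what is left.
                compiled.append(re.compile(part[1:-1]))
            except re.error as err:
                raise InvalidRegexSketch(
                    f"Could not compile RegEx in path segment '{part}' ({err})"
                ) from err
        else:
            compiled.append(part)
    return compiled


plain = split_path_sketch("/Config/{(Env|Cmd)}")
with_regex = split_path_sketch("/Config/{(Env|Cmd)}", allow_regex=True)
print(plain)
print(with_regex)
```

One consequence of splitting before compiling is that a regular expression containing the separator character would be cut into several segments; in that situation the list form with ``re.compile`` objects avoids the problem.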


11 changes: 8 additions & 3 deletions dpath/exceptions.py
@@ -3,16 +3,21 @@ class InvalidGlob(Exception):
    pass


class InvalidRegex(Exception):
    """Invalid regular expression in path segment."""
    pass


class PathNotFound(Exception):
    """One or more elements of the requested path did not exist in the object"""
    """One or more elements of the requested path did not exist in the object."""
    pass


class InvalidKeyName(Exception):
    """This key contains the separator character or another invalid character"""
    """This key contains the separator character or another invalid character."""
    pass


class FilteredValue(Exception):
    """Unable to return a value, since the filter rejected it"""
    """Unable to return a value, since the filter rejected it."""
    pass