Commit
Added rule_list spacy registered function
apmoore1 committed Mar 31, 2022
1 parent 37fb15e commit 014f73d
Showing 6 changed files with 92 additions and 9 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -39,6 +39,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- spaCy registered functions for reading in a `LexiconCollection` or `MWELexiconCollection` from a TSV. These can be found in `pymusas.spacy_api.lexicon_collection`.
- spaCy registered functions for creating `SingleWordRule` and `MWERule`. These can be found in `pymusas.spacy_api.taggers.rules`.
- spaCy registered function for creating `ContextualRuleBasedRanker`. This can be found in `pymusas.spacy_api.rankers`.
- spaCy registered function for creating a `List` of `Rule`s. This can be found in `pymusas.spacy_api.taggers.rules.rule_list`.
- `LexiconCollection` and `MWELexiconCollection` open the TSV file downloaded through the `from_tsv` method using `utf-8` encoding by default.
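
The `utf-8` default in the last entry matters for lexicons with non-ASCII lemmas. The sketch below is not the PyMUSAS implementation; `read_lexicon_tsv` is a hypothetical helper, and the `lemma` and `semantic_tags` column names and the space-separated tag format are assumptions made for illustration. It only shows the general pattern of opening a TSV with an explicit encoding rather than relying on the platform default.

```python
import csv
import tempfile
from pathlib import Path
from typing import Dict, List


def read_lexicon_tsv(tsv_file_path: Path) -> Dict[str, List[str]]:
    """Hypothetical helper: read a lemma -> semantic tags mapping from a TSV."""
    lexicon: Dict[str, List[str]] = {}
    # An explicit encoding avoids depending on the platform's default codec.
    with tsv_file_path.open("r", encoding="utf-8", newline="") as tsv_file:
        reader = csv.DictReader(tsv_file, delimiter="\t")
        for row in reader:
            # Assumed format: tags separated by single spaces.
            lexicon[row["lemma"]] = row["semantic_tags"].split()
    return lexicon


with tempfile.TemporaryDirectory() as temp_dir:
    tsv_path = Path(temp_dir) / "lexicon.tsv"
    tsv_path.write_text("lemma\tsemantic_tags\ncafé\tF2 H1\n", encoding="utf-8")
    lexicon = read_lexicon_tsv(tsv_path)
```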

### Removed

4 changes: 2 additions & 2 deletions docs/docs/api/spacy_api/lexicon_collection.md
@@ -18,7 +18,7 @@ spaCy registered functions for reading in a
@spacy.util.registry.readers('pymusas.LexiconCollection.from_tsv')
def lexicon_collection_from_tsv(
tsv_file_path: Union[PathLike, str],
include_pos: bool
include_pos: bool = True
) -> Dict[str, List[str]]
```

@@ -39,7 +39,7 @@ dictionary object that can be used to create a
3. `pos` (Optional)

All other fields will be ignored.
- __include\_pos__ : `bool` <br/>
- __include\_pos__ : `bool`, optional (default = `True`) <br/>
Whether to include the POS information, if the information is available,
or not. See [`add_lexicon_entry`](#add_lexicon_entry) for more information on this
parameter.
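
A rough sketch of what `include_pos` is understood to control. It assumes, based on the PyMUSAS `add_lexicon_entry` documentation rather than on anything in this diff, that the lemma and POS tag are joined with `|` to form the dictionary key when `include_pos` is `True` and a POS tag is present. `make_key` is a hypothetical helper written for illustration, not a PyMUSAS function.

```python
from typing import Optional


def make_key(lemma: str, pos: Optional[str], include_pos: bool = True) -> str:
    # Hypothetical helper. Assumed key format: "lemma|pos" when the POS tag
    # is both requested and available, otherwise just the lemma.
    if include_pos and pos is not None:
        return f"{lemma}|{pos}"
    return lemma


with_pos = make_key("London", "noun")                       # default: include_pos=True
without_pos = make_key("London", "noun", include_pos=False)
missing_pos = make_key("London", None)                      # no POS tag in the row
```

With the new default, callers that omit `include_pos` get the POS-aware behaviour.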
40 changes: 38 additions & 2 deletions docs/docs/api/spacy_api/taggers/rules.md
@@ -10,6 +10,8 @@ spaCy registered functions for creating the following tagger rules:
* [`pymusas.taggers.rules.single_word.SingleWordRule`](/pymusas/api/taggers/rules/single_word/#singlewordrule)
* [`pymusas.taggers.rules.mwe.MWERule`](/pymusas/api/taggers/rules/mwe/#mwerule)

And helper functions for the rules.

<a id="pymusas.spacy_api.taggers.rules.single_word_rule"></a>

### single\_word\_rule
@@ -19,7 +21,7 @@ spaCy registered functions for creating the following tagger rules:
def single_word_rule(
lexicon_collection: Dict[str, List[str]],
lemma_lexicon_collection: Dict[str, List[str]],
pos_mapper: Dict[str, List[str]]
pos_mapper: Optional[Dict[str, List[str]]] = None
) -> SingleWordRule
```

@@ -42,7 +44,7 @@ details on parameters to this function.
@spacy.util.registry.misc('pymusas.taggers.rules.MWERule.v1')
def mwe_rule(
mwe_lexicon_lookup: Dict[str, List[str]],
pos_mapper: Dict[str, List[str]]
pos_mapper: Optional[Dict[str, List[str]]] = None
) -> MWERule
```

@@ -57,3 +59,37 @@ parameters to this function.

- [`pymusas.taggers.rules.mwe.MWERule`](/pymusas/api/taggers/rules/mwe/#mwerule) <br/>

<a id="pymusas.spacy_api.taggers.rules.rule_list"></a>

### rule\_list

```python
@spacy.util.registry.misc('pymusas.taggers.rules.rule_list')
def rule_list(*rules: Rule) -> List[Rule]
```

`pymusas.taggers.rules.rule_list` is a registered function under the
`@misc` function register. The function is required when you want to create
a `List` of rules within a
[config file](https://thinc.ai/docs/usage-config). We
found that a `List` of custom objects cannot be specified directly within a
config file, but it can be built from
[variable position arguments](https://thinc.ai/docs/usage-config#registries-args),
which this function accepts as input.

This function is most likely to be
used when creating a [`pymusas.spacy_api.taggers.rule_based.RuleBasedTagger`](/pymusas/api/spacy_api/taggers/rule_based/#rulebasedtagger).


<h4 id="rule_list.parameters">Parameters<a className="headerlink" href="#rule_list.parameters" title="Permanent link">&para;</a></h4>


- __rules__ : `Rule` <br/>
The [`pymusas.taggers.rules.rule.Rule`](/pymusas/api/taggers/rules/rule/#rule)s to convert into a `List`
of `Rule`s.

<h4 id="rule_list.returns">Returns<a className="headerlink" href="#rule_list.returns" title="Permanent link">&para;</a></h4>


- `List[Rule]` <br/>
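
The variadic signature can be illustrated without spaCy. The sketch below does not use the registry or a config file: `StandInRule` is a hypothetical placeholder for a real `Rule` subclass, and the local `rule_list` simply mirrors the one-line body added in this commit (`return list(rules)`). It shows that each positional argument becomes one list element, in the order given.

```python
from typing import List


class StandInRule:
    """Hypothetical placeholder for a `Rule` subclass; not part of PyMUSAS."""

    def __init__(self, name: str) -> None:
        self.name = name


def rule_list(*rules: StandInRule) -> List[StandInRule]:
    # Mirrors the body added in this commit: collect the positional
    # arguments into a list, preserving their order.
    return list(rules)


single_word = StandInRule("single_word")
mwe = StandInRule("mwe")

rules = rule_list(single_word, mwe)
rule_names = [rule.name for rule in rules]
print(rule_names)
```

Because the function takes positional arguments rather than a single `List` parameter, each rule can be supplied separately, which is the property the config-file use case described above depends on.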

4 changes: 2 additions & 2 deletions pymusas/spacy_api/lexicon_collection.py
@@ -14,7 +14,7 @@

@spacy.util.registry.readers('pymusas.LexiconCollection.from_tsv')
def lexicon_collection_from_tsv(tsv_file_path: Union[PathLike, str],
include_pos: bool
include_pos: bool = True
) -> Dict[str, List[str]]:
'''
`pymusas.LexiconCollection.from_tsv` is a registered function under the
@@ -33,7 +33,7 @@ def lexicon_collection_from_tsv(tsv_file_path: Union[PathLike, str],
3. `pos` (Optional)
All other fields will be ignored.
include_pos: `bool`
include_pos: `bool`, optional (default = `True`)
Whether to include the POS information, if the information is available,
or not. See :func:`add_lexicon_entry` for more information on this
parameter.
40 changes: 37 additions & 3 deletions pymusas/spacy_api/taggers/rules.py
@@ -2,19 +2,23 @@
spaCy registered functions for creating the following tagger rules:
* :class:`pymusas.taggers.rules.single_word.SingleWordRule`
* :class:`pymusas.taggers.rules.mwe.MWERule`
And helper functions for the rules.
'''
from typing import Dict, List
from typing import Dict, List, Optional

import spacy

from pymusas.taggers.rules.mwe import MWERule
from pymusas.taggers.rules.single_word import SingleWordRule
from pymusas.taggers.rules.rule import Rule


@spacy.util.registry.misc('pymusas.taggers.rules.SingleWordRule.v1')
def single_word_rule(lexicon_collection: Dict[str, List[str]],
lemma_lexicon_collection: Dict[str, List[str]],
pos_mapper: Dict[str, List[str]]) -> SingleWordRule:
pos_mapper: Optional[Dict[str, List[str]]] = None
) -> SingleWordRule:
'''
`pymusas.taggers.rules.SingleWordRule.v1` is a registered function under the
`@misc` function register.
@@ -32,7 +36,8 @@ def single_word_rule(lexicon_collection: Dict[str, List[str]],

@spacy.util.registry.misc('pymusas.taggers.rules.MWERule.v1')
def mwe_rule(mwe_lexicon_lookup: Dict[str, List[str]],
pos_mapper: Dict[str, List[str]]) -> MWERule:
pos_mapper: Optional[Dict[str, List[str]]] = None
) -> MWERule:
'''
`pymusas.taggers.rules.MWERule.v1` is a registered function under the
`@misc` function register.
@@ -45,3 +50,32 @@ def mwe_rule(mwe_lexicon_lookup: Dict[str, List[str]],
:class:`pymusas.taggers.rules.mwe.MWERule`
'''
return MWERule(mwe_lexicon_lookup, pos_mapper)


@spacy.util.registry.misc('pymusas.taggers.rules.rule_list')
def rule_list(*rules: Rule) -> List[Rule]:
'''
`pymusas.taggers.rules.rule_list` is a registered function under the
`@misc` function register. The function is required when you want to create
a `List` of rules within a
[config file](https://thinc.ai/docs/usage-config). We
found that a `List` of custom objects cannot be specified directly within a
config file, but it can be built from
[variable position arguments](https://thinc.ai/docs/usage-config#registries-args),
which this function accepts as input.
This function is most likely to be
used when creating a :class:`pymusas.spacy_api.taggers.rule_based.RuleBasedTagger`.
# Parameters
rules : `Rule`
The :class:`pymusas.taggers.rules.rule.Rule`s to convert into a `List`
of `Rule`s.
# Returns
`List[Rule]`
'''
return list(rules)
11 changes: 11 additions & 0 deletions tests/spacy_api/taggers/test_spacy_api_rules.py
@@ -3,6 +3,7 @@
import spacy

from pymusas.spacy_api.taggers import rules # noqa: F401
from pymusas.taggers.rules.rule import Rule
from pymusas.taggers.rules.mwe import MWERule
from pymusas.taggers.rules.single_word import SingleWordRule

@@ -18,3 +19,13 @@ def test_mwe_rule() -> None:
mwe_rule: Callable[[Dict[str, List[str]], Dict[str, List[str]]], MWERule] \
= spacy.util.registry.misc.get('pymusas.taggers.rules.MWERule.v1')
assert isinstance(mwe_rule({}, {}), MWERule)


def test_rule_list() -> None:
rule_list_call: Callable[[Rule, Rule], List[Rule]] \
= spacy.util.registry.misc.get('pymusas.taggers.rules.rule_list')
single_rule = SingleWordRule({}, {})
mwe_rule = MWERule({})
list_of_rules = rule_list_call(*(single_rule, mwe_rule))
assert isinstance(list_of_rules, List)
assert [single_rule, mwe_rule] == list_of_rules
