Protect your models from out of scope domains. Focus on what matters.

GlowstickAI/domain-matcher

Domain Matcher

Domain Matcher is a library that checks whether your input data matches a pre-defined domain. Inputs that do not match the domain are deemed unimportant and can be safely filtered out.

Domain Matching performs very cheap OoD detection using topic modeling and keyword extraction.
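
For a rough intuition of keyword-based domain matching, here is a toy sketch in pure Python. It is not Domain Matcher's implementation, which is only described here as topic modeling plus keyword extraction. The names `tokenize`, `extract_keywords` and `in_domain`, the stopword list, and the `min_hits` threshold are all hypothetical choices made for illustration.

```python
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"a", "an", "the", "my", "i", "you", "can", "to", "at", "is", "of", "for"}

def tokenize(text):
    """Lowercase the text and return its non-stopword tokens."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def extract_keywords(in_domain_texts, top_k=20):
    """Return the top_k most frequent tokens across the in-domain texts."""
    counts = Counter(tok for text in in_domain_texts for tok in tokenize(text))
    return {tok for tok, _ in counts.most_common(top_k)}

def in_domain(text, keywords, min_hits=2):
    """Flag a text as in-domain if it shares at least min_hits distinct tokens with the keywords."""
    hits = sum(1 for tok in set(tokenize(text)) if tok in keywords)
    return hits >= min_hits

# Toy in-domain (banking) training sentences, invented for this sketch.
train = [
    "Cancel my credit card",
    "What is my account balance?",
    "Transfer money to my savings account",
    "I lost my debit card",
]
keywords = extract_keywords(train)

print(in_domain("Can you cancel my credit card?", keywords))
print(in_domain("Can you cancel my reservation at Giorgi's?", keywords))
```

Requiring two keyword hits rather than one keeps a generic word such as "cancel" from pulling an unrelated request into the domain. A real matcher would need a far more robust signal than raw token overlap.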

pip install domain-matcher

Usage

from datasets import load_dataset
from domain_matcher.core import DomainMatcher, DMConfig

# Custom version of `clinc-oos` where non-banking classes are assigned to oos.
ds = load_dataset("GlowstickAI/banking-clinc-oos", "plus")
config = DMConfig(text_column='text', label_column='intent', oos_class='oos')
dmatcher = DomainMatcher(config)
# Fit DM on your training data. See our blog for details on what happens during fitting.
dmatcher.fit(ds['train'])

# Predict: You can predict on a string, List[str] or Dataset
dmatcher.transform("Can you cancel my credit card?")['in_domain']
# >>> True
dmatcher.transform("Can you cancel my reservation at Giorgi's?")['in_domain']
# >>> False

Troubleshooting

For troubleshooting, please see our wiki or submit an issue if you can't find what you're looking for.

Development

  • Install Pyenv
    • curl https://pyenv.run | bash
    • pyenv install 3.9.13 && pyenv global 3.9.13
  • Install Poetry
  • poetry install
  • Install the pre-commit hooks
    • poetry run pre-commit install

Tooling

  • make format: format the code with Ruff.
  • make test: run unit tests and mypy.
