punctuators is a project for running inference with models for punctuation and related analytics.
This project is currently a mostly undocumented prototype.
The model cards linked below document each model.
This project can be installed with pip:
$ pip install punctuators
This section lists the models currently supported by this package.
These models perform punctuation restoration, true-casing (capitalization), and sentence boundary detection (segmentation). Together, these tasks are referred to as PCS (punctuation, capitalization, segmentation).
The following model card describes a base-sized model that can perform PCS on 47 common languages: https://huggingface.co/1-800-BAD-CODE/punct_cap_seg_47_language
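A minimal usage sketch for the PCS model follows. The class name `PunctCapSegModelONNX`, the pretrained name `"pcs_47lang"`, and the shape of the `infer()` return value are assumptions taken from the model card rather than from this README, so check the card before relying on them. The helper names `load_pcs_model`, `restore`, and `demo` are hypothetical and exist only for this illustration.

```python
from typing import Dict, List


def load_pcs_model():
    """Load the 47-language PCS model (likely downloads weights on first use)."""
    # Assumption: import path and pretrained name as shown on the model card.
    from punctuators.models import PunctCapSegModelONNX

    return PunctCapSegModelONNX.from_pretrained("pcs_47lang")


def restore(model, texts: List[str]) -> Dict[str, List[str]]:
    """Map each raw input text to the list of sentences the model produces."""
    # Assumption: infer() takes a list of strings and returns one list of
    # punctuated, true-cased sentences per input, in the same order.
    outputs = model.infer(texts)
    return dict(zip(texts, outputs))


def demo() -> None:
    """Run the model on one lower-cased, unpunctuated input and print the result."""
    model = load_pcs_model()
    raw_inputs = ["hello friend how are you today i am doing fine thanks"]
    for raw, sentences in restore(model, raw_inputs).items():
        print("input:", raw)
        for sentence in sentences:
            print("   ", sentence)
```

Calling `demo()` after installing the package should print the input followed by its restored sentences, one per line. The import is kept inside `load_pcs_model` so that `restore` can be exercised with any object exposing an `infer` method.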
Sentence Boundary Detection (SBD) is the simpler task of accepting punctuated input and segmenting the input into separate sentences.
The following model card describes a small-sized model that can perform SBD on 49 common languages: https://huggingface.co/1-800-BAD-CODE/sentence_boundary_detection_multilang
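To make the SBD input and output concrete, here is a naive rule-based baseline. It is not how the linked model works; it is only a sketch of the task's contract (one punctuated string in, a list of sentences out), and the function name `naive_sbd` is hypothetical.

```python
import re
from typing import List

# Split on whitespace that follows '.', '!' or '?' and precedes an
# uppercase ASCII letter or a digit. This rule is deliberately simplistic.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")


def naive_sbd(text: str) -> List[str]:
    """Segment already-punctuated text into sentences with a regex rule."""
    return [part for part in _BOUNDARY.split(text.strip()) if part]


print(naive_sbd("Hello there. How are you? I'm fine."))
# → ['Hello there.', 'How are you?', "I'm fine."]

print(naive_sbd("Dr. Smith arrived."))
# → ['Dr.', 'Smith arrived.']  (wrong split after the abbreviation)
```

The second call shows why a rule like this falls short: abbreviations, and languages without spaces or letter case, break it, which is the gap a learned SBD model is meant to cover.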