A robust and easy-to-use toolkit for POS (part-of-speech) tagging in NLP. Its approach is to automatically construct tagging rules in the form of a binary tree.
It supports pre-trained UPOS and XPOS tagging models for about 80 languages. See the Models folder for more details.
Used in the Cognitive Service Platform cmd.csp.
There are no prerequisites or dependencies other than the Java core libraries.
To use, merge the following into your Maven POM (or the equivalent into your Gradle build script):
```xml
<repository>
    <id>github</id>
    <name>GitHub swelcker Apache Maven Packages</name>
    <url>https://maven.pkg.github.com/swelcker</url>
</repository>

<dependency>
    <groupId>cmd.csp</groupId>
    <artifactId>csppostagger</artifactId>
    <version>1.0.0</version>
</dependency>
```
Then import `cmd.csp.postagger.*` in your application:

```java
// Example
import java.util.HashMap;

import cmd.csp.postagger.*;

private CSPPOSTagger posTagger = new CSPPOSTagger();
private HashMap<String, String> FREQDICT = null;

// Initialize the rule tree from the rules file for the sentence language
posTagger.constructTreeFromLanguage(senLanguage);
// Initialize FREQDICT for the same language
FREQDICT = utl.getDictionaryByLanguage(senLanguage);
...
... = posTagger.tagSentence(FREQDICT, "your string or sentence");
```

or, to tag word by word and collect the results yourself:
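
```java
// wordtags, wt, maptags, tokenizer and sen are declared in the surrounding application code.
// Assign initial tags to the words of the sentence
wordtags = CSPPOSInitialTagger.InitTagger4Sentence(FREQDICT, sen);
int size = wordtags.size();
wt = new String[size];
for (int ti = 0; ti < size; ti++) {
    // Count how often each word occurs
    tokenizer.BagOfTags.put(wordtags.get(ti).word, tokenizer.BagOfTags.getInteger(wordtags.get(ti).word, 0) + 1);
    // Find the rule node that fires for the word at position ti; its conclusion is the tag
    CSPPOSFWObject object = Utils.getObject(wordtags, size, ti);
    CSPPOSNode firedNode = posTagger.findFiredNode(object);
    maptags.put(wordtags.get(ti).word, firedNode.conclusion);
    tokenizer.WordTagList.put(wordtags.get(ti).word, firedNode.conclusion);
    wt[ti] = wordtags.get(ti).word;
}
```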
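
The tagging rules are described above as a binary tree, and the example calls `findFiredNode` and reads `conclusion` from the result. The following is a minimal, self-contained sketch of how a ripple-down-rules style binary tree can be walked. It is an illustration under stated assumptions, not this library's implementation: the classes `RuleTreeSketch`, `Context` and `Node`, the fields `exceptChild` and `ifNotChild`, and the hand-written demo rules are all hypothetical. Only the names `findFiredNode` and `conclusion` are borrowed from the example above.

```java
import java.util.function.Predicate;

/** Illustrative sketch only; all names here are hypothetical, not the library's API. */
class RuleTreeSketch {

    /** The word being tagged, its left neighbour, and its initial tag. */
    static final class Context {
        final String prevWord;
        final String word;
        final String initialTag;

        Context(String prevWord, String word, String initialTag) {
            this.prevWord = prevWord;
            this.word = word;
            this.initialTag = initialTag;
        }
    }

    /** One rule: a condition, the tag it concludes, and two child links. */
    static final class Node {
        final Predicate<Context> condition;
        final String conclusion;
        Node exceptChild; // tried when this rule fires, to refine its conclusion
        Node ifNotChild;  // tried when this rule does not fire

        Node(Predicate<Context> condition, String conclusion) {
            this.condition = condition;
            this.conclusion = conclusion;
        }
    }

    /** Walks the tree and returns the last node whose condition held, or null if none held. */
    static Node findFiredNode(Node root, Context ctx) {
        Node fired = null;
        Node current = root;
        while (current != null) {
            if (current.condition.test(ctx)) {
                fired = current;
                current = current.exceptChild;
            } else {
                current = current.ifNotChild;
            }
        }
        return fired;
    }

    /** Builds a tiny hand-written tree; in a real tagger the rules come from a rules file. */
    static Node buildDemoTree() {
        Node root = new Node(c -> true, "NN"); // default rule, always fires
        Node nounRule = new Node(c -> "NN".equals(c.initialTag), "NN");
        Node afterTo = new Node(c -> "to".equals(c.prevWord), "VB");
        Node verbRule = new Node(c -> "VB".equals(c.initialTag), "VB");
        Node afterThe = new Node(c -> "the".equals(c.prevWord), "NN");
        root.exceptChild = nounRule;
        nounRule.exceptChild = afterTo;  // exception: initial NN after "to" becomes VB
        nounRule.ifNotChild = verbRule;
        verbRule.exceptChild = afterThe; // exception: initial VB after "the" becomes NN
        return root;
    }

    public static void main(String[] args) {
        Node root = buildDemoTree();
        // prints VB
        System.out.println(findFiredNode(root, new Context("to", "run", "NN")).conclusion);
        // prints NN
        System.out.println(findFiredNode(root, new Context("the", "run", "VB")).conclusion);
        // prints NN
        System.out.println(findFiredNode(root, new Context("a", "dog", "NN")).conclusion);
    }
}
```

In this sketch each node has at most two children, which is what makes the tree binary: one branch refines a rule that fired, the other is the fallback when it did not. The last rule that fired supplies the tag.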
- Maven - Dependency Management
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
We use SemVer for versioning. For the versions available, see the tags on this repository.
- Stefan Welcker - Modifications based on RDRPOSTagger
See also the list of contributors who participated in this project.
This project is licensed under the MIT License; see the LICENSE.md file for details.
Find more information about RDRPOSTagger at: http://rdrpostagger.sourceforge.net/
The general architecture and results of the original RDRPOSTagger can be found in the following papers:
- Dat Quoc Nguyen, Dai Quoc Nguyen, Dang Duc Pham and Son Bao Pham. RDRPOSTagger: A Ripple Down Rules-based Part-Of-Speech Tagger. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2014, pp. 17-20, 2014.
- Dat Quoc Nguyen, Dai Quoc Nguyen, Dang Duc Pham and Son Bao Pham. A Robust Transformation-Based Learning Approach Using Ripple Down Rules for Part-Of-Speech Tagging. AI Communications (AICom), vol. 29, no. 3, pp. 409-422, 2016.