Simple Java implementation of the Snowball stemmer (http://snowballstem.org/) with stemmers for 20+ languages. Useful for reducing tokens to their stems, especially before processing them in machine learning (ML) models. Used in the Cognitive Service Platform cmd.csp as part of its natural language processing (NLP) features.
There are no prerequisites or dependencies other than the Java core library.
To use, merge the following into your Maven POM (or the equivalent into your Gradle build script):
<repository>
  <id>github</id>
  <name>GitHub swelcker Apache Maven Packages</name>
  <url>https://maven.pkg.github.com/swelcker</url>
</repository>
<dependency>
  <groupId>cmd.csp</groupId>
  <artifactId>cspstemmer</artifactId>
  <version>1.0.0</version>
</dependency>
Then import `cmd.csp.stemmer.*` in your application:
// Example
import java.util.Locale;
import cmd.csp.stemmer.*;
private SnowballStemmer stemmer;
private Locale locale = null;
...
if (this.locale == null) {
  this.locale = Locale.getDefault();
}
...
// Pick the stemmer by ISO 639-2 language code; fall back to NaiveStemmer
switch (locale.getISO3Language().toLowerCase(Locale.ROOT)) {
  case "ara": stemmer = new ArabicStemmer(); break;
  case "dan": stemmer = new DanishStemmer(); break;
  case "nld": stemmer = new DutchStemmer(); break;
  case "eng": stemmer = new EnglishStemmer(); break;
  case "fin": stemmer = new FinnishStemmer(); break;
  case "fra": stemmer = new FrenchStemmer(); break;
  case "deu": stemmer = new GermanStemmer(); break;
  case "hun": stemmer = new HungarianStemmer(); break;
  case "ind": stemmer = new IndonesianStemmer(); break;
  case "gle": stemmer = new IrishStemmer(); break;
  case "ita": stemmer = new ItalianStemmer(); break;
  case "nep": stemmer = new NepaliStemmer(); break;
  case "nor": stemmer = new NorwegianStemmer(); break;
  case "por": stemmer = new PortugueseStemmer(); break;
  case "ron": stemmer = new RomanianStemmer(); break;
  case "spa": stemmer = new SpanishStemmer(); break;
  case "rus": stemmer = new RussianStemmer(); break;
  case "swe": stemmer = new SwedishStemmer(); break;
  case "tam": stemmer = new TamilStemmer(); break;
  case "tur": stemmer = new TurkishStemmer(); break;
  default: stemmer = new NaiveStemmer(); break;
}
// Then set the token to be stemmed
String tkn = "Testvariable";
String result = "";
stemmer.setCurrent(tkn);
// Run the stemmer
stemmer.stem();
// Get/use the result
result = stemmer.getCurrent();
...
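The selection and stemming steps in the example could also be wrapped in two small helpers: a lookup table keyed by ISO 639-2 code instead of a `switch`, and a method that runs the set/stem/get sequence over a list of tokens. The following is only a sketch under stated assumptions. The `Stemmer` interface and `TaggingStemmer` class are hypothetical stand-ins so that the code runs without the library on the classpath; they are not part of this project. With the library available, the suppliers would presumably be constructor references such as `EnglishStemmer::new`, with `NaiveStemmer::new` as the fallback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;
import java.util.function.Supplier;

class StemmerLookupSketch {

    /** Hypothetical stand-in for the library's stemmer type: only the three calls used in the example. */
    interface Stemmer {
        void setCurrent(String word);
        void stem();
        String getCurrent();
    }

    /** Placeholder, NOT a Snowball algorithm: lower-cases the word and prefixes a tag. */
    static final class TaggingStemmer implements Stemmer {
        private final String tag;
        private String current = "";

        TaggingStemmer(String tag) { this.tag = tag; }

        @Override public void setCurrent(String word) { current = word; }
        @Override public void stem() { current = tag + ":" + current.toLowerCase(Locale.ROOT); }
        @Override public String getCurrent() { return current; }
    }

    // ISO 639-2 code -> factory. Only three languages are listed to keep the sketch short.
    private static final Map<String, Supplier<Stemmer>> BY_ISO3 = new HashMap<>();
    static {
        BY_ISO3.put("eng", () -> new TaggingStemmer("eng"));
        BY_ISO3.put("deu", () -> new TaggingStemmer("deu"));
        BY_ISO3.put("fra", () -> new TaggingStemmer("fra"));
    }

    /** Returns a new stemmer for the locale, or the fallback if the language is not registered. */
    static Stemmer forLocale(Locale locale) {
        Locale effective = (locale == null) ? Locale.getDefault() : locale;
        String iso3;
        try {
            iso3 = effective.getISO3Language();
        } catch (MissingResourceException e) {
            iso3 = ""; // no three-letter code known for this locale: use the fallback
        }
        Supplier<Stemmer> fallback = () -> new TaggingStemmer("naive");
        return BY_ISO3.getOrDefault(iso3, fallback).get();
    }

    /** Runs setCurrent / stem / getCurrent for every token, reusing one stemmer instance. */
    static List<String> stemAll(Stemmer stemmer, List<String> tokens) {
        List<String> out = new ArrayList<>(tokens.size());
        for (String token : tokens) {
            stemmer.setCurrent(token);
            stemmer.stem();
            out.add(stemmer.getCurrent());
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [eng:running, eng:tests]
        System.out.println(stemAll(forLocale(Locale.ENGLISH), List.of("Running", "Tests")));
        // prints [naive:token] because "jpn" is not registered
        System.out.println(stemAll(forLocale(Locale.JAPANESE), List.of("Token")));
    }
}
```

A map keeps the language table as data, so adding a language is one `put` call, and a factory per entry means each caller gets its own stemmer instance rather than sharing mutable state.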
- Built with Maven (dependency management)
Please read CONTRIBUTING.md for details on our code of conduct, and the process for submitting pull requests to us.
We use SemVer for versioning. For the versions available, see the tags on this repository.
- Author: Stefan Welcker (modifications)
See also the list of contributors who participated in this project.
This project is licensed under the MIT License; see the LICENSE.md file for details.
- Forked and modified from the original Snowball code, Copyright (c) 2001 Dr Martin Porter and Copyright (c) 2002 Richard Boulton. All rights reserved.