plandes/stopword-annotator

 
 

Stanford CoreNLP Stopword Plugin

This is an extension to the Stanford CoreNLP pipeline that checks whether a token's word and lemma values are stopwords.

Obtaining

See dependencies

Fork

This is a fork of John Conwell's coreNlp extensions library. I updated the dependencies in the POM, kept only the stopword plugin, and changed the Maven coordinates so that I can deploy it to Maven Central (under my account).

Identifying Stopwords in CoreNlp

By default, the StopwordAnnotator uses the built-in Lucene stopword list, but you have the option to pass in a custom list of stopwords for it to use instead. You can also specify whether the StopwordAnnotator should check the lemma of the token against the stopword list.
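
The word and lemma checks described above can be illustrated with a small, self-contained sketch. This is not the plugin's code and does not use the CoreNLP or Lucene APIs: the class name StopwordCheckSketch, its Result type, the comma-separated list format, and the case-insensitive matching are all assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of a word/lemma stopword lookup.
// It is NOT the StopwordAnnotator implementation.
class StopwordCheckSketch {

    /** Result for one token: is the word a stopword, is the lemma a stopword. */
    static final class Result {
        final boolean wordIsStopword;
        final boolean lemmaIsStopword;

        Result(boolean wordIsStopword, boolean lemmaIsStopword) {
            this.wordIsStopword = wordIsStopword;
            this.lemmaIsStopword = lemmaIsStopword;
        }
    }

    private final Set<String> stopwords = new HashSet<>();
    private final boolean checkLemma;

    /**
     * @param commaSeparated custom stopword list (the format is an assumption)
     * @param checkLemma     whether the lemma is also compared against the list
     */
    StopwordCheckSketch(String commaSeparated, boolean checkLemma) {
        for (String entry : commaSeparated.split(",")) {
            String normalized = entry.trim().toLowerCase(Locale.ROOT);
            if (!normalized.isEmpty()) {
                stopwords.add(normalized);
            }
        }
        this.checkLemma = checkLemma;
    }

    /** Looks up the word, and optionally the lemma, ignoring case (an assumption). */
    Result check(String word, String lemma) {
        boolean wordHit = stopwords.contains(word.toLowerCase(Locale.ROOT));
        boolean lemmaHit = checkLemma
                && lemma != null
                && stopwords.contains(lemma.toLowerCase(Locale.ROOT));
        return new Result(wordHit, lemmaHit);
    }

    public static void main(String[] args) {
        // Custom list that contains the lemma "be" but not the surface form "is".
        StopwordCheckSketch checker = new StopwordCheckSketch("a, an, the, be", true);
        Result r = checker.check("is", "be");
        System.out.println(r.wordIsStopword + " " + r.lemmaIsStopword); // prints "false true"
    }
}
```

The example shows why a lemma check can matter: with a custom list that holds only base forms, an inflected token such as "is" is caught through its lemma "be" but not through its surface form.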

For examples of how to use the StopwordAnnotator, take a look at StopwordAnnotatorTest.java

Documentation

More documentation:

Changelog

An extensive changelog is available here.

Authors

John Conwell (original author), Paul Landes (maintainer)

License

Copyright © 2016 - 2017 Paul Landes

Apache License Version 2.0
