A PHP implementation of the English (Porter 2) Stemmer

drunken-monkey/porter2

Porter 2 Stemmer for PHP

A stemmer takes a word and applies a set of rules to reduce it to a stem that is usable in a search index (as opposed to the word's actual linguistic root). For example, aggravate, aggravated, and aggravates all reduce to "aggrav", so a search for any one of them can match the others.

Martin Porter's English (Porter 2) algorithm, published as part of the Snowball project, improves on the original Porter stemmer.

Usage

After loading the porter2 class (e.g., via autoloading, require_once, or a framework-specific call like Drupal's module_load_include()), stem a word (string) as follows:

$word = 'aggravated';
$porter2 = new porter2($word);
echo $porter2->stem(); // will print 'aggrav'
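
To illustrate how stems create a commonality between words, here is a minimal sketch that groups the words of a phrase by stem. The group_by_stem() helper and the $toy_stemmer closure are hypothetical and not part of this library; the toy closure only strips a few suffixes so that the example runs on its own.

```php
<?php
// Hypothetical helper (not part of the porter2 class): groups the distinct
// words of a text by the stem that $stemmer returns for each of them.
function group_by_stem(string $text, callable $stemmer): array
{
    // Lowercase the text and split on anything that is not a letter or apostrophe.
    $words = preg_split("/[^a-z']+/", strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $groups = [];
    foreach (array_unique($words) as $word) {
        $groups[$stemmer($word)][] = $word;
    }
    return $groups;
}

// Stand-in stemmer so the sketch runs without the library. It only strips a
// few suffixes and is NOT the Porter 2 algorithm.
$toy_stemmer = function (string $word): string {
    return preg_replace('/(ates|ated|ate)$/', '', $word);
};

print_r(group_by_stem('Aggravate aggravated aggravates', $toy_stemmer));
```

With the porter2 class loaded, the stand-in could be swapped for function ($word) { return (new porter2($word))->stem(); }, which uses only the constructor and stem() call shown above.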

Custom exclusions

The default algorithm may not stem certain words to your liking. For example, texas reduces to texa, but texan does not. By setting the custom_exclusions property to an array that maps words to the stems you want, you can override the algorithm as needed:

$word = 'texan';
$porter2 = new porter2($word);
// Map each word to the stem it should produce instead of the default.
$porter2->custom_exclusions = array('texan' => 'texa');
echo $porter2->stem(); // will print 'texa'
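
Conceptually, an exclusion list is a lookup that takes precedence over the stemming rules. The sketch below shows that idea in isolation; stem_with_exclusions() and the $identity stand-in are hypothetical names, and this is not a description of how the porter2 class is implemented internally.

```php
<?php
// Hypothetical wrapper, not the library's internals: consult the exclusion
// map first and fall back to the stemmer only when the word is not listed.
function stem_with_exclusions(string $word, array $exclusions, callable $stemmer): string
{
    $word = strtolower($word);
    return $exclusions[$word] ?? $stemmer($word);
}

// Stand-in "stemmer" that returns the word unchanged, so the effect of the
// exclusion map is easy to see. It is NOT the Porter 2 algorithm.
$identity = function (string $word): string {
    return $word;
};

$exclusions = ['texan' => 'texa'];
echo stem_with_exclusions('Texan', $exclusions, $identity), PHP_EOL; // listed: uses the map
echo stem_with_exclusions('texas', $exclusions, $identity), PHP_EOL; // not listed: uses the stemmer
```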

Stemmer Resources

Tests

A verification list of 29,000 words and their expected stems can be run via the included index.php file. To test individual words, use tests.php.
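
A verification run of this kind boils down to comparing each word's stem against an expected value. The following is a minimal sketch of such a check, not the contents of index.php or tests.php; the find_stem_mismatches() helper, the word => stem array layout, and the stand-in stemmer are all assumptions made for illustration.

```php
<?php
// Hypothetical checker: returns every word whose stem differs from the
// expected one. $expected is assumed to map word => expected stem.
function find_stem_mismatches(array $expected, callable $stemmer): array
{
    $mismatches = [];
    foreach ($expected as $word => $stem) {
        // Cast because PHP turns numeric-looking string keys into integers.
        $actual = $stemmer((string) $word);
        if ($actual !== $stem) {
            $mismatches[$word] = ['expected' => $stem, 'actual' => $actual];
        }
    }
    return $mismatches;
}

// Stand-in stemmer that only drops a trailing "s"; NOT the Porter 2 algorithm.
$toy_stemmer = function (string $word): string {
    return preg_replace('/s$/', '', $word);
};

$expected = ['cats' => 'cat', 'texas' => 'texa', 'running' => 'run'];
print_r(find_stem_mismatches($expected, $toy_stemmer));
```

An empty result means every word produced its expected stem. Here the stand-in fails only on "running", which shows how a mismatch is reported.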
