Key words
Term frequency: the number of times that a word or term occurs in a document.
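A minimal sketch of counting occurrences as defined above. The function name `term_frequencies` and the tokenization rule (lowercase, runs of letters and digits) are assumptions made for illustration; the notes do not prescribe a tokenizer.

```python
import re
from collections import Counter

def term_frequencies(document):
    """Return a Counter mapping each term to its number of occurrences."""
    # Assumed tokenization: lowercase the text, keep runs of letters/digits.
    tokens = re.findall(r"[a-z0-9]+", document.lower())
    return Counter(tokens)

tf = term_frequencies("The cat sat on the mat. The mat was flat.")
print(tf["the"])  # 3
print(tf["mat"])  # 2
```

A `Counter` returns 0 for terms that never occur, which is convenient when looking up query terms that are absent from a document.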
Stemming:
- Chop off the ends of words
- Reduce inflectional forms of words
- Decrease the size of the vocabulary
Example: "automation, automatic, automates" --> automat
Porter's algorithm rules:
- sses --> ss
- ies --> i
- ational --> ate
- tional --> tion
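Stemming increases recall for queries, because a query term also matches the other inflected forms of the same word.
Stemming can harm precision, because unrelated words may be reduced to the same stem.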
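The Porter suffix rules quoted above can be sketched as a small lookup. This is not the full Porter stemmer: it applies only those four rules, tries the longest suffix first, and omits the conditions on stem length that the real algorithm checks. The names `RULES` and `apply_suffix_rules` are hypothetical.

```python
# (suffix, replacement) pairs, longest suffix first so that
# "ational" is tried before "tional".
RULES = [
    ("ational", "ate"),
    ("tional", "tion"),
    ("sses", "ss"),
    ("ies", "i"),
]

def apply_suffix_rules(word):
    """Apply the first matching suffix rule; return the word unchanged otherwise."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word

for w in ["caresses", "ponies", "relational", "conditional"]:
    print(w, "-->", apply_suffix_rules(w))
# caresses --> caress
# ponies --> poni
# relational --> relate
# conditional --> condition
```

For real use, an established implementation such as NLTK's `PorterStemmer` is a safer choice than hand-written rules.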
Lemmatization: transform each word to its standard (dictionary) form according to its syntactic category.
- verb + ing --> verb
- noun + s --> noun
- am, are, is --> be
- car, cars, car's, cars' --> car
Example: "The boy's cars are different colors" --> lemmatization --> "the boy car be different color"
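A toy sketch that reproduces the example sentence above. It covers only possessives, the listed forms of "be", and a crude plural rule; it ignores syntactic category altogether, so it will mishandle many words (for instance "this"). The names `BE_FORMS`, `toy_lemma`, and `toy_lemmatize` are hypothetical. Real lemmatizers rely on a dictionary and part-of-speech information.

```python
import re

# Irregular forms taken from the examples above.
BE_FORMS = {"am": "be", "are": "be", "is": "be"}

def toy_lemma(token):
    """Map one token to a rough lemma using a few hand-written rules."""
    token = token.lower()
    # Strip possessive markers: car's -> car, cars' -> cars
    if token.endswith("'s"):
        token = token[:-2]
    elif token.endswith("'"):
        token = token[:-1]
    if token in BE_FORMS:
        return BE_FORMS[token]
    # Crude plural rule: drop a final "s" from longer words (cars -> car).
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]
    return token

def toy_lemmatize(sentence):
    """Tokenize a sentence and lemmatize each token."""
    # Normalize the typographic apostrophe before tokenizing.
    sentence = sentence.replace("\u2019", "'")
    tokens = re.findall(r"[A-Za-z]+(?:'[A-Za-z]*)?", sentence)
    return [toy_lemma(t) for t in tokens]

print(" ".join(toy_lemmatize("The boy's cars are different colors")))
# the boy car be different color
```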
Stop words: common words that appear to be of little value in helping select documents and are therefore excluded from the index vocabulary.
They are function words that carry little information, such as prepositions, articles, pronouns, adverbs, adjectives, and other frequent words (of, in, about, which, although, and so on). They are not added to the index.
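A minimal sketch of excluding such words before indexing. The `STOP_WORDS` set is a small illustrative sample built from the words mentioned above plus a few articles, not a standard list, and `index_terms` is a hypothetical helper.

```python
import re

# Illustrative sample only; real systems use much longer lists.
STOP_WORDS = {"of", "in", "about", "which", "although", "the", "a", "an"}

def index_terms(text):
    """Return the lowercased tokens of text that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(index_terms("The history of the car in Europe"))
# ['history', 'car', 'europe']
```

Storing the stop words in a set keeps each membership check constant-time, which matters when filtering large collections.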
For example: '.'