Key words
Le Thu Nguyen edited this page Apr 20, 2018 · 4 revisions
Term frequency: the number of times that a word or term occurs in a document.
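Counting how often each term occurs in a document can be sketched as follows. This is a minimal illustration: the function name `term_frequencies` and the simple regex tokenizer are assumptions, not part of the original page.

```python
import re
from collections import Counter

def term_frequencies(document: str) -> Counter:
    """Count how many times each lowercased word occurs in the document."""
    # Very simple tokenizer (assumption): runs of letters, digits, apostrophes.
    tokens = re.findall(r"[a-z0-9']+", document.lower())
    return Counter(tokens)

tf = term_frequencies("The car passed the other car.")
print(tf["car"])     # 2
print(tf["passed"])  # 1
```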
Stemming:
- Chop off the ends of words
- Reduce inflectional forms of words
- Decrease the size of the vocabulary
Example: "automation, automatic, automates" → automat
Porter's algorithm:
- sses → ss
- ies → i
- ational → ate
- tional → tion
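The suffix rules listed above can be tried out with a small sketch. It is a toy subset only: the full Porter algorithm has several more steps and checks a condition on the remaining stem before rewriting, which this sketch skips. The names `RULES` and `apply_suffix_rules` are hypothetical.

```python
# Suffix rewrite rules from the list above, longest suffix first so that
# "ational" is tried before "tional".
RULES = [
    ("ational", "ate"),
    ("tional", "tion"),
    ("sses", "ss"),
    ("ies", "i"),
]

def apply_suffix_rules(word: str) -> str:
    """Apply the first matching suffix rule; leave other words unchanged."""
    word = word.lower()
    for suffix, replacement in RULES:
        if word.endswith(suffix):
            return word[: -len(suffix)] + replacement
    return word

for w in ["caresses", "ponies", "relational", "conditional"]:
    print(w, "->", apply_suffix_rules(w))
# caresses -> caress
# ponies -> poni
# relational -> relate
# conditional -> condition
```

Because the stem condition is omitted, this sketch also rewrites short words that the real algorithm would leave alone.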
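Effect of stemming on retrieval:
- Recall for queries increases, because more word forms match the query
- Precision can be harmed, because unrelated words may share a stem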
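The trade-off between recall and precision under stemming can be illustrated with a toy search. Everything here is hypothetical: the three documents, the relevance judgments in `RELEVANT`, and the `crude_stem` helper, which only stands in for a real stemmer.

```python
# Toy collection; documents and relevance judgments are made up.
DOCS = {
    1: "automation of tests",
    2: "automatic doors",
    3: "the factory automates packing",
}
RELEVANT = {1, 3}  # assumed: what a user searching "automates" wants

def crude_stem(word):
    """Strip one common suffix; a stand-in for a real stemmer."""
    for suffix in ("ion", "ic", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def search(query, normalize):
    """Return ids of documents containing the normalized query term."""
    target = normalize(query)
    return {doc_id for doc_id, text in DOCS.items()
            if target in {normalize(tok) for tok in text.split()}}

def precision_recall(hits):
    """Precision and recall of a result set against RELEVANT."""
    if not hits:
        return 0.0, 0.0
    correct = len(hits & RELEVANT)
    return correct / len(hits), correct / len(RELEVANT)

exact = search("automates", lambda w: w)   # no stemming
stemmed = search("automates", crude_stem)  # query and documents stemmed
print(exact, precision_recall(exact))      # only doc 3: precision 1.0, recall 0.5
print(stemmed, precision_recall(stemmed))  # docs 1, 2, 3: precision 2/3, recall 1.0
```

With stemming the query finds both relevant documents, but it also retrieves the unrelated "automatic doors" document.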
Lemmatization: transform words to their standard (dictionary) form according to syntactic category
- verb + ing → verb
- noun + s → noun
- am, are, is → be
- car, cars, car's, cars' → car
- The boy’s cars are different colors → lemmatization → the boy car be different color
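The example sentence above can be reproduced with a dictionary lookup. This is a minimal sketch: the tiny hand-written `LEMMAS` table is an assumption for illustration, whereas a real lemmatizer relies on a full vocabulary and part-of-speech information.

```python
import re

# Hand-written lemma table covering only the examples above (assumption).
LEMMAS = {
    "am": "be", "are": "be", "is": "be",
    "cars": "car", "car's": "car", "cars'": "car",
    "boy's": "boy", "colors": "color",
}

def lemmatize(sentence: str) -> str:
    """Lowercase, tokenize, and map each token to its lemma if known."""
    text = sentence.lower().replace("’", "'")  # normalize curly apostrophes
    # A token is a run of letters, optionally followed by an apostrophe part.
    tokens = re.findall(r"[a-z]+(?:'[a-z]*)?", text)
    return " ".join(LEMMAS.get(tok, tok) for tok in tokens)

print(lemmatize("The boy’s cars are different colors"))
# the boy car be different color
```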
Stop words: common words that appear to be of little value in helping select documents and are therefore excluded from the index vocabulary.
They are mostly function words that carry little information, such as prepositions, articles, and pronouns, together with some very frequent adverbs, adjectives, and other frequent words (of, in, about, which, although, and so on). They are not added to the index.
For example: '.'
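Filtering stop words before indexing can be sketched as follows. The stop list contains the words mentioned above plus the articles "a", "an", and "the", which are added here as an assumption. Real stop lists are longer and depend on the language and the application.

```python
import re
from typing import List

# Small illustrative stop list (assumption, not a standard list).
STOP_WORDS = {"of", "in", "about", "which", "although", "a", "an", "the"}

def index_terms(document: str) -> List[str]:
    """Return the document's tokens with stop words removed."""
    tokens = re.findall(r"[a-z]+", document.lower())
    return [tok for tok in tokens if tok not in STOP_WORDS]

print(index_terms("A book about cars in the city"))
# ['book', 'cars', 'city']
```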