updates changelog and documentation
jayantj committed Aug 3, 2016
1 parent bd95a7d commit 2aa5065
Showing 2 changed files with 10 additions and 7 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ Changes
 * Implemented LsiModel.docs_processed attribute
 * Added LdaMallet support. Added LdaVowpalWabbit, LdaMallet example to notebook. Added test suite for coherencemodel and aggregation.
 Added `topics` parameter to coherencemodel. Can now provide tokenized topics to calculate coherence value (@dsquareindia, #750)
+* Changed `use_lowercase` option in word2vec accuracy to `case_insensitive` to account for case variations in training vocabulary

 0.13.1, 2016-06-22

16 changes: 9 additions & 7 deletions gensim/models/word2vec.py
@@ -1573,13 +1573,15 @@ def accuracy(self, questions, restrict_vocab=30000, most_similar=most_similar, c
 The accuracy is reported (=printed to log and returned as a list) for each
 section separately, plus there's one aggregate summary at the end.
-`restrict_vocab` is an optional integer which limits the vocab to be used
-for answering questions. For example, restrict_vocab=10000 would only check
-the first 10000 word vectors in the vocabulary order. (This may be meaningful
-if you've sorted the vocabulary by descending frequency.)
-Use `case_insensitive` to convert all words in questions and vocab to their uppercase form before evaluating
-the accuracy. Useful in case of case-mismatch between training tokens and question words. (default True).
+Use `restrict_vocab` to ignore all questions containing a word not in the first `restrict_vocab`
+words (default top 30,000). This may be meaningful if you've sorted the vocabulary by descending
+frequency. In case `case_insensitive` is True, the first `restrict_vocab` words are taken first, and then
+case normalization is performed.
+Use `case_insensitive` to convert all words in questions and vocab to their uppercase form before
+evaluating the accuracy (default True). Useful in case of case-mismatch between training tokens
+and question words. In case of multiple case variants of a single word, the vector for the first
+occurrence (also the most frequent if vocabulary is sorted) is taken.
 This method corresponds to the `compute-accuracy` script of the original C word2vec.
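
The order of operations described in the updated docstring (truncate to the first `restrict_vocab` words, then normalize case, keeping the first variant seen) can be sketched in plain Python. This is an illustration only, not the gensim implementation: `build_ok_vocab`, `is_answerable`, and the toy vocabulary are hypothetical names introduced here, and the vocabulary is assumed to be a list already sorted by descending frequency.

```python
def build_ok_vocab(index2word, restrict_vocab=30000, case_insensitive=True):
    """Map each allowed word to its position in the vocabulary order.

    Hypothetical helper: mirrors the docstring's description, not gensim's code.
    """
    ok_vocab = {}
    # Truncate first, so case variants beyond the cutoff are never considered.
    for index, word in enumerate(index2word[:restrict_vocab]):
        key = word.upper() if case_insensitive else word
        # setdefault keeps the first occurrence, which is the most frequent
        # variant when the vocabulary is sorted by descending frequency.
        ok_vocab.setdefault(key, index)
    return ok_vocab


def is_answerable(question, ok_vocab, case_insensitive=True):
    """A question is kept only if every one of its words is in ok_vocab."""
    words = [w.upper() if case_insensitive else w for w in question]
    return all(w in ok_vocab for w in words)


# Toy vocabulary, assumed sorted by descending frequency.
vocab = ["the", "Paris", "france", "paris", "France", "berlin"]

full = build_ok_vocab(vocab)
truncated = build_ok_vocab(vocab, restrict_vocab=3)
case_sensitive = build_ok_vocab(vocab, restrict_vocab=3, case_insensitive=False)

print(full)            # {'THE': 0, 'PARIS': 1, 'FRANCE': 2, 'BERLIN': 5}
print(truncated)       # {'THE': 0, 'PARIS': 1, 'FRANCE': 2}
print(case_sensitive)  # {'the': 0, 'Paris': 1, 'france': 2}
```

In this sketch, `PARIS` resolves to index 1 (`Paris`) rather than index 3 (`paris`) because the first occurrence wins, and a question mentioning `berlin` is dropped when `restrict_vocab=3` because that word lies beyond the cutoff.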