Word2vec coherence #1530

Merged 31 commits on Sep 18, 2017
Commits
b1aa1d9
#1380: Initial implementation of coherence using word2vec similarity.
Jun 6, 2017
60a2130
#1380: Add the `keyed_vectors` kwarg to the `CoherenceModel` to allow…
Jun 7, 2017
60096e1
#1380: Add tests for `with_std` option for confirmation measures, and…
Jun 7, 2017
042ac8b
#1380: Add a `get_topics` method to all topic models, add test covera…
Jun 14, 2017
98e74b1
#1380: Require topics returned from `get_topics` to be probability di…
Jun 14, 2017
aebf987
#1380: Clean up flake8 warnings.
Jun 15, 2017
a13ba74
#1380: Make `topn` a property so setting it to higher values will unc…
Jun 16, 2017
8690dc3
#1380: Pass through `with_std` argument for all coherence measures.
Jun 16, 2017
a1f9127
#1380: Initial implementation of coherence using word2vec similarity.
Jun 6, 2017
345a644
#1380: Add the `keyed_vectors` kwarg to the `CoherenceModel` to allow…
Jun 7, 2017
94fe67b
#1380: Add tests for `with_std` option for confirmation measures, and…
Jun 7, 2017
24686ce
#1380: Add a `get_topics` method to all topic models, add test covera…
Jun 14, 2017
0b0b7ec
#1380: Require topics returned from `get_topics` to be probability di…
Jun 14, 2017
92e5455
#1380: Clean up flake8 warnings.
Jun 15, 2017
f8ecab7
#1380: Make `topn` a property so setting it to higher values will unc…
Jun 16, 2017
59f9fb7
#1380: Pass through `with_std` argument for all coherence measures.
Jun 16, 2017
6e1c76c
Update `test_coherencemodel` to skip Mallet and Vowpal Wabbit tests i…
Aug 13, 2017
0cd16b6
Merge remote-tracking branch 'origin/word2vec_coherence' into word2ve…
Aug 13, 2017
096a6b4
Fix trailing whitespace.
Aug 15, 2017
494a530
Merge branch 'develop' into word2vec_coherence
Aug 27, 2017
3f7926e
Add `get_topics` method to `BaseTopicModel` and update notebook for n…
Aug 28, 2017
d69b5b1
Add several helper methods to the `CoherenceModel` for comparing a se…
Aug 29, 2017
fd78388
fix flake8 whitespace issues
Aug 29, 2017
ad0876a
fix order of imports in `corpora.__init__`
Aug 29, 2017
b678518
fix corpora.__init__ import order
Aug 29, 2017
297711c
push fix for setting `topn` in `CoherenceModel.for_topics`
Aug 29, 2017
6d1d5f4
Use `dict.pop` in place of checking and optionally getting and deleti…
Sep 6, 2017
8ce713e
Merge branch 'develop' into word2vec_coherence
Sep 8, 2017
4c27169
fix non-deterministic test failure in `test_coherencemodel`
Sep 9, 2017
f1113cd
Merge branch 'develop' into word2vec_coherence
Sep 9, 2017
99a1f46
Update coherence model selection notebook to use sklearn dataset load…
Sep 16, 2017
737 changes: 510 additions & 227 deletions docs/notebooks/topic_coherence_model_selection.ipynb

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion gensim/corpora/__init__.py
@@ -12,6 +12,6 @@
 from .dictionary import Dictionary # noqa:F401
 from .hashdictionary import HashDictionary # noqa:F401
 from .wikicorpus import WikiCorpus # noqa:F401
-from .textcorpus import TextCorpus # noqa:F401
+from .textcorpus import TextCorpus, TextDirectoryCorpus # noqa:F401
 from .ucicorpus import UciCorpus # noqa:F401
 from .malletcorpus import MalletCorpus # noqa:F401
8 changes: 8 additions & 0 deletions gensim/models/basemodel.py
@@ -14,3 +14,11 @@ def print_topics(self, num_topics=20, num_words=10):
         probable words for `topics` number of topics to log.
         Set `topics=-1` to print all topics."""
         return self.show_topics(num_topics=num_topics, num_words=num_words, log=True)
+
+    def get_topics(self):
+        """
+        Returns:
+            np.ndarray: `num_topics` x `vocabulary_size` array of floats which represents
+            the term topic matrix learned during inference.
+        """
+        raise NotImplementedError
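
For readers following the diff: the new `get_topics` in the base class only raises `NotImplementedError`, so each concrete topic model is expected to override it. Below is a minimal sketch of what such an override might look like. It is not code from this PR. `ToyTopicModel` and its `term_weights` attribute are hypothetical names, the `BaseTopicModel` here is a local stand-in rather than gensim's class, and nested lists replace the `np.ndarray` named in the docstring so the sketch has no dependencies. Row normalization follows the commit message about requiring topics to be probability distributions.

```python
class BaseTopicModel:
    """Local stand-in for the base class in the diff (not gensim's module)."""

    def get_topics(self):
        raise NotImplementedError


class ToyTopicModel(BaseTopicModel):
    """Hypothetical subclass that stores raw per-topic term weights."""

    def __init__(self, term_weights):
        # One row per topic, one non-negative weight per vocabulary term.
        # Assumption: every row has a positive sum.
        self.term_weights = term_weights

    def get_topics(self):
        # Normalize each row so that every topic sums to 1 over the vocabulary.
        topics = []
        for row in self.term_weights:
            total = float(sum(row))
            topics.append([weight / total for weight in row])
        return topics


model = ToyTopicModel([[2, 1, 1], [1, 3, 0]])
topics = model.get_topics()  # 2 topics x 3 vocabulary terms
```

A real implementation would presumably return a NumPy array of shape `num_topics` x `vocabulary_size`, as the docstring states.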