Refactor API reference gensim.topic_coherence. Fix #1669 #1714
@@ -82,12 +82,12 @@ def aggregate_segment_sims(segment_sims, with_std, with_support):

    Parameters
    ----------
    segment_sims : iterable
        floating point similarity values to aggregate.
    with_std : bool
        Set to True to include standard deviation.
    with_support : bool
        Set to True to include the number of elements in `segment_sims` as a statistic in the results returned.

Review comment (on the `segment_sims` description): No need to describe type in comment (

    Returns
    -------
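For context on the docstring above, the behavior it describes can be sketched as follows. This is a hedged, minimal sketch of *what the parameters mean*, not gensim's actual `aggregate_segment_sims` implementation; the return shapes are assumptions.

```python
import numpy as np

def aggregate_segment_sims_sketch(segment_sims, with_std=False, with_support=False):
    """Mean of the similarity values, optionally with std and support.

    A sketch of the behavior described by the docstring above,
    not gensim's actual implementation.
    """
    values = np.asarray(list(segment_sims), dtype=float)
    stats = [values.mean()]
    if with_std:
        stats.append(values.std())
    if with_support:
        # "support" = number of elements that went into the aggregate
        stats.append(len(values))
    return stats[0] if len(stats) == 1 else tuple(stats)

result = aggregate_segment_sims_sketch([0.2, 0.4, 0.6], with_support=True)
```

With `with_support=True` the sketch returns the mean together with the element count, matching the docstring's description of the `with_support` flag.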
@@ -124,7 +124,7 @@ def log_ratio_measure(

    segmented_topics : list of (list of tuples)
        Output from the segmentation module of the segmented topics.
    accumulator : list
        Word occurrence accumulator from probability_estimation.
    with_std : bool
        True to also include standard deviation across topic segment
        sets in addition to the mean coherence for each topic; default is False.

Review comment (on the `accumulator : list` annotation): list of
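For readers unfamiliar with the measure `log_ratio_measure` documents, a rough sketch of a (normalized) log-ratio confirmation between two words follows. The smoothing constant and exact normalization here are assumptions for illustration, not gensim's exact implementation.

```python
import math

EPSILON = 1e-12  # assumed smoothing constant, not necessarily gensim's value

def log_ratio(p_wi, p_wj, p_joint, normalize=False):
    """Direct confirmation between two words from their probabilities.

    Plain log ratio:   log((P(wi, wj) + eps) / (P(wi) * P(wj)))
    Normalized ("nlr"): the above divided by -log(P(wi, wj) + eps),
    which bounds the result to roughly [-1, 1] (NPMI-style).
    """
    numerator = p_joint + EPSILON
    ratio = math.log(numerator / (p_wi * p_wj))
    if normalize:
        return ratio / -math.log(numerator)
    return ratio

# Toy probabilities: each word appears in 3 of 5 documents,
# and they co-occur in 2 of 5.
m = log_ratio(3 / 5, 3 / 5, 2 / 5, normalize=True)
```

The probabilities would come from the word occurrence accumulator mentioned in the docstring; here they are hard-coded toy values.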
@@ -4,9 +4,10 @@

# Copyright (C) 2013 Radim Rehurek <radimrehurek@seznam.cz>
# Licensed under the GNU LGPL v2.1 - http://www.gnu.org/licenses/lgpl.html

r"""This module contains functions to compute confirmation on a pair of words or word subsets.

Notes
-----
The advantage of the indirect confirmation measure is that it computes the similarity of words in W' and
W* with respect to direct confirmations to all words. E.g. suppose x and z are both competing
brands of cars, which semantically support each other. However, both brands are seldom mentioned

@@ -25,6 +26,7 @@

    \Bigg \{{\sum_{w_{i} \in W'}^{ } m(w_{i}, w_{j})^{\gamma}}\Bigg \}_{j = 1,...,|W|}

Review comment (on the formula): Use

Here 'm' is the direct confirmation measure used.
"""

import itertools
@@ -126,24 +128,45 @@ def cosine_similarity(

        \vec{V}^{\,}_{m,\gamma}(W') =
            \Bigg \{{\sum_{w_{i} \in W'}^{ } m(w_{i}, w_{j})^{\gamma}}\Bigg \}_{j = 1,...,|W|}

-    Args:
-        segmented_topics: Output from the segmentation module of the
-            segmented topics. Is a list of list of tuples.
-        accumulator: Output from the probability_estimation module. Is an
-            accumulator of word occurrences (see text_analysis module).
-        topics: Topics obtained from the trained topic model.
-        measure (str): Direct confirmation measure to be used. Supported
-            values are "nlr" (normalized log ratio).
-        gamma: Gamma value for computing W', W* vectors; default is 1.
-        with_std (bool): True to also include standard deviation across topic
-            segment sets in addition to the mean coherence for each topic;
-            default is False.
-        with_support (bool): True to also include support across topic segments.
-            The support is defined as the number of pairwise similarity
-            comparisons used to compute the overall topic coherence.
-
-    Returns:
-        list: of indirect cosine similarity measure for each topic.
+    Parameters
+    ----------
+    segmented_topics : list of (list of tuples)
+        Output from the segmentation module of the segmented topics.
+    accumulator : accumulator of word occurrences (see text_analysis module)
+        Output from the probability_estimation module.
+    topics : Topics obtained from the trained topic model.
+    measure : str
+        Direct confirmation measure to be used. Supported values are "nlr" (normalized log ratio).
+    gamma :
+        Gamma value for computing W', W* vectors; default is 1.
+    with_std : bool
+        True to also include standard deviation across topic segment sets in addition to the mean coherence
+        for each topic; default is False.
+    with_support : bool
+        True to also include support across topic segments. The support is defined as the number of pairwise
+        similarity comparisons used to compute the overall topic coherence.
+
+    Returns
+    -------
+    list
+        List of indirect cosine similarity measures, one for each topic.
+
+    Examples
+    --------
+    >>> from gensim.corpora.dictionary import Dictionary
+    >>> from gensim.topic_coherence import indirect_confirmation_measure, text_analysis
+    >>> import numpy as np
+    >>> dictionary = Dictionary()
+    >>> dictionary.id2token = {1: 'fake', 2: 'tokens'}
+    >>> accumulator = text_analysis.InvertedIndexAccumulator({1, 2}, dictionary)
+    >>> accumulator._inverted_index = {0: {2, 3, 4}, 1: {3, 5}}
+    >>> accumulator._num_docs = 5
+    >>> topics = [np.array([1, 2])]
+    >>> segmentation = [[(1, np.array([1, 2])), (2, np.array([1, 2]))]]
+    >>> gamma = 1
+    >>> measure = 'nlr'
+    >>> obtained = indirect_confirmation_measure.cosine_similarity(segmentation, accumulator, topics, measure, gamma)
+    >>> print(obtained[0])
+    0.623018926945
    """
    context_vectors = ContextVectorComputer(measure, topics, accumulator, gamma)

Review comment (on the `accumulator` annotation): it isn't a type, you can add a link to a concrete class + links to module
Review comment (on the example): So big example, need comments like "what happens here"
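The context-vector formula in the docstring above can be sketched directly in a few lines. This is a hedged illustration of the indirect cosine similarity idea, with a made-up direct confirmation measure `m` standing in for "nlr"; it is not gensim's `ContextVectorComputer`.

```python
import numpy as np

def context_vector(word_subset, all_words, m, gamma=1.0):
    """Context vector from the docstring's formula:
    component j is sum over w_i in word_subset of m(w_i, w_j) ** gamma,
    with one component per word w_j in the full topic word set W."""
    return np.array([
        sum(m(w_i, w_j) ** gamma for w_i in word_subset)
        for w_j in all_words
    ])

def indirect_cosine(w_prime, w_star, all_words, m, gamma=1.0):
    """Cosine similarity between the context vectors of W' and W*."""
    u = context_vector(w_prime, all_words, m, gamma)
    v = context_vector(w_star, all_words, m, gamma)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy direct confirmation measure (stand-in for "nlr"; values are made up).
pairs = {('x', 'x'): 1.0, ('x', 'z'): 0.5, ('z', 'x'): 0.5, ('z', 'z'): 1.0}
m = lambda a, b: pairs[(a, b)]

sim = indirect_cosine(['x'], ['z'], ['x', 'z'], m)
```

Even though `x` and `z` are different words, their context vectors point in similar directions because both are confirmed by the same vocabulary, which is exactly the "competing car brands" intuition from the module's Notes section.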
Review comment (on the last line): Iterable of