-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lda difference #1334
Lda difference #1334
Conversation
gensim/models/ldamodel.py
Outdated
z[topic1][topic2] = distance_func(d1[topic1], d2[topic2]) | ||
|
||
if np.abs(np.max(z)) > 1e-8: | ||
z /= np.max(z) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please comment on when normalisation is important. please make a flag
Need to add a notebook with simple tutorial with usage & it's PR will be complete |
@@ -532,6 +532,10 @@ def jaccard(vec1, vec2): | |||
return 1 - float(len(intersection)) / float(len(union)) | |||
|
|||
|
|||
def jaccard_set(set1, set2): | |||
return 1. - float(len(set1 & set2)) / float(len(set1 | set2)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will throw an exception if both inputs empty -- is that desired?
Missing docstring.
>>> print(annotation) # get array with positive/negative words for each topic pair from `m1` and `m2` | ||
""" | ||
|
||
distances = {"kulback_leibler": kullback_leibler, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hanging indent. @tmylk
if np.abs(np.max(z)) > 1e-8: | ||
z /= np.max(z) | ||
|
||
annotation = [[None for _ in range(t1_size)] for _ in range(t2_size)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can create lists using *
: [None] * t1_size
.
Although I don't see the point of this initialization. Why not just start empty and append, in the loop below? What's with the None
s?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not just start empty and append, in the loop below?
Initialization allows writing more readable code (only assignment to the cell in a cycle).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. If that's your worry, isn't creating the 2D matrix as a numpy matrix (2D array) simpler/more readable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Numpy matrix with complex object type of element [str, str]
is not the best choice
|
||
class TestLdaDiff(unittest.TestCase): | ||
def setUp(self): | ||
texts = [['human', 'interface', 'computer'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hanging indent.
Only first case from PR 1243 integrated into LdaModel as 'diff' method with basic tests