Lda difference #1334

menshikh-iv · 2017-05-18T10:51:31Z

Only the first case from PR #1243 is integrated into LdaModel, as the 'diff' method, with basic tests.

tmylk · 2017-05-18T11:11:08Z

gensim/models/ldamodel.py

+                z[topic1][topic2] = distance_func(d1[topic1], d2[topic2])
+
+        if np.abs(np.max(z)) > 1e-8:
+            z /= np.max(z)


Please comment on when normalisation is important, and please make it a flag.
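A minimal sketch of how such a flag could look. The helper name `topic_distance_matrix`, the `normed` parameter, and the stand-in `l1_distance` are assumptions made for illustration, not the code in this PR. Dividing by the maximum rescales a non-negative matrix so its largest entry is 1, which mostly matters when results from distances with different ranges are compared side by side.

```python
import numpy as np


def topic_distance_matrix(d1, d2, distance_func, normed=True):
    """Pairwise distances between the rows of d1 and d2 (hypothetical helper)."""
    d1 = np.asarray(d1, dtype=float)
    d2 = np.asarray(d2, dtype=float)
    z = np.zeros((d1.shape[0], d2.shape[0]))
    for topic1 in range(d1.shape[0]):
        for topic2 in range(d2.shape[0]):
            z[topic1][topic2] = distance_func(d1[topic1], d2[topic2])

    # Optional rescaling, guarded against an all-zero matrix as in the diff.
    if normed and np.abs(np.max(z)) > 1e-8:
        z /= np.max(z)
    return z


def l1_distance(p, q):
    """Stand-in distance used only for this example."""
    return float(np.abs(p - q).sum())


m1 = [[0.5, 0.5], [1.0, 0.0]]
m2 = [[0.5, 0.5], [0.0, 1.0]]
raw = topic_distance_matrix(m1, m2, l1_distance, normed=False)
scaled = topic_distance_matrix(m1, m2, l1_distance, normed=True)
```

With `normed=False` the caller gets the raw distances back; with `normed=True` the same matrix is divided by its largest entry.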

menshikh-iv · 2017-05-18T11:37:10Z

I need to add a notebook with a simple usage tutorial, and then this PR will be complete.

piskvorky · 2017-05-27T17:12:02Z

gensim/matutils.py

@@ -532,6 +532,10 @@ def jaccard(vec1, vec2):
        return 1 - float(len(intersection)) / float(len(union))


+def jaccard_set(set1, set2):
+    return 1. - float(len(set1 & set2)) / float(len(set1 | set2))


Will throw an exception if both inputs are empty -- is that desired?

Missing docstring.
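Both points could be addressed along these lines. This is only a sketch: returning 0.0 for two empty sets is an assumed convention (treating them as identical), and raising an explicit error would be an equally defensible choice.

```python
def jaccard_set(set1, set2):
    """Return the Jaccard distance between two sets.

    The distance is 1 - |set1 & set2| / |set1 | set2|. Two empty sets are
    treated as identical (distance 0.0); this convention is an assumption
    of the sketch, chosen to avoid a ZeroDivisionError.
    """
    union_size = len(set1 | set2)
    if union_size == 0:
        return 0.0
    return 1.0 - len(set1 & set2) / float(union_size)
```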

piskvorky · 2017-05-27T17:12:49Z

gensim/models/ldamodel.py

+        >>> print(annotation) # get array with positive/negative words for each topic pair from `m1` and `m2`
+        """
+
+        distances = {"kulback_leibler": kullback_leibler,


Hanging indent. @tmylk

piskvorky · 2017-05-27T17:14:14Z

gensim/models/ldamodel.py

+            if np.abs(np.max(z)) > 1e-8:
+                z /= np.max(z)
+
+        annotation = [[None for _ in range(t1_size)] for _ in range(t2_size)]


You can create lists using *: [None] * t1_size.

Although I don't see the point of this initialization. Why not just start empty and append, in the loop below? What's with the Nones?

menshikh-iv · 2017-05-27

> Why not just start empty and append, in the loop below?

Initialization allows writing more readable code (only an assignment to a cell inside the loop).

piskvorky · 2017-05-28

I see. If that's your worry, isn't creating the 2D matrix as a numpy matrix (2D array) simpler/more readable?

menshikh-iv · 2017-05-28

A NumPy matrix whose elements are complex objects ([str, str]) is not the best choice.
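The options discussed in this thread can be compared in a short sketch. The sizes, the placeholder words, and the row/column order (chosen to match the `z[topic1][topic2]` indexing) are assumptions for illustration. Note that `*` is only safe for the inner list: applying it to the outer list repeats references to one and the same row.

```python
import numpy as np

t1_size, t2_size = 2, 3

# Pitfall: multiplying the outer list repeats references to ONE inner list.
aliased = [[None] * t2_size] * t1_size
aliased[0][0] = "x"  # every row now shows "x" in column 0

# Safe: `*` for the inner list, a comprehension for the rows.
annotation = [[None] * t2_size for _ in range(t1_size)]
annotation[0][0] = [["pos_word"], ["neg_word"]]

# NumPy alternative: an object array, which starts out filled with None.
grid = np.empty((t1_size, t2_size), dtype=object)
grid[0, 0] = [["pos_word"], ["neg_word"]]
```

The object array works, but it gives up NumPy's vectorised operations, which is consistent with the objection that it is not the best fit for cells holding lists of words.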

piskvorky · 2017-05-27T17:15:45Z

gensim/test/test_tmdiff.py

+
+class TestLdaDiff(unittest.TestCase):
+    def setUp(self):
+        texts = [['human', 'interface', 'computer'],


Hanging indent.
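For reference, the two layouts side by side. This assumes the comment asks for a hanging indent in place of alignment with the opening bracket; the second row of the list is illustrative and not taken from the test file.

```python
# Continuation lines aligned with the opening bracket, as in the diff.
texts_aligned = [['human', 'interface', 'computer'],
                 ['survey', 'user', 'computer']]

# Hanging indent: nothing follows the opening bracket on its own line,
# and each element is indented one level.
texts_hanging = [
    ['human', 'interface', 'computer'],
    ['survey', 'user', 'computer'],
]
```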

menshikh-iv added 4 commits May 18, 2017 15:10

Add jaccard distance for sets

54fc040

Add diff method for LDA

b000a01

Add basic tests for diff method

8c03728

rm unused imports & add shebang with info

5270440

tmylk reviewed May 18, 2017

upd

5077148

tmylk merged commit df1663f into piskvorky:develop May 18, 2017

piskvorky reviewed May 27, 2017

menshikh-iv mentioned this pull request Jun 22, 2017

[WIP][DNM] Visualize topic model difference (need feedback) #1243

Closed
