Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Lda difference #1334

Merged
merged 5 commits into from
May 18, 2017
Merged

Lda difference #1334

merged 5 commits into from
May 18, 2017

Conversation

menshikh-iv
Copy link
Contributor

@menshikh-iv menshikh-iv commented May 18, 2017

Only first case from PR 1243 integrated into LdaModel as 'diff' method with basic tests

z[topic1][topic2] = distance_func(d1[topic1], d2[topic2])

if np.abs(np.max(z)) > 1e-8:
z /= np.max(z)
Copy link
Contributor

@tmylk tmylk May 18, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please comment on when normalisation is important. please make a flag

@menshikh-iv
Copy link
Contributor Author

Need to add a notebook with simple tutorial with usage & it's PR will be complete

@tmylk tmylk merged commit df1663f into piskvorky:develop May 18, 2017
@@ -532,6 +532,10 @@ def jaccard(vec1, vec2):
return 1 - float(len(intersection)) / float(len(union))


def jaccard_set(set1, set2):
return 1. - float(len(set1 & set2)) / float(len(set1 | set2))
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will throw an exception if both inputs empty -- is that desired?

Missing docstring.

>>> print(annotation) # get array with positive/negative words for each topic pair from `m1` and `m2`
"""

distances = {"kulback_leibler": kullback_leibler,
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hanging indent. @tmylk

if np.abs(np.max(z)) > 1e-8:
z /= np.max(z)

annotation = [[None for _ in range(t1_size)] for _ in range(t2_size)]
Copy link
Owner

@piskvorky piskvorky May 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can create lists using *: [None] * t1_size.

Although I don't see the point of this initialization. Why not just start empty and append, in the loop below? What's with the Nones?

Copy link
Contributor Author

@menshikh-iv menshikh-iv May 27, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not just start empty and append, in the loop below?

Initialization allows writing more readable code (only assignment to the cell in a cycle).

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. If that's your worry, isn't creating the 2D matrix as a numpy matrix (2D array) simpler/more readable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy matrix with complex object type of element [str, str] is not the best choice


class TestLdaDiff(unittest.TestCase):
def setUp(self):
texts = [['human', 'interface', 'computer'],
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hanging indent.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants