[MRG] API for Static Topic for word in Vocabulary. #706

Merged
merged 11 commits into from
May 31, 2016
22 changes: 22 additions & 0 deletions gensim/models/ldamodel.py
@@ -905,6 +905,28 @@ def get_document_topics(self, bow, minimum_probability=None):
        return [(topicid, topicvalue) for topicid, topicvalue in enumerate(topic_dist)
                if topicvalue >= minimum_probability]

    def get_static_topic(self, word_id, minimum_probability=None):
Contributor

Default minimum_probability=0 is most intuitive

Owner

I disagree. 0.0 is a specific value to use, whereas None stands for "use default". Different semantics.
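The distinction between `None` and `0.0` can be illustrated with a minimal sketch. The class `Model` and the default value `0.01` below are hypothetical stand-ins, not gensim code; only the `None` check and the `1e-8` clamp mirror the lines in this diff.

```python
class Model:
    """Hypothetical stand-in for a model that carries a default threshold."""

    def __init__(self, minimum_probability=0.01):
        self.minimum_probability = minimum_probability

    def resolve_threshold(self, minimum_probability=None):
        # None means "the caller made no choice": fall back to the model default.
        if minimum_probability is None:
            minimum_probability = self.minimum_probability
        # Same clamp as in the diff: exact zeros never pass the filter.
        return max(minimum_probability, 1e-8)


model = Model(minimum_probability=0.01)
default_threshold = model.resolve_threshold()   # uses the model default
explicit_zero = model.resolve_threshold(0.0)    # explicit value, clamped to 1e-8
```

With a default of `0`, the caller would have no way to say "use whatever the model was configured with".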

        """
        Returns static topics for word in vocabulary.
Owner

Needs better docs -- what is a "static topic"?


        """
        if minimum_probability is None:
            minimum_probability = self.minimum_probability
        minimum_probability = max(minimum_probability, 1e-8)  # never allow zero values in sparse output

        # if user enters word instead of id in vocab, change to get id
        if isinstance(word_id, str):
            word_id = self.id2word.doc2bow([word_id])[0][0]

        values = []
        for topic_id in range(0, self.num_topics):
            if self.expElogbeta[topic_id][word_id] >= minimum_probability:
                values.append((topic_id, self.expElogbeta[topic_id][word_id]))

        return values
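
The filtering loop above can be sketched outside gensim, with a plain NumPy array standing in for `self.expElogbeta` (one row per topic, one column per word id). The helper name `topics_for_word` and the matrix values are made up for illustration; this is not the merged implementation.

```python
import numpy as np


def topics_for_word(topic_word, word_id, minimum_probability=1e-8):
    """Return (topic_id, value) pairs whose value for word_id meets the threshold."""
    column = topic_word[:, word_id]                       # one value per topic
    keep = np.flatnonzero(column >= minimum_probability)  # indices of topics that pass
    return [(int(t), float(column[t])) for t in keep]


# Illustrative 2-topic, 3-word matrix (invented numbers).
topic_word = np.array([
    [0.70, 0.25, 0.05],
    [0.10, 0.30, 0.60],
])
result = topics_for_word(topic_word, word_id=2, minimum_probability=0.1)
```

Here only topic 1 clears the threshold for word id 2, so `result` holds a single pair.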



    def __getitem__(self, bow, eps=None):
        """
        Return topic distribution for the given document `bow`, as a list of
18 changes: 18 additions & 0 deletions gensim/test/test_ldamodel.py
@@ -265,6 +265,24 @@ def testGetDocumentTopics(self):
            self.assertTrue(isinstance(k, int))
            self.assertTrue(isinstance(v, float))

    def testStaticTopics(self):

        numpy.random.seed(0)
        model = self.class_(self.corpus, id2word=dictionary, num_topics=2, passes= 100)
Owner

PEP8: no spaces around argument params.


        # check with id
        result = model.get_static_topic(2)
        expected = [(1, 0.1066)]
        self.assertEqual(result[0][0], expected[0][0])
        self.assertAlmostEqual(result[0][1], expected[0][1], places=2)

Owner

PEP8 (too many blank lines).


        # if user has entered word instead, check with word
        result = model.get_static_topic(str(model.id2word[2]))
        expected = [(1, 0.1066)]
        self.assertEqual(result[0][0], expected[0][0])
        self.assertAlmostEqual(result[0][1], expected[0][1], places=2)
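
The test compares the topic id exactly and the probability only to two decimal places. As a rough sketch of what `places=2` tolerates, the standalone case below uses the expected value `0.1066` from the test; the other numbers and the class name `PlacesExample` are invented for illustration.

```python
import unittest


class PlacesExample(unittest.TestCase):
    def test_places(self):
        expected = 0.1066
        # Passes: the difference (0.0017) rounds to zero at two decimal places.
        self.assertAlmostEqual(0.1049, expected, places=2)
        # Fails: the difference (0.0134) rounds to 0.01, which is not zero.
        with self.assertRaises(AssertionError):
            self.assertAlmostEqual(0.12, expected, places=2)


suite = unittest.defaultTestLoader.loadTestsFromTestCase(PlacesExample)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

A tolerance this loose keeps the test stable across small numerical differences between runs, at the cost of only catching large regressions.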


    def testPasses(self):
        # long message includes the original error message with a custom one