-
-
Notifications
You must be signed in to change notification settings - Fork 4.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] scikit_learn wrapper for LSI Model in Gensim #1244
[WIP] scikit_learn wrapper for LSI Model in Gensim #1244
Conversation
self.model.fit(corpus) | ||
|
||
def testPrintTopic(self): | ||
topic = self.model.print_topics(2) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This test seems redundant. Please justify including it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmylk My rationale for having this test was basically to serve as a sanity check after setUp()
prepares the test fixture. Since such a test was present among the unit tests for the sklearn wrapper written for LDA model as well, I decided to include it here too. Please let me know if we should drop this test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please rename as testModelSanity then
cache = pickle.loads(uncompressed_content) | ||
data = cache | ||
id2word=Dictionary(map(lambda x : x.split(), data.data)) | ||
corpus = [id2word.doc2bow(i.split()) for i in data.data] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The transformation of a text into gensim bow format corpus should have a sklearn-wrapper around so that the pipeline can take in text, not bow. See more about Gensim corpus in this [tutorial(https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/Corpora_and_Vector_Spaces.ipynb) Please either add it in this PR or create a wishlist issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please respond to this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmylk My sincerest apologies for the delayed response. I'd be happy to submit a PR to resolve this issue. I did not add code for this in the current PR because some of the code for the wrapper for LdaModel might also have to be refactored. So maintaining modularity and proposing a solution in a different PR seemed to be a better choice in this case.
""" | ||
|
||
def __init__(self, corpus=None, num_topics=200, id2word=None, chunksize=20000, | ||
decay=1.0, onepass=True, power_iters=P2_EXTRA_ITERS, extra_samples=P2_EXTRA_DIMS): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please make defaults explicits. what is the reasons for using constant variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tmylk Thanks for pointing this out. I'll make the default values explicit here.
Great, please add this example to https://github.com/RaRe-Technologies/gensim/blob/develop/docs/notebooks/sklearn_wrapper.ipynb |
@tmylk Sure. Adding LsiModel wrapper example to the IPython notebook. |
@tmylk For LsiModel, I have added an example which uses |
Thanks for the new wrapper! |
This PR adds a scikit-learn wrapper for LSI Model in Gensim.