Implement/support/explain topic modelling #42
Comments
I'm not sure if topic modeling has already been implemented in TextHero; however, if it hasn't, you might be interested in leveraging Gensim. Hope this helps!
Hey Luca, no, topic modeling hasn't been implemented in Texthero (with a small h) yet. Gensim is an alternative, but we might not need it if we implement LSA, as this is the same as calling [...]. And yes, visualizing and understanding the models is certainly an important aspect, but that's not the core of the issue. The core of the issue is to understand how to correctly implement topic modeling: which algorithm to pick, whether Gensim is strictly necessary, what the function signature and output should be, and so on.
@jbesomi For LSA or LDA I think scikit-learn is a good option: https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.TruncatedSVD.html, https://scikit-learn.org/0.16/modules/generated/sklearn.lda.LDA.html (because you already use it for vectorization, dimension reduction and clustering operations).
Thank you Julia! Soon, @henrifroese and @mk2510 will work on this.
Goal
Implement topic modeling in Texthero.
Topic modeling
There are two main approaches to topic modeling: LSA/LSI (latent semantic analysis/indexing) and LDA (latent Dirichlet allocation). This simple tutorial explains how to implement them in Python.
Python implementation
LSA/LSI is basically just TF-IDF + SVD. What's important is to understand how to visualize the topic model and how to return its information from the function.
Documentation
Other than adding the docstring, it's probably useful to write a "getting started" tutorial on how topic modeling works and how to use Texthero's function.
We will probably want to implement both LSI and LDA, in two (?) separate functions.
This issue is a work in progress. Any help is much appreciated!