Topic-Modelling-Using-LDA-and-NMF

Topic modelling and a recommendation system for news articles using Non-Negative Matrix Factorization (NMF) and Latent Dirichlet Allocation (LDA).

The project also includes an article recommendation engine built on TF-IDF: given a keyword, it ranks the documents in the pool by cosine similarity to the keyword and suggests the top matches.

Topic modeling is a type of statistical modeling for discovering the abstract “topics” that occur in a collection of documents.

Latent Dirichlet Allocation (LDA)

LDA is a topic model used to assign the text of a document to one or more topics. It builds a topics-per-document model and a words-per-topic model, both modeled as Dirichlet distributions.

Non-Negative Matrix Factorization (NMF)

NMF is an unsupervised technique, so the model is not trained on labelled topics. It decomposes (factorizes) high-dimensional vectors into a lower-dimensional representation. These lower-dimensional vectors are non-negative, which means their coefficients are non-negative as well.
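To make the factorization concrete, here is a small sketch that factors a non-negative document-term matrix `V` into `W` (documents by topics) and `H` (topics by terms) using the classic multiplicative update rules. This is an illustrative NumPy implementation under stated assumptions, not necessarily what the repository does; in practice scikit-learn's `NMF` class would be the usual choice. The matrix values, the `nmf` helper, and its parameters are hypothetical.

```python
import numpy as np

def nmf(V, n_topics, n_iter=200, eps=1e-9, seed=0):
    """Factor V (docs x terms) into non-negative W (docs x topics) and H (topics x terms)."""
    rng = np.random.default_rng(seed)
    n_docs, n_terms = V.shape
    # Positive random start; multiplicative updates never flip the sign.
    W = rng.random((n_docs, n_topics)) + eps
    H = rng.random((n_topics, n_terms)) + eps
    for _ in range(n_iter):
        # Multiplicative updates for the Frobenius-norm objective ||V - WH||.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Hypothetical term counts: documents 0-1 share terms 0-1, documents 2-3 share terms 2-3.
V = np.array([
    [2, 1, 0, 0],
    [1, 2, 0, 0],
    [0, 0, 3, 1],
    [0, 0, 1, 2],
], dtype=float)

W, H = nmf(V, n_topics=2)
error = np.linalg.norm(V - W @ H)  # reconstruction error of the rank-2 approximation
print(W.round(2))
print(H.round(2))
```

Because every update multiplies by a non-negative ratio, `W` and `H` stay non-negative throughout, which is what lets each row of `H` be read as a weighted list of words for one topic.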

Approach

  • Topic Modelling Using LDA.
  • Topic Modelling Using NMF.
  • Cosine Similarity as a means for recommending articles.
  • Given a keyword, the document recommender suggests the best-matching documents from the pool of documents.
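The keyword-based recommendation step can be sketched as follows. This is a minimal example assuming scikit-learn's `TfidfVectorizer` and `cosine_similarity`; the toy corpus and the `recommend` helper are hypothetical and only illustrate the idea of ranking documents by cosine similarity to the keyword.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy document pool, invented for illustration only.
docs = [
    "the team won the football match",
    "the striker scored a late goal in the match",
    "the central bank raised interest rates",
    "markets fell after the bank changed rates",
]

# Fit TF-IDF once on the pool; the keyword is projected into the same space.
vectorizer = TfidfVectorizer(stop_words="english")
doc_vectors = vectorizer.fit_transform(docs)

def recommend(keyword, top_n=2):
    """Return (document index, score) pairs, best match first (hypothetical helper)."""
    query_vector = vectorizer.transform([keyword])
    scores = cosine_similarity(query_vector, doc_vectors).ravel()
    ranked = scores.argsort()[::-1][:top_n]
    # Drop documents that share no terms with the keyword.
    return [(int(i), float(scores[i])) for i in ranked if scores[i] > 0]

results = recommend("football")
for index, score in results:
    print(f"{score:.3f}  {docs[index]}")
```

A multi-word query such as `recommend("bank rates")` works the same way, since the query is vectorized like any other short document.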

Frameworks

  • Gensim
  • NLTK
  • Scikit-learn
  • NumPy