KNN-Text-Classification

KNN Text Classification using Apache Spark

Using the 20 Newsgroups dataset and Apache Spark, I built a k-nearest neighbors (kNN) classifier for text documents. The code first computes a TF-IDF (Term Frequency-Inverse Document Frequency) matrix over the 20,000 most frequent words in the corpus. It then uses this matrix to compute the similarity between a query text and each document in the corpus, and labels the query according to its k nearest (most similar) documents.
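To make the TF-IDF and similarity step concrete, here is a minimal plain-Python sketch. It is not the Spark code in main.py. It assumes a simple lowercase tokenizer, raw term counts multiplied by log(N/df) as the weighting, and cosine similarity as the measure. The README does not say which variants main.py uses, and the helper names (`tokenize`, `build_vocab`, `weigh`, `tf_idf_vectors`, `cosine`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep runs of ASCII letters only.
    return re.findall(r"[a-z]+", text.lower())


def build_vocab(docs, top_n):
    # Count every token in the corpus and keep the top_n most frequent words.
    counts = Counter(w for d in docs for w in tokenize(d))
    return [w for w, _ in counts.most_common(top_n)]


def weigh(tokens, index, idf):
    # Raw term count multiplied by IDF; words outside the vocabulary are dropped.
    vec = [0.0] * len(index)
    for w, c in Counter(t for t in tokens if t in index).items():
        vec[index[w]] = c * idf.get(w, 0.0)
    return vec


def tf_idf_vectors(docs, vocab):
    index = {w: i for i, w in enumerate(vocab)}
    tokenized = [tokenize(d) for d in docs]
    # Document frequency: in how many documents each vocabulary word appears.
    df = Counter(w for toks in tokenized for w in set(toks) if w in index)
    # Plain IDF = log(N / df); a word present in every document gets weight 0.
    idf = {w: math.log(len(docs) / df[w]) for w in df}
    vectors = [weigh(toks, index, idf) for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    # Cosine similarity; defined as 0.0 if either vector is all zeros.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Toy corpus for illustration only (not the 20 Newsgroups data).
docs = [
    "the puck hit the goal post",
    "the goalie saved the puck",
    "the rocket reached orbit",
]
vocab = build_vocab(docs, top_n=20)  # the README uses the top 20,000 words
vectors, idf = tf_idf_vectors(docs, vocab)
index = {w: i for i, w in enumerate(vocab)}
query = weigh(tokenize("who saved the puck"), index, idf)
scores = [cosine(query, v) for v in vectors]
```

Here the query is most similar to the second document (it shares "saved" and "puck"), while "the" contributes nothing because it occurs in every document. In Spark the same steps would typically run over a distributed collection of documents; the sketch keeps everything in memory for readability.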

For instance, it predicts that the string "How many goals did Vancouver score last year?" belongs to the class "/rec.sport.hockey".
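The final label comes from the k nearest documents. The sketch below assumes a plain majority vote among the k highest-similarity documents, with ties broken in favour of the label whose neighbour ranked highest. This README does not state the value of k or the tie-breaking rule used in main.py, and `knn_predict` and the similarity scores are made up for illustration.

```python
from collections import Counter


def knn_predict(scored_labels, k):
    """Majority label among the k most similar (similarity, label) pairs."""
    if k < 1 or not scored_labels:
        raise ValueError("need k >= 1 and at least one labelled document")
    # Highest similarity first; keep only the k nearest neighbours.
    neighbours = sorted(scored_labels, key=lambda p: p[0], reverse=True)[:k]
    votes = Counter(label for _, label in neighbours)
    top = max(votes.values())
    # Break vote ties in favour of the label whose neighbour ranked highest.
    for _, label in neighbours:
        if votes[label] == top:
            return label


# Made-up similarity scores, for illustration only.
scored = [
    (0.91, "/rec.sport.hockey"),
    (0.85, "/rec.sport.hockey"),
    (0.80, "/rec.sport.baseball"),
    (0.40, "/sci.space"),
    (0.35, "/rec.sport.baseball"),
]
print(knn_predict(scored, k=3))  # /rec.sport.hockey
```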

Repository: iNaDeX/KNN-Text-Classification