This classification code is implemented using a Naive Bayes classifier. The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups. It was originally collected by Ken Lang, probably for his paper "Newsweeder: Learning to filter netnews" (Proceedings of the Twelfth International Conference on Machine Learning, 331-339, 1995), though he did not explicitly mention this collection.
The 20 newsgroups collection has become a popular data set for experiments in text applications of machine learning techniques, such as text classification and text clustering. The data is organized into 20 different newsgroups, each corresponding to a different topic.
The original data set is available at http://qwone.com/~jason/20Newsgroups/.
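The code targets Python 3.7 and requires numpy, pandas and scikit-learn (for sklearn.metrics). The time module is also used, but it is part of the Python standard library and needs no installation.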
The code should be placed next to the '20Newsgroups' folder. This folder should contain these CSV files:
./20newsgroups/train_data.csv
./20newsgroups/train_label.csv
./20newsgroups/test_data.csv
./20newsgroups/test_label.csv
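Before running the classifier, it can help to confirm that the four files listed above are in place. The following is a minimal sketch, not part of the original code: the helper name missing_files and the constant EXPECTED_FILES are hypothetical, and the default folder name simply follows the paths as written above.

```python
from pathlib import Path

# File names taken from the list above.
EXPECTED_FILES = (
    "train_data.csv",
    "train_label.csv",
    "test_data.csv",
    "test_label.csv",
)


def missing_files(data_dir="./20newsgroups"):
    """Return the expected CSV file names that are not present in data_dir.

    Hypothetical helper for illustration; the default path is an assumption
    based on the paths listed in this README.
    """
    folder = Path(data_dir)
    return [name for name in EXPECTED_FILES if not (folder / name).is_file()]
```

Calling missing_files() from the directory that holds the code returns an empty list when all four files are found, and otherwise returns the names that still need to be added.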
A short report comparing the performance of the Maximum Likelihood Estimator and the Naive Bayes Estimator is attached.
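The report itself is not reproduced here. As a rough illustration of the kind of difference such a comparison usually concerns, the sketch below contrasts a maximum likelihood estimate of per-class word probabilities with an add-alpha (Laplace) smoothed estimate. This assumes the second estimator is a smoothed Bayesian estimate, which this README does not confirm; the function names, the alpha parameter and the toy counts are all hypothetical.

```python
def mle_estimate(word_counts):
    """Maximum likelihood estimate: relative frequency of each word in a class.

    Assumes at least one nonzero count, otherwise the total is zero.
    """
    total = sum(word_counts)
    return [count / total for count in word_counts]


def smoothed_estimate(word_counts, alpha=1.0):
    """Add-alpha estimate (alpha=1 is Laplace smoothing).

    Assumed stand-in for a Bayesian estimator; not taken from the report.
    """
    total = sum(word_counts) + alpha * len(word_counts)
    return [(count + alpha) / total for count in word_counts]


# Toy counts for a three-word vocabulary in one class (illustrative only).
counts = [3, 0, 1]
mle = mle_estimate(counts)            # the unseen word gets probability 0
smoothed = smoothed_estimate(counts)  # every word gets a nonzero probability
```

Under the maximum likelihood estimate, a word never seen in a class gets probability zero, which zeroes out the whole product for any test document containing that word. The smoothed estimate avoids this by giving every word a small nonzero probability.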