Comparison of Machine Learning Prediction Models

This project compares the performance of different ML algorithms on both classification and regression tasks using the scikit-learn framework. Classification performance was evaluated by the area under the ROC and precision-recall (PR) curves, and regression performance by mean squared error (MSE) and R2 scores.

Classification

Dataset

  1. Diabetic Retinopathy
  2. Default of credit card clients
  3. Breast Cancer Wisconsin
  4. Statlog (Australian credit approval)
  5. Statlog (German credit data)
  6. Steel Plates Faults
  7. Adult
  8. Yeast
  9. Thoracic Surgery Data
  10. Seismic-Bumps

Classifier

  1. k-nearest neighbours classification
  2. Support vector classification
  3. Decision tree classification
  4. Random forest classification
  5. AdaBoost classification
  6. Logistic regression (for classification)
  7. Gaussian naive Bayes classification
  8. Neural network classification
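
The eight classifier families listed above can be set up and scored along the following lines. This is a minimal sketch, not the project's notebook code: the estimator classes are the standard scikit-learn ones, but the hyperparameters, the 70/30 split, the `StandardScaler` step, and the use of scikit-learn's bundled Breast Cancer Wisconsin data are all assumptions made for illustration. PR performance is summarised here with `average_precision_score`, which approximates the area under the PR curve.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: the copy of Breast Cancer Wisconsin bundled with scikit-learn
# stands in for the datasets listed above.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# Assumption: default hyperparameters, apart from settings needed for
# probability outputs, convergence, and reproducibility.
classifiers = {
    "k-nearest neighbours": KNeighborsClassifier(),
    "Support vector": SVC(probability=True, random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Gaussian naive Bayes": GaussianNB(),
    "Neural network": MLPClassifier(max_iter=1000, random_state=0),
}

classification_scores = {}
for name, clf in classifiers.items():
    # Scale features first; several of these models are scale-sensitive.
    model = make_pipeline(StandardScaler(), clf)
    model.fit(X_train, y_train)
    # Probability of the positive class drives both ranking metrics.
    proba = model.predict_proba(X_test)[:, 1]
    classification_scores[name] = {
        "roc_auc": roc_auc_score(y_test, proba),
        "pr_auc": average_precision_score(y_test, proba),
    }

for name, s in classification_scores.items():
    print(f"{name:22s} ROC AUC={s['roc_auc']:.3f}  PR AUC={s['pr_auc']:.3f}")
```

Wrapping each estimator in a pipeline keeps the scaler fitted on the training split only, so no test-set statistics leak into training.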

Result

Regression

Dataset

  1. Wine Quality
  2. Communities and Crime
  3. QSAR aquatic toxicity
  4. Parkinson Speech
  5. Facebook metrics
  6. Bike Sharing
  7. Student Performance
  8. Concrete Compressive Strength
  9. SGEMM GPU kernel performance
  10. Merck Molecular Activity Challenge (from Kaggle)

Regressor

  1. Support vector regression
  2. Decision tree regression
  3. Random forest regression
  4. AdaBoost regression
  5. Gaussian process regression
  6. Linear regression
  7. Neural network regression
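
The seven regressor families listed above can be compared on MSE and R2 in the same way. Again, this is only a sketch under stated assumptions: the data is synthetic (`make_regression`), and the hyperparameters, split, and scaling step are illustrative choices rather than the settings used in the project.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor, RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Assumption: synthetic data stands in for the datasets listed above.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# Assumption: mostly default hyperparameters, fixed seeds for reproducibility.
regressors = {
    "Support vector": SVR(),
    "Decision tree": DecisionTreeRegressor(random_state=0),
    "Random forest": RandomForestRegressor(random_state=0),
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "Gaussian process": GaussianProcessRegressor(normalize_y=True, random_state=0),
    "Linear regression": LinearRegression(),
    "Neural network": MLPRegressor(max_iter=2000, random_state=0),
}

regression_scores = {}
for name, reg in regressors.items():
    # Scale features inside the pipeline so the scaler sees training data only.
    model = make_pipeline(StandardScaler(), reg)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    regression_scores[name] = {
        "mse": mean_squared_error(y_test, pred),
        "r2": r2_score(y_test, pred),
    }

for name, s in regression_scores.items():
    print(f"{name:18s} MSE={s['mse']:.2f}  R2={s['r2']:.3f}")
```

Lower MSE and higher R2 indicate a better fit. R2 can be negative when a model performs worse than predicting the mean of the target.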

Result

Requirement

  • Install Anaconda
  • Create a conda env that contains Python 3.7.5: conda create -n your_env_name python=3.7.5
  • Activate the environment (do this every time you open a new terminal): conda activate your_env_name
  • Install the requirements into this conda env: pip install --requirement requirements.txt
  • Start Jupyter Notebook: jupyter notebook

Reference