This repository contains solutions to all the assignments given in the course Machine Learning (CSL7550).
- Course Code : CSL7550
- Run : Fall 2021
- Instructor : Dr. Gaurav Harit, Associate Professor, Dept. of CSE, IIT Jodhpur
For each assignment there is a separate Jupyter notebook, and for each problem there is a separate `.py` file containing the Python code for it.
The code is briefly explained in the corresponding Jupyter notebook: find the assignment a problem belongs to and refer to that assignment's notebook.
You need the libraries/packages listed below set up in your environment:
- NumPy
- Matplotlib
- PuLP
- scikit-learn
- Pandas
- cvxopt
Alternatively, find the `requirements.txt` file and run the command below to install all of the above in one go: `pip install -r requirements.txt`
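For reference, a minimal `requirements.txt` matching the list above might look like this (unpinned; the repository may pin specific versions):

```
numpy
matplotlib
pulp
scikit-learn
pandas
cvxopt
```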
Choose an appropriate dataset in which every record (example) has at least 5 numeric features and at least one binary attribute. You can use the binary attribute as the binary target label to be predicted. If you want to use a target variable with more than two distinct values, map them into two sets and give label 1 to one set and 0 to the other; a multiclass classification task is thus reduced to a binary one.
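A minimal data-preparation sketch of the relabeling described above and the split described next; the file name `heart.csv` and the column name `target` are placeholders, not part of the assignment:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("heart.csv")  # hypothetical dataset with numeric features

# Collapse a multi-valued target into two sets: the most frequent value
# becomes class 1, everything else class 0.
df["label"] = (df["target"] == df["target"].mode()[0]).astype(int)

X = df.drop(columns=["target", "label"]).values
y = df["label"].values

# 70:30 split; 80:20 or 90:10 work the same way via test_size.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)
```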
Split your dataset into a training set and a test set. You can try different splits: 70:30 (70% training, 30% testing), 80:20, or 90:10. On the training set, train the following classifiers:
- Half Space classifier implemented using an LP solver (one such solver is `scipy.optimize.linprog`)
- Half Space classifier implemented using the Perceptron algorithm (implement the iterations yourself)
- Logistic Regression classifier

You can use any other LP solver as well. The optimization of the Logistic Regression classifier should be done using the gradient descent algorithm. Minimal sketches of all three classifiers follow.
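First, a sketch of the Half Space classifier as an LP feasibility problem: for labels y_i in {-1, +1} we look for any (w, b) with y_i(w·x_i + b) >= 1, which `scipy.optimize.linprog` can solve with a zero objective (this assumes the training data is linearly separable):

```python
import numpy as np
from scipy.optimize import linprog

def halfspace_lp(X, y):
    """Fit (w, b) with y_i (w.x_i + b) >= 1 via an LP feasibility problem.
    X: (n, d) array; y: (n,) array with labels in {-1, +1}."""
    n, d = X.shape
    c = np.zeros(d + 1)  # variables [w, b]; zero objective (pure feasibility)
    # Constraint: -y_i * (w.x_i + b) <= -1 for every training point.
    A_ub = -y[:, None] * np.hstack([X, np.ones((n, 1))])
    b_ub = -np.ones(n)
    res = linprog(c, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (d + 1), method="highs")
    if not res.success:
        raise ValueError("LP infeasible: data may not be linearly separable")
    return res.x[:d], res.x[d]  # (w, b)
```

Next, a sketch of the classic Perceptron iterations under the same {-1, +1} label convention:

```python
import numpy as np

def perceptron(X, y, max_epochs=1000):
    """Classic Perceptron updates; y in {-1, +1}. Returns (w, b)."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            if y[i] * (X[i] @ w + b) <= 0:  # misclassified (or on boundary)
                w += y[i] * X[i]
                b += y[i]
                mistakes += 1
        if mistakes == 0:                    # converged: every point correct
            break
    return w, b
```

Finally, a sketch of Logistic Regression trained with batch gradient descent on the cross-entropy loss, using {0, 1} labels; the learning rate and iteration count are arbitrary defaults:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_regression_gd(X, y, lr=0.1, n_iters=5000):
    """Batch gradient descent on the mean cross-entropy loss; y in {0, 1}."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        p = sigmoid(X @ w + b)          # predicted probabilities
        grad_w = X.T @ (p - y) / n      # gradient w.r.t. w
        grad_b = np.mean(p - y)         # gradient w.r.t. b
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b
```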
Select an appropriate dataset, select the independent features (input features) and the dependent feature (target feature), perform the dataset split, and train a linear regression model. Solve for the model parameters that minimize the squared-error loss using the two methods below (a sketch follows the list):
- Pseudo-inverse method
- Gradient descent
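A minimal sketch of both solvers; `np.linalg.pinv` gives the closed-form least-squares solution, and the gradient-descent variant uses arbitrary default hyperparameters:

```python
import numpy as np

def linear_regression_pinv(X, y):
    """Closed-form least squares via the Moore-Penrose pseudo-inverse."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append a bias column
    theta = np.linalg.pinv(Xb) @ y                 # minimizes ||Xb @ theta - y||^2
    return theta[:-1], theta[-1]                   # (w, b)

def linear_regression_gd(X, y, lr=0.01, n_iters=5000):
    """Gradient descent on the mean squared error."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iters):
        err = X @ w + b - y           # residuals
        w -= lr * (X.T @ err) / n     # gradient of MSE w.r.t. w
        b -= lr * err.mean()          # gradient of MSE w.r.t. b
    return w, b
```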
Use a dataset of your choice and implement the
- Hard SVM learning rule, by solving a Quadratic Program using the convex optimization routine `cvxopt_solvers.qp`
- Soft SVM learning rule, also using `cvxopt_solvers.qp`

For implementing the Hard SVM learning rule, the dataset needs to be made linearly separable by removing some of the training points. A sketch of the shared dual QP is given below.
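A sketch of both learning rules through the dual QP: maximize sum_i alpha_i - (1/2) sum_{i,j} alpha_i alpha_j y_i y_j (x_i . x_j) subject to sum_i alpha_i y_i = 0, with alpha_i >= 0 (hard) or 0 <= alpha_i <= C (soft). `cvxopt_solvers.qp` solves the equivalent minimization; labels are assumed to be in {-1, +1}:

```python
import numpy as np
from cvxopt import matrix as cvxopt_matrix
from cvxopt import solvers as cvxopt_solvers

cvxopt_solvers.options["show_progress"] = False  # silence solver output

def svm_dual_qp(X, y, C=None):
    """Solve the SVM dual with cvxopt_solvers.qp.
    C=None gives the hard-margin SVM; a finite C gives the soft-margin one.
    y must contain labels in {-1.0, +1.0}. Returns (w, b, alphas)."""
    n = X.shape[0]
    K = X @ X.T                                        # linear-kernel Gram matrix
    P = cvxopt_matrix(np.outer(y, y) * K)              # quadratic term
    q = cvxopt_matrix(-np.ones(n))                     # linear term
    A = cvxopt_matrix(y.reshape(1, -1).astype(float))  # sum_i alpha_i y_i = 0
    b = cvxopt_matrix(np.zeros(1))
    if C is None:                                      # hard SVM: alpha_i >= 0
        G = cvxopt_matrix(-np.eye(n))
        h = cvxopt_matrix(np.zeros(n))
    else:                                              # soft SVM: 0 <= alpha_i <= C
        G = cvxopt_matrix(np.vstack([-np.eye(n), np.eye(n)]))
        h = cvxopt_matrix(np.hstack([np.zeros(n), C * np.ones(n)]))
    sol = cvxopt_solvers.qp(P, q, G, h, A, b)
    alphas = np.array(sol["x"]).ravel()
    sv = alphas > 1e-5                                 # support-vector mask
    w = ((alphas * y)[:, None] * X).sum(axis=0)
    # Recover b from margin support vectors (alpha strictly below C).
    m = sv if C is None else sv & (alphas < C - 1e-5)
    b_val = float(np.mean(y[m] - X[m] @ w))
    return w, b_val, alphas
```

The mask `alphas > 1e-5` also directly identifies the support vectors asked for in the experiments below.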
- Repeat (2), the Soft SVM learning rule, but use the stochastic gradient descent algorithm for optimization (see the sketch below)
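A Pegasos-style sketch of the soft SVM via stochastic sub-gradient descent on the hinge-loss-plus-L2 objective; here `lam` plays the role of 1/C, and the unregularized bias update is a common simplification, not part of the assignment statement:

```python
import numpy as np

def soft_svm_sgd(X, y, lam=0.01, n_iters=10000, seed=0):
    """Stochastic sub-gradient descent on lam/2 ||w||^2 + mean hinge loss.
    y must contain labels in {-1, +1}. Returns (w, b)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for t in range(1, n_iters + 1):
        i = rng.integers(n)               # pick one training point at random
        eta = 1.0 / (lam * t)             # Pegasos decaying step size
        if y[i] * (X[i] @ w + b) < 1:     # margin violated: hinge term active
            w = (1 - eta * lam) * w + eta * y[i] * X[i]
            b += eta * y[i]
        else:                             # only the regularizer contributes
            w = (1 - eta * lam) * w
    return w, b
```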
For each experiment, identify the training data points that are the support vectors. For the soft SVM formulation, conduct experiments with different values of the regularization parameter and interpret the results in terms of the number of support vectors, the margin value, and the training and test classification accuracies.
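A hypothetical sweep over the regularization parameter, reusing `svm_dual_qp` from the sketch above; `X_train`, `y_train`, `X_test`, and `y_test` are assumed to come from your own split:

```python
import numpy as np

def accuracy(w, b, X, y):
    return np.mean(np.sign(X @ w + b) == y)

for C in [0.01, 0.1, 1, 10, 100]:
    w, b, alphas = svm_dual_qp(X_train, y_train, C=C)
    n_sv = int((alphas > 1e-5).sum())    # number of support vectors
    margin = 2.0 / np.linalg.norm(w)     # geometric margin width
    print(f"C={C:>6}: SVs={n_sv}, margin={margin:.3f}, "
          f"train acc={accuracy(w, b, X_train, y_train):.3f}, "
          f"test acc={accuracy(w, b, X_test, y_test):.3f}")
```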