Solving a Logistic Regression For Multiclass Classification problem to save Hogwarts.


DSLR

The project is an introduction to Logistic Regression For Multiclass Classification Using One-vs-Rest(One-vs-All) Strategy.

Hogwarts' Magic Hat has lost its powers. We are here to save the day as data scientists by building a Logistic Regression model that classifies Hogwarts students into the House they belong to.

Summary

  • Data Analysis

    First of all, take a look at the available data: the format it is presented in, whether there are various types of data, the different value ranges, and so on. It is important to get an idea of your raw material before starting. The more you work on data, the more you develop an intuition about how you will be able to use it.

    For that, we reimplement the pandas.DataFrame.describe function (describe.py) to display statistics for all numerical features and explore the data set.
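    The repo's describe.py itself is not shown here, but the statistics such a reimplementation computes can be sketched in plain Python. `describe_feature` is an illustrative name, not the project's actual API:

    ```python
    import math

    def describe_feature(values):
        """Compute describe-style statistics (count, mean, std, min,
        quartiles, max) for one numeric column, skipping missing values."""
        data = sorted(v for v in values if v is not None)
        count = len(data)
        mean = sum(data) / count
        # Sample standard deviation (ddof=1), matching pandas' default.
        std = math.sqrt(sum((v - mean) ** 2 for v in data) / (count - 1))

        def percentile(p):
            # Linear interpolation between ranks, as pandas does.
            k = (count - 1) * p
            lo, hi = math.floor(k), math.ceil(k)
            return data[lo] + (data[hi] - data[lo]) * (k - lo)

        return {
            "count": count, "mean": mean, "std": std,
            "min": data[0], "25%": percentile(0.25),
            "50%": percentile(0.5), "75%": percentile(0.75),
            "max": data[-1],
        }
    ```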

  • Data Visualization

    Data visualization is a powerful tool for a data scientist. It allows you to gain insights and develop an intuition of what your data looks like. Visualizing your data also helps you detect defects or anomalies.

    For that we are using different visualization methods, each answering a particular question.
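    For example, a histogram per feature can show which course has a homogeneous score distribution across the four houses. The core of a histogram is just equal-width binning; a minimal sketch (the plotting library itself is omitted here):

    ```python
    def histogram_counts(values, bins=5):
        """Bucket values into equal-width bins and return the count per
        bin -- the computation behind a histogram plot."""
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # avoid zero width if all equal
        counts = [0] * bins
        for v in values:
            # Clamp the maximum value into the last bin.
            idx = min(int((v - lo) / width), bins - 1)
            counts[idx] += 1
        return counts
    ```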

  • Classification

    Coding the Magic Hat starts now: we build a multiclass classifier using the One-vs-All logistic regression method.

    • Logistic Regression

    Logistic regression's output lies between 0 and 1, as the algorithm is designed to predict a binary outcome for an event based on previous observations of a data set. It uses independent variables to predict the occurrence or non-occurrence of a specific event.


    • One-Vs-All

    For each class, build a logistic regression to find the probability the observation belongs to that class. For each data point, predict the class with the highest probability.
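    The prediction side of one-vs-all can be sketched as follows. `predict_house` and the weight-vector layout (bias first) are illustrative assumptions, not the repo's actual API:

    ```python
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def predict_house(x, classifiers):
        """One-vs-all prediction: score the sample with each class's
        binary logistic regression and pick the class with the highest
        probability. `classifiers` maps a class name to its weight
        vector, with the bias term first."""
        def prob(theta):
            z = theta[0] + sum(w * xi for w, xi in zip(theta[1:], x))
            return sigmoid(z)
        return max(classifiers, key=lambda c: prob(classifiers[c]))
    ```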


    • Mathematics

    Logistic regression works much like linear regression. Here is the cost (loss) function:

    $$J(\theta) = -\frac{1}{m} \sum_{i=1}^{m} \left[\, y^{(i)} \log\big(h_\theta(x^{(i)})\big) + \big(1 - y^{(i)}\big) \log\big(1 - h_\theta(x^{(i)})\big) \right]$$

    Where $h_\theta(x)$ is defined in the following way:

    $$h_{\theta}(x) = g(\theta^T x)$$

    With:

    $$g(z) = \frac{1}{1 + e^{-z}}$$

    The loss function gives us the following partial derivative:

    $$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{1}{m} \sum_{i=1}^{m} \big(h_\theta(x^{(i)}) - y^{(i)}\big)\, x_j^{(i)}$$
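    This gradient translates directly into a batch gradient-descent update. The sketch below assumes feature rows carry a leading 1 for the bias term; the function name and learning rate are illustrative, not taken from the repo:

    ```python
    import math

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def gradient_step(theta, X, y, lr=0.1):
        """One batch gradient-descent update for binary logistic
        regression, following d/dtheta_j J = (1/m) * sum_i
        (h(x_i) - y_i) * x_ij. X rows include a leading 1 (bias)."""
        m = len(X)
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            h = sigmoid(sum(t * xj for t, xj in zip(theta, xi)))
            for j, xj in enumerate(xi):
                grad[j] += (h - yi) * xj
        return [t - lr * g / m for t, g in zip(theta, grad)]
    ```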

Usage

Running logreg_train.py trains the logistic regression using Gradient Descent and outputs a predictData.json file containing the weights that will be used for prediction.

  python logreg_train.py /path/to/trainingDataset

Adding a second argument, --bonus, trains the logistic regression using Stochastic Gradient Descent and Batch Gradient Descent instead.

  python logreg_train.py /path/to/trainingDataset --bonus
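The repo's --bonus implementation is not shown here, but the essential difference between the two variants is when the weights are updated. A minimal sketch of one stochastic epoch, with illustrative names:

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_epoch(theta, X, y, lr=0.1, seed=None):
    """One stochastic-gradient-descent epoch: shuffle the samples and
    update the weights after every single example, instead of averaging
    the gradient over the whole training set as batch GD does."""
    order = list(range(len(X)))
    random.Random(seed).shuffle(order)
    for i in order:
        h = sigmoid(sum(t * xj for t, xj in zip(theta, X[i])))
        theta = [t - lr * (h - y[i]) * xj for t, xj in zip(theta, X[i])]
    return theta
```

Per-sample updates make each epoch noisier but usually much faster to converge on large data sets, which is why SGD is the common choice in practice.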

Finally, run logreg_predict.py and enjoy a 99%-accuracy model.

  python logreg_predict.py /path/to/TestDataset predictData.json
