A no-code machine learning pipeline and data visualization platform | perform with learning

IMsumitkumar/No-code-ML-platform-DashB.ai


DashB.ai

About The Project

Overview

  • This is a web app that automates the data preprocessing pipeline. The goal is to automate the whole machine learning pipeline, but the project currently covers only the data preprocessing stage.
  • The project is currently in the development phase.
  • Users can upload comma-separated values (CSV) files or fetch data directly from a MySQL database (make sure MySQL is installed on your system).
  • Users decide which operations to perform and which to skip; the selected operations are passed to the pipeline, which then shows the result.
  • Users can visualize the data without writing any code, using the dataviz tool that ships with DashB.ai (built with Dash by Plotly).
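
The CSV upload path described above can be sketched with pandas. This is a minimal illustration, not the project's actual loader: the function name `load_csv` and the sample data are hypothetical.

```python
import io

import pandas as pd


def load_csv(source):
    """Read a comma-separated file (path or file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Hypothetical in-memory sample standing in for an uploaded file.
sample = io.StringIO("age,city\n31,Delhi\n45,Mumbai\n")
df = load_csv(sample)
print(df.shape)
```

For the MySQL path, `pandas.read_sql(query, connection)` is one option, assuming a working database connection is available.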

Built With

  • Bootstrap
  • scikit learn
  • plotly


Getting Started

To get a local copy up and running, follow these steps. Make sure git is installed on your machine.

Installation

  1. Clone the repo
git clone https://github.com/IMsumitkumar/No-code-ML-platform-DashB.ai
  2. Create a virtual environment and activate it
conda create -n <env_name> python=3.7
conda activate <env_name>
  3. Install dependencies (run inside the project directory)
pip install -r requirements.txt

RUN

STEP 1 : Migrate the database tables and create a superuser

python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser

    username : *****
    email    : *****
    password : ******

STEP 2

python manage.py runserver

STEP 3 : OPTIONAL For password recovery by email, set your credentials in DashB -> settings.py

Set your email and password
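
As a rough guide, the email block in settings.py could look like the sketch below. The setting names are Django's standard email settings; the host and port are placeholders, and reading the credentials from environment variables is an assumption, not something this project requires.

```python
import os

# Assumed SMTP setup; replace host and port with your provider's values.
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"
EMAIL_HOST = "smtp.example.com"  # placeholder host
EMAIL_PORT = 587
EMAIL_USE_TLS = True

# Credentials are read from the environment so they stay out of version control.
EMAIL_HOST_USER = os.environ.get("EMAIL_HOST_USER", "")
EMAIL_HOST_PASSWORD = os.environ.get("EMAIL_HOST_PASSWORD", "")
```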

Preprocessing Pipeline Tree

├── Handle Datatypes
│   ├── Drop unnecessary features.
│   ├── replace inf with NaN.
│   ├── Make sure all the column names are of string type and clean them.
│   ├── Remove the column if target column has NaN.
│   ├── Remove Duplicate columns
│   ├── Handle numerical, categorical and time features.
│   └── Try to determine ML use case and encode.
├── Handle Missing Values
│   ├────── Numerical Features
│   ├── Replace with mean.
│   ├── Replace with median.
│   ├── Replace with mode.
│   ├── Replace with standard deviation.
│   ├── Replace with zero.
│   ├────── Categorical Features
│   ├── Replace with mean.
│   ├── Replace with "Missing".
│   └── Replace with most frequent value.
├── Removing zero and near zero variance columns
│   ├── Eliminate the features that have zero variance.
│   └── Eliminate the features that have near zero variance.
├── Group Similar Features
│   └── Group more than two features and make new features from them.
├── Normalization and Transformation
│   ├────── Operations to apply only on numerical features
│   ├── ZScore
│   ├── MinMax
│   ├── Quantile
│   ├── MaxAbs
│   ├── Yeo-Johnson
│   ├────── Target transformation (regression)
│   ├── Box-Cox
│   └── Yeo-Johnson
├── Making Time Features
│   ├── Take a time feature and extract more features from it
│   └── (Day, Month, Year, Hour, Minute, Second, Quantile, Quarter, Day of week, week day name, day of year, week of year )
├── Feature Encoding
│   ├────── Ordinal Encoding
│   ├── LabelEncoding
│   ├── Target Guided ordinal encoding
│   ├────── One hot encoding
│   ├── KDD orange
│   ├── Mean Encoding
│   └── Counter/frequency encoding
├── Removing Outliers
│   ├── Isolation Forest
│   ├── KNN
│   ├── PCA
│   └── Elliptical envelope
├── Feature Selection
│   ├── Chi squared (Not working perfectly)
│   ├── RFE (Not working on all the data)
│   ├── Lasso (works perfectly)
│   ├── Random Forest
│   ├── lgbm (works perfectly)
│   └── Remove zero variance features
├── Imbalanced Dataset (Not done yet)
│   ├── Ensemble techniques automatically handle imbalanced datasets
│   ├── Undersampling (Not a good idea)
│   ├── Oversampling 
│   ├── SMOTE
│   └── Isolation Forest
└── Next Step
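
A few of the steps in the tree above can be sketched with pandas and NumPy. This is an illustrative sketch under assumptions, not the project's implementation: the function names and sample data are hypothetical, and median, "Missing" and z-score are just one choice from the options listed.

```python
import numpy as np
import pandas as pd


def impute(df):
    """Fill numeric gaps with the column median, other gaps with "Missing"."""
    df = df.copy()
    for col in df.columns:
        if pd.api.types.is_numeric_dtype(df[col]):
            df[col] = df[col].fillna(df[col].median())
        else:
            df[col] = df[col].fillna("Missing")
    return df


def drop_zero_variance(df):
    """Keep only columns that hold more than one distinct value."""
    keep = [c for c in df.columns if df[c].nunique(dropna=False) > 1]
    return df[keep]


def zscore(df):
    """Standardize numeric columns to zero mean and unit variance."""
    df = df.copy()
    num = df.select_dtypes(include="number").columns
    df[num] = (df[num] - df[num].mean()) / df[num].std(ddof=0)
    return df


def run_pipeline(df):
    """Apply the steps in the order they appear in the tree."""
    df = df.replace([np.inf, -np.inf], np.nan)  # replace inf with NaN
    df = impute(df)
    df = drop_zero_variance(df)
    return zscore(df)


# Hypothetical sample data with an inf, missing values and a constant column.
raw = pd.DataFrame({
    "income": [10.0, np.inf, 30.0, np.nan],
    "const": [1, 1, 1, 1],
    "city": ["A", None, "B", "A"],
})
clean = run_pipeline(raw)
print(clean)
```

Zero-variance columns are dropped before scaling because a constant column has a standard deviation of zero, which would make the z-score undefined.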

Directory Tree

├── accounts 
│   └─────────── # handles login, signup and password recovery. 
├── DashB
│   └─────────── # main folder contains wsgi, routing, settings and urls.
├── data
│   └─────────── # main folder for performing pipeline.
├── Viz
│   └─────────── # project app for data visualization tool.
├── static
│   └─────────── # contains static files.
├── media
│   └─────────── # storage folder of uploaded media.
├── templates
│   └─────────── # contains landing page templates
├── manage.py
├── requirements.txt
├── LICENSE
├── README.md
└── db.sqlite3

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Team

Sumit Kumar

License

Copyright 2020 Sumit Kumar

Contact

Sumit Kumar - sksumit068@gmail.com

Project Link: https://github.com/IMsumitkumar/No-code-ML-platform-DashB.ai

References

Credits

  • The HTML templates are taken from open source projects.
  • Modifications were made by me.
