- About the Project
- Getting Started
- Preprocessing Pipeline Tree
- Directory Tree
- Contributing
- Team
- License
- Contact
- References
- Credits
- This is a web app that automates the data preprocessing pipeline. The goal is to automate the whole machine learning pipeline, but the project currently covers only the data preprocessing stage.
- The project is currently in the development phase.
- Users can upload comma-separated values (CSV) files or fetch data directly from a MySQL database. (Make sure MySQL is installed on your system.)
- Users decide which operations to perform and which to skip; the selected operations are passed to the pipeline, which then shows the result.
- Users can visualize the data without writing any code, using the dataviz tool that comes with Dash.ai (built with Dash by Plotly).
- Bootstrap
- scikit-learn
- Plotly
To get a local copy up and running, follow these simple steps. Make sure git is installed on your machine.
- Clone the repo
git clone https://github.com/IMsumitkumar/No-code-ML-platform-DashB.ai
- Create a virtual environment and activate it
conda create -n <env_name> python=3.7
conda activate <env_name>
- Install dependencies (run inside the project directory)
pip install -r requirements.txt
STEP 1 : Migrate the database tables and create a superuser
python manage.py makemigrations
python manage.py migrate
python manage.py createsuperuser
username : *****
email : *****
password : ******
STEP 2 : Start the development server
python manage.py runserver
STEP 3 : (Optional) For email-based password recovery, set your credentials in DashB -> settings.py
Set your email and password there.
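As a rough illustration of STEP 3, the email block in settings.py might look like the sketch below. The setting names are standard Django settings, but the SMTP host, the port, and the environment variable names `DASHB_EMAIL_USER` and `DASHB_EMAIL_PASSWORD` are assumptions made for this example, not values taken from the project.

```python
import os

# Standard Django SMTP backend.
EMAIL_BACKEND = "django.core.mail.backends.smtp.EmailBackend"

# Assumption: a Gmail-style SMTP server with STARTTLS on port 587.
# Replace these with the values of your own mail provider.
EMAIL_HOST = "smtp.gmail.com"
EMAIL_PORT = 587
EMAIL_USE_TLS = True

# Hypothetical environment variable names; reading credentials from the
# environment keeps them out of version control.
EMAIL_HOST_USER = os.environ.get("DASHB_EMAIL_USER", "")
EMAIL_HOST_PASSWORD = os.environ.get("DASHB_EMAIL_PASSWORD", "")
```

Reading the credentials from the environment is a design choice of this sketch; hard-coding them in settings.py also works locally but risks committing them to the repository.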
├── Handle Datatypes
│ ├── Drop unnecessary features.
│ ├── replace inf with NaN.
│ ├── Make sure all the column names are of string type and clean them.
│ ├── Remove the column if target column has NaN.
│ ├── Remove Duplicate columns
│ ├── Handle numerical, categorical and time features.
│ └── Try to determine the ML use case and encode.
├── Handle Missing Values
│ ├────── Numerical Features
│ ├── Replace with mean.
│ ├── Replace with median.
│ ├── Replace with mode.
│ ├── Replace with standard deviation.
│ ├── Replace with zero.
│ ├────── Categorical Features
│ ├── Replace with mean.
│ ├── Replace with "Missing".
│ └── Replace with most frequent value.
├── Removing zero and near zero variance columns
│ ├── Eliminate the features that have zero variance.
│ └── Eliminate the features that have near zero variance.
├── Group Similar Features
│ └── Group more than two features and make new features from them.
├── Normalization and Transformation
│ ├────── Operations to apply only on numerical features
│ ├── ZScore
│ ├── MinMax
│ ├── Quantile
│ ├── MaxAbs
│ ├── Yeo-Johnson
│ ├────── Target transformation (regression)
│ ├── Box-Cox
│ └── Yeo-Johnson
├── Making Time Features
│ ├── Take a time feature and extract more features from it
│ └── (Day, Month, Year, Hour, Minute, Second, Quantile, Quarter, Day of week, week day name, day of year, week of year )
├── Feature Encoding
│ ├────── Ordinal Encoding
│ ├── LabelEncoding
│ ├── Target Guided ordinal encoding
│ ├────── One hot encoding
│ ├── KDD orange
│ ├── Mean Encoding
│ └── Counter/frequency encoding
├── Removing Outliers
│ ├── Isolation Forest
│ ├── KNN
│ ├── PCA
│ └── Elliptical envelope
├── Feature Selection
│ ├── Chi squared (Not working perfectly)
│ ├── RFE (Not working on all the data)
│ ├── Lasso (works perfectly)
│ ├── Random Forest
│ ├── lgbm (works perfectly)
│ └── Remove zero variance features
├── Imbalanced Dataset (Not done yet)
│ ├── Ensemble techniques automatically handle imbalanced datasets
│ ├── Undersampling (Not a good idea)
│ ├── Oversampling
│ ├── SMOTE
│ └── Isolation Forest
└── Next Step
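To make a few of the operations in the tree concrete, here is a minimal, dependency-free sketch of median imputation, Z-score scaling, frequency encoding and time feature extraction. It is not the project's implementation (the project lists scikit-learn among its tools); the function names `impute_median`, `zscore`, `frequency_encode` and `time_features`, and the sample values, are hypothetical and chosen only for illustration.

```python
import math
import statistics
from collections import Counter
from datetime import datetime


def impute_median(values):
    """Replace missing entries (None, NaN or +/-inf) with the median of the rest."""
    def is_missing(v):
        return v is None or math.isnan(v) or math.isinf(v)

    observed = [v for v in values if not is_missing(v)]
    median = statistics.median(observed)
    return [median if is_missing(v) else v for v in values]


def zscore(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)
    if std == 0:
        # A zero-variance column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def frequency_encode(categories):
    """Replace each category with the number of times it occurs."""
    counts = Counter(categories)
    return [counts[c] for c in categories]


def time_features(ts):
    """Extract a few calendar features from a single timestamp."""
    return {
        "year": ts.year,
        "month": ts.month,
        "day": ts.day,
        "hour": ts.hour,
        "minute": ts.minute,
        "quarter": (ts.month - 1) // 3 + 1,
        "day_of_week": ts.weekday(),  # Monday is 0
        "day_of_year": ts.timetuple().tm_yday,
    }


# Hypothetical sample data.
ages = impute_median([20.0, None, 40.0, float("inf"), 30.0])
scaled = zscore(ages)
cities = frequency_encode(["Delhi", "Pune", "Delhi"])
feats = time_features(datetime(2020, 11, 5, 14, 30))

print(ages)    # [20.0, 30.0, 40.0, 30.0, 30.0]
print(cities)  # [2, 1, 2]
print(feats["quarter"], feats["day_of_week"], feats["day_of_year"])  # 4 3 310
```

In the sketch, infinite values are treated as missing before imputation, mirroring the "replace inf with NaN" step that precedes missing value handling in the tree.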
├── accounts
│ └─────────── # handles login, signup and password recovery.
├── DashB
│ └─────────── # main folder contains wsgi, routing, settings and urls.
├── data
│ └─────────── # main folder for performing pipeline.
├── Viz
│ └─────────── # project app for the data visualization tool.
├── static
│ └─────────── # contains static files.
├── media
│ └─────────── # storage folder of uploaded media.
├── templates
│ └─────────── # contains landing page templates
├── manage.py
├── requirements.txt
├── LICENSE
├── README.md
└── db.sqlite3
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/AmazingFeature)
- Commit your Changes (git commit -m 'Add some AmazingFeature')
- Push to the Branch (git push origin feature/AmazingFeature)
- Open a Pull Request
Sumit
Copyright 2020 Sumit Kumar
Sumit Kumar - sksumit068@gmail.com
Project Link: https://github.com/IMsumitkumar/No-code-ML-platform-DashB.ai
- https://docs.djangoproject.com/en/3.1/
- https://www.djangoproject.com/
- https://www.youtube.com/channel/UCTZRcDjjkVajGL6wd76UnGg
- https://plotly.com/
- https://pycaret.org/
- https://scikit-learn.org/
- https://getbootstrap.com/docs/4.0/getting-started/introduction/
- https://django-plotly-dash.readthedocs.io/en/latest/
- https://www.kaggle.com/
- https://www.researchgate.net/publication/220320826_Winning_the_KDD_Cup_Orange_Challenge_with_Ensemble_Selection
- The HTML templates come from open source projects.
- Modifications were made by me.