This repository contains the Abalone classification project, the first project in my self-learning curriculum.
The dataset I used is available on Kaggle: https://www.kaggle.com/rodolfomendes/abalone-dataset
To learn machine learning, especially EDA for classification and classification model evaluation, I put together a self-learning curriculum of 10 classification projects built end-to-end. Each project involves the following steps:
- Do the EDA
  - Use pandas, numpy, statsmodels, sklearn, seaborn and matplotlib
  - Check for sampling bias such as class imbalance
  - Descriptive statistics
  - Measures of central tendency
  - Measures of dispersion
  - Measures of association
  - Check for skewness and kurtosis
  - Impute missing data if needed
  - Transform data if needed
  - Detect and handle outliers
- Choose the best model
  - Train models systematically
  - Use ensemble models
  - Use cross-validation to get a lower-variance estimate of model performance
- Do the model evaluation
  - Metrics: accuracy, precision, recall, F1-score and ROC-AUC
  - Use mlxtend to observe the bias-variance decomposition of the error
  - Use AIC and BIC to check the trade-off between model fit and complexity
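
The steps above can be sketched roughly as follows. This is only a minimal illustration, not the code from this project: the data is synthetic (generated with numpy instead of loading the Kaggle file), the column names `length`, `weight` and `target` are made up, and the choices of a random forest, the 1.5 × IQR outlier rule and the `log1p` transform are assumptions picked for the example.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import (StratifiedKFold, cross_val_score,
                                     train_test_split)

# Synthetic stand-in data (hypothetical columns, NOT the real Abalone file).
rng = np.random.default_rng(0)
n = 400
length = rng.normal(0.5, 0.1, n)
weight = rng.lognormal(mean=-0.5, sigma=0.5, size=n)  # right-skewed on purpose
signal = 4 * (length - 0.5) + (weight - weight.mean()) + rng.normal(0, 0.3, n)
df = pd.DataFrame({"length": length, "weight": weight,
                   "target": (signal > 0).astype(int)})

# --- EDA ---
features = df[["length", "weight"]]
class_balance = df["target"].value_counts(normalize=True)  # class imbalance check
summary = features.describe()      # central tendency and dispersion
correlation = features.corr()      # association between features
skewness = features.skew()
kurtosis = features.kurt()

# Flag outliers in `weight` with the 1.5 * IQR rule (one common convention).
q1, q3 = features["weight"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (df["weight"] < q1 - 1.5 * iqr) | (df["weight"] > q3 + 1.5 * iqr)

# Transform the skewed feature; log1p is one possible choice.
df["log_weight"] = np.log1p(df["weight"])

# --- Model selection with cross-validation ---
X = df[["length", "log_weight"]]
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=100, random_state=0)  # ensemble model
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="f1")

# --- Evaluation on the held-out test set ---
model.fit(X_train, y_train)
pred = model.predict(X_test)
proba = model.predict_proba(X_test)[:, 1]  # probability of the positive class
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "precision": precision_score(y_test, pred),
    "recall": recall_score(y_test, pred),
    "f1": f1_score(y_test, pred),
    "roc_auc": roc_auc_score(y_test, proba),
}

print(class_balance)
print(skewness)
print("CV F1 scores:", cv_scores.round(3))
print(metrics)
```

The test set is held out before cross-validation so that the final metrics come from data the model selection step never saw.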
Build a model API with the Flask library. Learn an API backend folder structure that can be scaled later if required.
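
A model API of this kind might look like the sketch below. It is a minimal example under stated assumptions, not the API from this repository: the route names `/health` and `/predict`, the JSON shape `{"features": [...]}` and the `placeholder_predict` function are all hypothetical. In a real project the placeholder would be replaced by a trained model loaded from disk. The `create_app` factory is one common way to keep the app easy to split into more modules later.

```python
from flask import Flask, jsonify, request


def placeholder_predict(features):
    """Hypothetical stand-in for a trained model: a fixed threshold rule."""
    return 1 if sum(features) > 1.0 else 0


def create_app(predict_fn=placeholder_predict):
    """Application factory: builds the Flask app around a prediction function."""
    app = Flask(__name__)

    @app.route("/health", methods=["GET"])
    def health():
        # Simple liveness check.
        return jsonify(status="ok")

    @app.route("/predict", methods=["POST"])
    def predict():
        # Tolerate a missing or invalid JSON body and validate it ourselves.
        payload = request.get_json(silent=True) or {}
        features = payload.get("features")
        valid = (
            isinstance(features, list)
            and len(features) > 0
            and all(isinstance(x, (int, float)) for x in features)
        )
        if not valid:
            return jsonify(error="'features' must be a non-empty list of numbers"), 400
        return jsonify(prediction=predict_fn(features))

    return app
```

The app can be exercised without starting a server through Flask's test client, for example `create_app().test_client().post("/predict", json={"features": [0.5, 0.4]})`.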
Learn deployment to DigitalOcean, and deployment to Heroku from GitHub.