This notebook uses machine learning to predict the quality of red wine, then applies machine learning explainability to find the factors that mattered most to those predictions. We explore those factors and use the insights to help pick out bottles of wine when shopping. The notebook is hosted on Kaggle: https://www.kaggle.com/code/jarredpriester/picking-wine-it-s-all-in-the-data
The main purpose of this project was to practice building regression models. The project used Extreme Gradient Boosting and a deep neural network. The second purpose was to use data to help with picking bottles of wine.
I learned how machine learning explainability can be used to take a deep dive into machine learning models and extract key insights that apply in the real world. I also learned that the highest-quality wines had an alcohol content between 12% and 15%.
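To make "explainability" concrete, here is a minimal sketch of permutation importance, one common explainability technique. The text above does not say which method the notebook used, so treat this as an illustration, not the notebook's code. The function names, the toy model, and the toy data are all hypothetical. The idea is to shuffle one feature column at a time and measure how much the prediction error grows.

```python
import random


def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Return, per feature column, the mean increase in MSE after shuffling it."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle only column j, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            score = mean_squared_error(y, [predict(row) for row in shuffled])
            increases.append(score - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in model: it depends on feature 0 and ignores feature 1.
def toy_predict(row):
    return 2.0 * row[0]


# Toy data (not wine data): the target is fully determined by feature 0.
X_toy = [[float(i), float(i % 3)] for i in range(20)]
y_toy = [2.0 * row[0] for row in X_toy]

importances = permutation_importance(toy_predict, X_toy, y_toy)
print(importances)
```

A feature the model relies on shows a large error increase when shuffled, while a feature the model ignores shows none. With a trained wine-quality model in place of `toy_predict`, the same ranking would indicate which measurements drive the predicted quality.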
The dataset we use is the Wine Quality Dataset from the University of California, Irvine's Machine Learning Repository. It consists of red vinho verde wine samples from the north of Portugal. There are 11 variables based on physicochemical tests and 1 quality score variable.
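The 11-features-plus-1-target layout can be sketched with the standard library alone. This is a hedged example, not the notebook's loading code: the column names and the semicolon delimiter follow the UCI copy of the file as I understand it (other copies may be comma-separated, hence the `delimiter` parameter), the helper name `read_wine_rows` is hypothetical, and the two sample rows are illustrative values in the same layout so the sketch runs without the file.

```python
import csv
import io

# Assumed column names for the red wine file: 11 inputs and 1 target.
FEATURE_COLUMNS = [
    "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
    "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
    "pH", "sulphates", "alcohol",
]
TARGET_COLUMN = "quality"


def read_wine_rows(lines, delimiter=";"):
    """Parse CSV text lines into (X, y): float feature rows and int quality scores."""
    reader = csv.DictReader(lines, delimiter=delimiter)
    X, y = [], []
    for row in reader:
        X.append([float(row[name]) for name in FEATURE_COLUMNS])
        y.append(int(row[TARGET_COLUMN]))
    return X, y


# Illustrative in-memory sample in the assumed layout.
sample = io.StringIO(
    ";".join(FEATURE_COLUMNS + [TARGET_COLUMN]) + "\n"
    "7.4;0.7;0.0;1.9;0.076;11;34;0.9978;3.51;0.56;9.4;5\n"
    "7.8;0.88;0.0;2.6;0.098;25;67;0.9968;3.2;0.68;9.8;5\n"
)
X, y = read_wine_rows(sample)
print(len(X[0]), y)
```

To read the real file, pass an open file handle instead of the sample, for example `with open("winequality-red.csv", newline="") as f: X, y = read_wine_rows(f)`, and set `delimiter=","` if your copy is comma-separated.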
winequality-red.csv - dataset
picking-wine-it-s-all-in-the-data.ipynb - Kaggle Python notebook