
This project builds a content-based Scotch whisky recommendation system to help sell Scotch whiskies.


jacquessham/ScotchWhisky


Scotch Whisky Recommendation System

This project aims to build a content-based recommendation system that helps Scotch whisky salesmen recommend whiskies to a customer based on the quantified qualities of each whisky.

Please refer to the Report to understand how the recommendation system was built. There is also a Medium post about this project; if you would like to read it, you may refer to the link.

If you are interested in the application that can recommend a whisky to you, you may check out the Recommendation Application folder. If you would like to read the Medium post about the frontend upgrade of this application, you may go to this link.

We have also reviewed Lapointe and Legendre's algorithm as a way to improve the recommendation system. You may find out more in this folder, or go to this link to read the Medium post on the findings.

Here is the brief structure of this repository.

Data

The original data was downloaded from Kaggle, which obtained the data set from WhiskyClassified.com.

The original data set contains 12 columns of characters or flavors, including body, sweetness, and smoky. Besides those features, there are columns for the distillery name, postcode, UTM latitude, and UTM longitude of the distilleries.

In addition to the original 86-row by 12-column data set, I added three more columns:

  • Latitude in degrees
  • Longitude in degrees
  • Region (Region Classification of Whisky Distillery)



You may find the dataset here or the documentation of the dataset in the Data folder.



Here is the map of distillery locations with their region classification.

Goal of this Project

The goal of this project is to build a content-based recommendation system for whiskies, meaning that it recommends a whisky based on how similar it is to another whisky. There are more than 86 brands of Scotch whisky, and I want a model/system that recommends other brands based on their characters and flavors.
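To make "similarity between two whiskies" concrete, here is a minimal sketch that ranks whiskies by the Euclidean distance between their flavor vectors. The metric is an assumption (the text above does not name one), and the names and scores in `profiles` are invented for illustration, not taken from the data set.

```python
import math

# Hypothetical flavor profiles (body, sweetness, smoky).
# The names and scores are made up and do not come from the data set.
profiles = {
    "Whisky A": (2, 2, 1),
    "Whisky B": (2, 3, 1),
    "Whisky C": (4, 1, 4),
}


def flavor_distance(a, b):
    """Euclidean distance between two flavor vectors (smaller = more similar)."""
    return math.dist(a, b)


def most_similar(name, profiles):
    """Return the other whiskies ordered from most to least similar to `name`."""
    target = profiles[name]
    others = [n for n in profiles if n != name]
    return sorted(others, key=lambda n: flavor_distance(target, profiles[n]))


print(most_similar("Whisky A", profiles))
```

A smaller distance means a closer flavor profile, so the first entry of the returned list is the strongest recommendation under this assumed metric.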

Region Classification

The first approach is to classify each whisky distillery into its whisky region. The idea is that each region has its own general flavor and character. The assumption is that a person who likes one Highland whisky will also like other Highland whiskies, so I would recommend those. The plan is that once we have trained a model on the 86 distilleries, we can use it to classify the region of an 87th distillery.

The details of the code may be found in the Region Classification folder.
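As a rough illustration of the idea of predicting a region from flavor scores, the sketch below uses a k-nearest-neighbour vote. This is a swapped-in technique chosen for brevity; the classifier actually used in the Region Classification folder is not described here. The training rows, and the pairing of scores with regions, are invented.

```python
import math
from collections import Counter

# Hypothetical training rows: ((body, sweetness, smoky), region).
# The scores are invented and only illustrate the mechanics.
training = [
    ((2, 2, 1), "Highland"),
    ((2, 3, 1), "Highland"),
    ((3, 2, 1), "Highland"),
    ((4, 1, 4), "Islay"),
    ((4, 1, 3), "Islay"),
    ((3, 1, 4), "Islay"),
]


def predict_region(features, training, k=3):
    """Majority vote among the k training rows closest to `features`."""
    nearest = sorted(training, key=lambda row: math.dist(features, row[0]))[:k]
    votes = Counter(region for _, region in nearest)
    return votes.most_common(1)[0][0]


# Classify a new, unseen ("87th") distillery from its flavor scores.
print(predict_region((4, 2, 4), training))
```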



However, the results of the classification model did not meet expectations, so the second step is to try hierarchical clustering.

Dendrogram

The second approach is to use a dendrogram to display the hierarchical relationships among distilleries. A dendrogram is the tree diagram produced by hierarchical clustering. The idea is to use the quantified characters and flavors to calculate the similarity between distilleries.
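The merge order that a dendrogram draws can be sketched with a small agglomerative clustering loop. The example below assumes average linkage and Euclidean distance, neither of which is confirmed by the text above, and runs on four invented flavor vectors.

```python
import math
from itertools import combinations

# Hypothetical flavor vectors (body, sweetness, smoky); the values are invented.
points = {
    "A": (2, 2, 1),
    "B": (2, 3, 1),
    "C": (4, 1, 4),
    "D": (4, 2, 3),
}


def average_linkage(c1, c2, points):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(math.dist(points[a], points[b]) for a in c1 for b in c2)
    return total / (len(c1) * len(c2))


def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge history."""
    clusters = [(name,) for name in points]
    merges = []
    while len(clusters) > 1:
        c1, c2 = min(
            combinations(clusters, 2),
            key=lambda pair: average_linkage(pair[0], pair[1], points),
        )
        height = average_linkage(c1, c2, points)
        merges.append((c1, c2, height))
        # Replace the two merged clusters with their union.
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 + c2]
    return merges


for left, right, height in agglomerate(points):
    print(left, "+", right, "at height", round(height, 3))
```

Each printed line corresponds to one join in the dendrogram, and the height is where that join would be drawn on the vertical axis.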





However, it is hard to read the similarity among whiskies off the dendrogram and to interpret it. It would also be hard to train the salesmen to read the dendrogram when they make a recommendation.

You may find the Python code for the dendrogram here, or browse the Dendrogram folder.

Clustering

The last approach is to cluster similar distilleries with k-means. Here is the Python code. The problem is to find the best k for the algorithm.
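One common way to look for a suitable k is to compare the within-cluster sum of squares (inertia) across several values of k and look for the point where it stops dropping quickly. The sketch below is a plain-Python version of Lloyd's algorithm on invented flavor vectors; it is an illustration under those assumptions, not the code in the Clusters folder, and the selection method used there is not stated here.

```python
import math
import random


def kmeans(points, k, iterations=50, seed=0):
    """Plain Lloyd's algorithm; returns (centroids, labels, inertia)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points

    def assign(centroids):
        # Attach each point to the index of its nearest centroid.
        return [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]

    for _ in range(iterations):
        labels = assign(centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the centroid to the mean of its members.
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids

    labels = assign(centroids)
    inertia = sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Hypothetical flavor vectors (body, sweetness, smoky); the values are invented.
points = [
    (1, 1, 0), (1, 2, 0), (2, 1, 0),
    (4, 1, 4), (4, 2, 4), (3, 1, 4),
    (2, 4, 1), (2, 4, 2), (3, 4, 1),
]

for k in range(1, 7):
    _, _, inertia = kmeans(points, k)
    print(k, round(inertia, 2))
```

Because k-means depends on its starting centroids, a fuller experiment would repeat each k with several seeds and keep the lowest inertia.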

You may go to the Clusters folder to see how the recommendation system was built using k-means. Here is the result:

  • Group 1: Aberfeldy, Aberlour, Anchroisk, BenNevis, Benrinnes, Benromach, BlairAthol, Craigallechie, Deanston, Edradour, Glenfarclas, Glenlivet, Glenturret, Knochando, Longmorn, Old Fettercairn, Scapa, Strathisla
  • Group 2: AnCnoc, Auchentoshan, Aultmore, Bunnahabhain, Cardhu, Craigganmore, Dalwhinnie, Dufftown, Glen Elgin, Glen Grant, Glen Keith, Glen Moray, Glenallachie, Glengoyne, Glenmorangie, Mannochmore, Miltonduff, Speyside, Strathmill, Tamdhu, Tamnavulin, Tobermory
  • Group 3: Ardbeg, Caol Illa, Clynelish, Lagavulin, Laphroaig, Talisker
  • Group 4: Arran, Belvenie, Benriach, Bladnoch, Glen Deveron Macduff, Glen Garioch, Glen Ord, Glen Spey, Glenfiddich, Glenkinchie, Glenlossie, Glenrothes, Inchgower, Linkwood, Royal Brackla, Speyburn, Teaninich, Tomatin, Tomintoul, Tullibardine
  • Group 5: Ardmore, Balblair, Bowmore, Bruichladdich, Glen Scotia, Highland Park, Isle of Jura, Loch Lomond, Oban, Old Pulteney, Springbank, Tormore
  • Group 6: Balmenach, Dailuaine, Dalmore, Glendronach, Glendullan, Macallan, Mortlach, Royal Lochnagar



The result of the optimal clustering looks like this on the map:


The model using the k-means algorithm is useful for recommending Scotch whisky, so I decided to use k-means, with k=6, to build the recommendation system.

Application

The application uses a k-means model trained with k=6 to calculate recommendations quantitatively. There are three ways to get a list of recommendations:

  • Enter a whisky distillery name
  • Choose from a list of characters and flavors
  • Enter nothing

  1. If we enter a whisky distillery name, the application returns a list of whiskies within the same cluster, sorted by flavor similarity.
  2. If we choose from a list of characters and flavors, the application returns a list of whiskies that meet the criteria. Note that the whiskies on the list may not belong to the same cluster.
  3. If nothing is entered, the application suggests Macallan. (See the Application folder for an explanation.)
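The three cases above can be sketched as a single function. The sketch assumes that the cluster labels have already been computed, that similarity is Euclidean distance, and that a flavor criterion means "score at least this high"; all three are assumptions, and the `catalogue` entries are invented. Only the Macallan fallback comes from the description above.

```python
import math

DEFAULT_SUGGESTION = "Macallan"  # fallback named in the description above

# Hypothetical catalogue: name -> (cluster id, flavor scores); values are invented.
catalogue = {
    "Whisky A": (0, {"body": 2, "sweetness": 2, "smoky": 1}),
    "Whisky B": (0, {"body": 2, "sweetness": 3, "smoky": 1}),
    "Whisky C": (0, {"body": 3, "sweetness": 3, "smoky": 2}),
    "Whisky D": (1, {"body": 4, "sweetness": 1, "smoky": 4}),
}


def _distance(a, b):
    """Euclidean distance between two flavor dictionaries with the same keys."""
    keys = sorted(a)
    return math.dist([a[k] for k in keys], [b[k] for k in keys])


def recommend(name=None, criteria=None):
    """Return a list of suggested whiskies for the three input cases."""
    if name:
        # Case 1: same cluster as the named whisky, most similar first.
        cluster, target = catalogue[name]
        peers = [n for n, (c, _) in catalogue.items() if c == cluster and n != name]
        return sorted(peers, key=lambda n: _distance(target, catalogue[n][1]))
    if criteria:
        # Case 2: every whisky meeting all minimum scores, regardless of cluster.
        return [
            n
            for n, (_, flavors) in catalogue.items()
            if all(flavors[f] >= minimum for f, minimum in criteria.items())
        ]
    # Case 3: nothing entered.
    return [DEFAULT_SUGGESTION]


print(recommend(name="Whisky A"))
print(recommend(criteria={"smoky": 2}))
print(recommend())
```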

The application may be run from the command line (for developers) or through a user-friendly GUI.



You may find the recommendation application in this folder.

You may find the recommendation application using Lapointe and Legendre's algorithm in this folder.

Report

The Report folder contains a report that goes over how the recommendation system was built by choosing the best model among Region Classification, Dendrogram, and K-means Clustering.

Whisky Analysis by Lapointe and Legendre

In the Lapointe et Legendre folder, we revisit the clustering approach with the algorithm suggested in Lapointe and Legendre's paper, A classification of pure malt Scotch whiskies. Our goal is to improve the recommendation application.

You may find the recommendation application using Lapointe and Legendre's algorithm in this folder.

Glossary

  • Whisky comes from the Gaelic uisge beatha, which means "water of life"
  • Whisky or whiskey? Whisky is the spelling used for Scotch whiskies, while whiskey is the common spelling for Irish whiskeys. This repository uses whisky for Scotch whiskies and for whiskies produced in the Scotch school of thought, such as Japanese and Taiwanese whiskies, and whiskey for Irish and American whiskeys.
  • The plural forms of whisky and whiskey are whiskies and whiskeys, respectively
