This project aims to build a content-based recommendation system that helps Scotch whisky salesmen recommend Scotch whiskies to customers based on the quantified qualities of the whiskies.
Please refer to the Report to understand how the recommendation system was built. There is also a Medium post about this project. If you would like to read the Medium post, you may refer to the link.
If you are interested in the application that can recommend whiskies to you, you may check out the Recommendation Application folder. If you would like to read the Medium post about the frontend upgrade of this application, you may go to this link.
We have reviewed Lapointe and Legendre's algorithm as a way to improve the recommendation system. You may find out more in this folder, or go to this link to read the Medium post on the findings.
Here is a brief structure of this repository.
The original data was downloaded from Kaggle, which obtained the data set from WhiskyClassified.com.
The original data set contains 12 columns of characters or flavors, such as body, sweetness, and smoky. Besides those features, there are columns for the distillery name, postcode, UTM latitude and UTM longitude of the distilleries.
In addition to the original 86-row by 12-column data set, I added three more columns:
- Latitude in degrees
- Longitude in degrees
- Region (Region Classification of Whisky Distillery)
You may find the dataset here or the documentation of the dataset in the Data folder.
Here is the map of distillery locations with region classification.
The goal of this project is to build a content-based recommendation system for whiskies. This means recommending a whisky based on its similarity to another whisky. There are more than 86 brands of Scotch whisky, and I want a model/system that recommends other brands based on character and flavor.
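To make "similarity between two whiskies" concrete, here is a minimal sketch that compares flavor vectors with Euclidean distance. The whisky names, the three features, and the scores are invented for illustration, and Euclidean distance is an assumption rather than a measure confirmed by the repository.

```python
import math

# Hypothetical flavor profiles (body, sweetness, smoky); the scores are invented.
profiles = {
    "Whisky A": [2, 2, 1],
    "Whisky B": [2, 3, 1],
    "Whisky C": [4, 1, 4],
}

def distance(a, b):
    """Euclidean distance between two flavor vectors (smaller means more similar)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(name, profiles):
    """Return the other whiskies ordered from most to least similar to `name`."""
    target = profiles[name]
    others = [n for n in profiles if n != name]
    return sorted(others, key=lambda n: distance(target, profiles[n]))

print(most_similar("Whisky A", profiles))
```

A customer who likes "Whisky A" would be offered the first entries of the returned list.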
The first approach is to classify which whisky region each distillery belongs to. The idea is that each region has its own general flavor and character. The assumption is that if a person likes one Highland whisky, I can recommend other Highland whiskies to that person. The plan is that once we have trained a model on the 86 distilleries, we can use it to classify the region of an 87th distillery.
The details of the code may be found in the Region Classification folder.
However, the result of the classification model did not meet expectations. So the second step is to train a hierarchical clustering model.
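The idea of predicting a region from flavor scores can be sketched with a nearest-centroid rule. This is only a stand-in chosen for brevity; the classifier actually used in the repository is not stated here, and the training rows below are invented.

```python
import math
from collections import defaultdict

# Hypothetical training rows: (flavor vector, region). The scores are invented.
training = [
    ([2, 2, 1], "Highlands"),
    ([2, 3, 1], "Highlands"),
    ([4, 1, 4], "Islay"),
    ([3, 1, 4], "Islay"),
]

def fit_centroids(rows):
    """Average the flavor vectors of each region."""
    grouped = defaultdict(list)
    for vector, region in rows:
        grouped[region].append(vector)
    return {
        region: [sum(col) / len(vectors) for col in zip(*vectors)]
        for region, vectors in grouped.items()
    }

def predict_region(vector, centroids):
    """Assign the region whose centroid is closest to the given vector."""
    return min(centroids, key=lambda r: math.dist(vector, centroids[r]))

centroids = fit_centroids(training)
# Classify a new, unseen distillery (the "87th" one in the text).
print(predict_region([4, 2, 3], centroids))
```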
The second approach is to use a dendrogram to display the hierarchical relationship among distilleries. A dendrogram is the tree diagram that visualizes the result of hierarchical clustering. The idea is to use the quantified characters and flavors to calculate the similarity of distilleries.
However, it is hard to find and interpret the similarity among whiskies from the dendrogram. Also, it is hard to train the salesmen to read the dendrogram when they make a recommendation.
This is the Python code for the dendrogram, or see the Dendrogram folder.
The last approach is to cluster similar distilleries with k-means. Here is the Python code. The problem is to find the best k for the algorithm.
You may go to the Clusters folder to see how the recommendation system is built with k-means. Here is the result:
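The merges that a dendrogram draws can be sketched as follows. This is a minimal agglomerative clustering with single linkage and Euclidean distance; both choices are assumptions, and the four flavor vectors are invented.

```python
import math

# Hypothetical flavor vectors; the scores are invented for illustration.
profiles = {
    "A": [2, 2, 1],
    "B": [2, 3, 1],
    "C": [4, 1, 4],
    "D": [4, 1, 3],
}

def single_linkage(c1, c2, profiles):
    """Cluster distance = distance between the closest pair of members."""
    return min(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)

def agglomerate(profiles):
    """Repeatedly merge the two closest clusters; return the merges in order."""
    clusters = [(name,) for name in sorted(profiles)]
    merges = []
    while len(clusters) > 1:
        pairs = [
            (single_linkage(clusters[i], clusters[j], profiles), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        ]
        dist, i, j = min(pairs)
        merges.append((clusters[i], clusters[j], dist))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = agglomerate(profiles)
for left, right, dist in merges:
    print(left, "+", right, "at distance", round(dist, 2))
```

Each printed merge corresponds to one junction of the dendrogram, and the merge distance is the height at which the junction is drawn.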
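One common way to look for a good k is to run k-means for several values of k and compare the within-cluster sum of squares (the "elbow" heuristic). The sketch below is a plain-Python illustration under stated assumptions: the points are invented, the farthest-point initialisation is chosen only to make the run reproducible, and nothing here claims to be the procedure used in the Clusters folder.

```python
import math

# Hypothetical two-feature flavor scores forming three loose groups (invented).
points = [[0, 0], [0, 1], [1, 0], [4, 4], [4, 3], [3, 4], [4, 0], [4, 1], [3, 0]]

def init_centroids(points, k):
    """Deterministic farthest-point initialisation (an assumption for reproducibility)."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    return centroids

def kmeans(points, k, iterations=20):
    """Lloyd's algorithm: alternate between assigning points and updating centroids."""
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def inertia(points, centroids, labels):
    """Within-cluster sum of squared distances."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))

scores = {}
for k in range(1, 6):
    centroids, labels = kmeans(points, k)
    scores[k] = inertia(points, centroids, labels)
    print(k, round(scores[k], 2))
```

The k after which the score stops dropping sharply is a candidate for the best k. On real data the elbow is often less clear, so it is usually combined with domain judgment.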
- Group 1: Aberfeldy, Aberlour, Anchroisk, BenNevis, Benrinnes, Benromach, BlairAthol, Craigallechie, Deanston, Edradour, Glenfarclas, Glenlivet, Glenturret, Knochando, Longmorn, Old Fettercairn, Scapa, Strathisla
- Group 2: AnCnoc, Auchentoshan, Aultmore, Bunnahabhain, Cardhu, Craigganmore, Dalwhinnie, Dufftown, Glen Elgin, Glen Grant, Glen Keith, Glen Moray, Glenallachie, Glengoyne, Glenmorangie, Mannochmore, Miltonduff, Speyside, Strathmill, Tamdhu, Tamnavulin, Tobermory
- Group 3: Ardbeg, Caol Illa, Clynelish, Lagavulin, Laphroaig, Talisker
- Group 4: Arran, Belvenie, Benriach, Bladnoch, Glen Deveron Macduff, Glen Garioch, Glen Ord, Glen Spey, Glenfiddich, Glenkinchie, Glenlossie, Glenrothes, Inchgower, Linkwood, Royal Brackla, Speyburn, Teaninich, Tomatin, Tomintoul, Tullibardine
- Group 5: Ardmore, Balblair, Bowmore, Bruichladdich, Glen Scotia, Highland Park, Isle of Jura, Loch Lomond, Oban, Old Pulteney, Springbank, Tormore
- Group 6: Balmenach, Dailuaine, Dalmore, Glendronach, Glendullan, Macallan, Mortlach, Royal Lochnagar
The result of the optimal clustering looks like this on the map:
The model using the k-means algorithm is useful for recommending Scotch whisky. So I decided to use k-means, with k=6, to build the recommendation system.
The application uses a k-means model trained with k=6 to calculate the recommendations quantitatively. There are three ways to get a list of recommendations:
- Enter a whisky distillery name
- Choose from a list of characters and flavors
- Nothing
- If we enter a whisky distillery name, the application returns a list of whiskies within the same cluster. The list is sorted by flavor similarity.
- If we choose from a list of characters and flavors, the application returns a list of whiskies that meet the criteria. Note that the whiskies on the list do not necessarily belong to the same cluster.
- If nothing is entered, the application suggests Macallan. (See the Application folder for an explanation.)
The application may be run from the command line for developers or through a user-friendly GUI.
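The three cases above can be sketched as a single function. The profiles, cluster labels, feature names, and the `THRESHOLD` used to decide whether a whisky "meets the criteria" are all hypothetical; the real application may define these differently.

```python
import math

# Hypothetical data: flavor vectors and k-means cluster labels (all invented).
FEATURES = ["body", "sweetness", "smoky"]
profiles = {
    "A": [2, 2, 1],
    "B": [2, 3, 1],
    "C": [4, 1, 4],
    "D": [4, 1, 3],
    "E": [3, 4, 2],
}
clusters = {"A": 0, "B": 0, "C": 1, "D": 1, "E": 0}
THRESHOLD = 3  # assumed cut-off for "has this flavor"

def recommend(name=None, wanted=None):
    """Return recommendations for a distillery name, a list of flavors, or no input."""
    if name is not None:
        # Same cluster as the given whisky, most similar first.
        same = [n for n in profiles if n != name and clusters[n] == clusters[name]]
        return sorted(same, key=lambda n: math.dist(profiles[name], profiles[n]))
    if wanted:
        # Every requested flavor must reach the threshold; clusters are ignored.
        idx = [FEATURES.index(f) for f in wanted]
        return [n for n in profiles if all(profiles[n][i] >= THRESHOLD for i in idx)]
    # Nothing entered: fall back to the default suggestion named in the text.
    return ["Macallan"]

print(recommend(name="A"))
print(recommend(wanted=["smoky"]))
print(recommend())
```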
You may find the recommendation application in this folder.
You may find the recommendation application using Lapointe and Legendre's algorithm in this folder.
In the Report folder, there is a report that goes over how the recommendation system was built by choosing the best model among Region Classification, Dendrogram, and K-means Clustering.
In the Lapointe et Legendre folder, we revisit the clustering approach with the algorithm suggested in Lapointe and Legendre's paper "A classification of pure malt Scotch whiskies". Our goal is to improve the recommendation application.
You may find the recommendation application using Lapointe and Legendre's algorithm in this folder.
- Whisky means water of life in Gaelic
- Whisky or Whiskey? Whisky is the spelling used for Scotch whiskies, while whiskey is the common spelling for Irish whiskeys. This repository uses whisky for Scotch whiskies and for whiskies produced in the Scotch school of thought, such as Japanese and Taiwanese whiskies, and whiskey for Irish and American whiskeys.
- The plural forms of whisky and whiskey are whiskies and whiskeys, respectively