- Data Source: crypto_data.csv, CryptoCompare
- Software: Python 3.9.2, Anaconda Navigator 1.9.12, Conda 4.8.4, Jupyter Notebook 6.0.3
The purpose of this project is to use unsupervised machine learning to analyze a database of cryptocurrencies and create a report including the traded cryptocurrencies classified by group according to their features. In practice, this classification report could be used by an investment bank to propose a new cryptocurrency investment portfolio to its clients.
- preprocessing the database
- reducing the dimensionality of the data using Principal Component Analysis (PCA)
- clustering cryptocurrencies using K-Means
- visualizing classification results with 2D plots
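The steps listed above can be sketched end to end as follows. This is a minimal illustration assuming scikit-learn and NumPy are available; the synthetic `features` matrix stands in for the cleaned data from crypto_data.csv, and the choice of standard scaling, two principal components, and `random_state=0` are assumptions rather than the project's confirmed settings.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned numeric feature matrix
# (NOT the project's data): 4 groups of 50 rows with 6 features each.
rng = np.random.default_rng(0)
centers = rng.normal(scale=8.0, size=(4, 6))
features = np.vstack([c + rng.normal(size=(50, 6)) for c in centers])

# Preprocessing (assumed): put every feature on a comparable scale.
scaled = StandardScaler().fit_transform(features)

# Dimensionality reduction: keep two principal components.
components = PCA(n_components=2, random_state=0).fit_transform(scaled)

# Clustering: assign each row to one of 4 clusters.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)

print(components.shape, labels[:10])
```

The `components` array and `labels` vector are what a 2D scatter plot would use: one point per row, grouped by cluster label.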
After preprocessing and cleaning, we have a total of 532 tradable cryptocurrencies.
We used unsupervised machine learning to identify clusters of cryptocurrencies. The elbow curve below was produced by running K-Means for k values from 1 to 10.
The best k value appears to be 4, so we group the cryptocurrencies into 4 clusters.
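An elbow curve of this kind plots the K-Means inertia (within-cluster sum of squared distances) against k. The sketch below computes those values for k from 1 to 10, assuming scikit-learn; the `points` array is synthetic toy data with four well-separated groups, not the project's PCA output, and `n_init=10` and `random_state=0` are illustrative settings.

```python
import numpy as np
from sklearn.cluster import KMeans

# Toy 2D data (an assumption for illustration): 4 blobs of 40 points each.
rng = np.random.default_rng(1)
blob_centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0], [10.0, 10.0]])
points = np.vstack([c + rng.normal(size=(40, 2)) for c in blob_centers])

k_values = list(range(1, 11))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(points)
    # inertia_ is the sum of squared distances to the nearest centroid.
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

Plotting `inertias` against `k_values` gives the curve; the "elbow" is the k after which additional clusters reduce inertia only marginally.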
This 2D scatter plot was obtained by using PCA to reduce the cryptocurrency features to two principal components. It shows how the cryptocurrencies are distributed across the four clusters. Among other patterns, it lets us identify outliers such as the single cryptocurrency in class #2.
Most of the cryptocurrencies belong to classes #0 and #1. The snapshot above shows that BitTorrent is the only cryptocurrency in class #2.
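Class sizes and single-member classes like the one described above can be checked directly from the cluster assignments. The sketch below uses only the standard library; the coin names and class numbers in `assignments` are hypothetical placeholders, not the project's results.

```python
from collections import Counter

# Hypothetical name -> class assignments, for illustration only.
assignments = {
    "CoinA": 0, "CoinB": 0,
    "CoinC": 1, "CoinD": 1, "CoinE": 1,
    "CoinF": 2,
    "CoinG": 3, "CoinH": 3,
}

# Number of cryptocurrencies in each class.
class_sizes = Counter(assignments.values())

# Classes that contain exactly one member, and the members themselves.
singleton_classes = sorted(c for c, n in class_sizes.items() if n == 1)
outliers = sorted(name for name, c in assignments.items() if c in singleton_classes)

print(dict(class_sizes), singleton_classes, outliers)
```

A class with a single member is a natural candidate for a closer look, since it suggests that coin's features differ sharply from the rest.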
In the PCA scatter plot, class is used as a plot parameter to distinguish the clusters, which makes it the clearer visualization.
We have classified 532 cryptocurrencies into groups based on feature similarities. To assess their performance and their potential interest to an investment bank, further analysis of the traits unique to each group should be conducted.