Analysis for clients who are preparing to get into the cryptocurrency market.
Dataset: crypto_data.csv (retrieved from CryptoCompare)
Software and IDE:
- Python
- Jupyter Notebook
- Libraries:
  - Pandas
  - Sklearn
  - hvPlot
- Unsupervised Machine Learning
Accountability Accounting, a prominent investment bank, is interested in offering a new cryptocurrency investment portfolio for its customers. As the cryptocurrency market is highly saturated and volatile, we created a report that covers which cryptocurrencies are on the trading market and how they could be grouped to create a classification system for this new investment. The initial data was not ideal, so it had to be processed to fit the machine learning models. Since there is no known output, we decided to use unsupervised learning. To group the cryptocurrencies, we used a clustering algorithm. Lastly, we used data visualizations to share our findings.
The technical analysis of this project consisted of four steps:
- Preprocessing the Data for PCA
- Reducing Data Dimensions Using PCA
- Clustering Cryptocurrencies Using K-means
- Visualizing Cryptocurrencies Results
Preprocessing the Data for PCA
Using our knowledge of Pandas, we preprocessed the dataset in order to perform PCA. The crypto_data.csv file was retrieved from CryptoCompare. For this section of the project, we kept only the cryptocurrencies that are being traded. We then dropped the IsTrading column and removed rows that had at least one null value. Next, we filtered the crypto_df DataFrame so that it only has rows where coins have been mined. After filtering, we created a new DataFrame that holds only the cryptocurrency names, using the crypto_df DataFrame index as its index. A crucial step was then to remove the CoinName column from the crypto_df DataFrame, since it is not used by the clustering algorithm. We used the get_dummies() method to create dummy variables for the two text features, Algorithm and ProofType, and stored the resulting data in a new DataFrame named X. Lastly, we used the StandardScaler fit_transform() function to standardize the features from the X DataFrame.
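The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's notebook code: the small inline table stands in for crypto_data.csv with made-up values, and the column names TotalCoinsMined and TotalCoinSupply, as well as the names coin_names_df and X_scaled, are assumptions that do not appear in the description above.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Made-up stand-in for crypto_data.csv (illustrative values only).
# TotalCoinsMined and TotalCoinSupply are assumed column names.
crypto_df = pd.DataFrame(
    {
        "CoinName": ["AlphaCoin", "BetaCoin", "GammaCoin", "DeltaCoin", "EpsilonCoin", "ZetaCoin"],
        "Algorithm": ["SHA-256", "Scrypt", "SHA-256", "X11", "Scrypt", "X11"],
        "IsTrading": [True, True, False, True, True, True],
        "ProofType": ["PoW", "PoS", "PoW", "PoW/PoS", "PoW", "PoW"],
        "TotalCoinsMined": [1.0e6, 5.0e5, 2.0e6, None, 0.0, 3.0e6],
        "TotalCoinSupply": [2.1e6, 1.0e6, 4.0e6, 1.0e6, 3.0e6, 9.0e6],
    },
    index=["ALP", "BET", "GAM", "DEL", "EPS", "ZET"],
)

# Keep only coins that are being traded, then drop the IsTrading column.
crypto_df = crypto_df[crypto_df["IsTrading"]].drop(columns="IsTrading")

# Remove rows that have at least one null value.
crypto_df = crypto_df.dropna()

# Keep only rows where coins have been mined.
crypto_df = crypto_df[crypto_df["TotalCoinsMined"] > 0]

# Store the coin names separately, sharing the crypto_df index.
coin_names_df = crypto_df[["CoinName"]]

# CoinName is not a feature for clustering, so remove it.
crypto_df = crypto_df.drop(columns="CoinName")

# One-hot encode the two text features.
X = pd.get_dummies(crypto_df, columns=["Algorithm", "ProofType"], dtype=float)

# Standardize every feature to zero mean and unit variance.
X_scaled = StandardScaler().fit_transform(X)
```

Keeping coin_names_df on the same index as crypto_df makes it straightforward to join the names back onto the clustering results later.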
Reducing Data Dimensions Using PCA
Using our knowledge of how to apply the Principal Component Analysis (PCA) algorithm, we reduced the dimensions of the X DataFrame to three principal components and placed these dimensions in a new DataFrame named pcs_df.
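As a rough sketch of this step, the snippet below reduces a scaled feature matrix to three principal components with scikit-learn. The input here is random synthetic data standing in for the scaled X features, and the column labels "PC 1" to "PC 3" and the index values are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

# Synthetic stand-in for the standardized features (20 coins, 8 features).
rng = np.random.default_rng(0)
index = [f"COIN{i}" for i in range(20)]
X_scaled = rng.normal(size=(20, 8))

# Reduce the feature space to three principal components.
pca = PCA(n_components=3)
pcs = pca.fit_transform(X_scaled)

# Place the components in a DataFrame that keeps the coin index.
# The column labels are hypothetical.
pcs_df = pd.DataFrame(pcs, columns=["PC 1", "PC 2", "PC 3"], index=index)

# Fraction of the total variance retained by the three components.
explained = pca.explained_variance_ratio_.sum()
```

Checking the retained variance is a quick way to judge how much information is lost by keeping only three components.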
Clustering Cryptocurrencies Using K-means
Using our knowledge of the K-means algorithm, we created an elbow curve using hvPlot to find the best value for K from the pcs_df DataFrame that was created previously. Then, we ran the K-means algorithm to predict the K clusters for the cryptocurrencies’ data.
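The elbow search and the final clustering might look something like the sketch below. It runs on synthetic three-dimensional points in place of the real pcs_df, so the choice of K = 4 only reflects how the synthetic data was generated and is not the project's result. The names elbow_df and clustered_df and the "Class" column are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans

# Synthetic stand-in for pcs_df: four well-separated groups of 15 points.
rng = np.random.default_rng(1)
centers = np.array([[0, 0, 0], [5, 5, 5], [-5, 5, 0], [5, -5, 0]])
points = np.vstack([c + rng.normal(scale=0.3, size=(15, 3)) for c in centers])
pcs_df = pd.DataFrame(points, columns=["PC 1", "PC 2", "PC 3"])

# Fit K-means for K = 1..10 and record the inertia for each K.
k_values = list(range(1, 11))
inertia = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0)
    model.fit(pcs_df)
    inertia.append(model.inertia_)

elbow_df = pd.DataFrame({"k": k_values, "inertia": inertia})

# With hvPlot installed (import hvplot.pandas), the curve could be drawn with:
# elbow_df.hvplot.line(x="k", y="inertia", xticks=k_values, title="Elbow Curve")

# Run K-means with the K suggested by the elbow (4 for this synthetic data).
model = KMeans(n_clusters=4, n_init=10, random_state=0)
clustered_df = pcs_df.copy()
clustered_df["Class"] = model.fit_predict(pcs_df)
```

The elbow is the K after which inertia stops dropping sharply; beyond that point, adding clusters mostly splits groups that already belong together.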
Visualizing Cryptocurrencies Results
Using our knowledge of creating scatter plots with Plotly Express and hvPlot, we visualized the distinct groups that correspond to the three principal components we created previously. We then created a table with all the currently tradable cryptocurrencies using the hvplot.table() function.