Data Science portfolio of IPython notebooks implementing several Machine Learning algorithms. Each notebook follows the same structured methodology:
[Data acquisition -> Data cleaning -> Data analysis -> Algorithm implementation -> Algorithm applied to dataset -> Further optimization and advanced topics]
The notebooks cover a variety of topics and algorithms:
Task | Algorithm | Topic |
---|---|---|
Recommender | Matrix Factorization - ALS | LastFM music-user-artist data |
Regression | Random Forests | Airplane Delay |
Simulation | Monte Carlo in Time Series | Financial Risk |
Clustering | KMeans | Network Traffic and Anomaly Detection |
Clustering | KMeans in Time Series | Time Series of NeuroImages |
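
As a rough illustration of the idea behind the KMeans anomaly-detection row above, the sketch below fits k-means on a handful of points and scores each point by its distance to the nearest centroid. It is not taken from the notebooks: it uses only the Python standard library rather than pyspark, the toy data is made up, and the names `kmeans` and `anomaly_score` are hypothetical helpers chosen for this example.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's algorithm on a list of equal-length tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k data points
    for _ in range(iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids


def anomaly_score(point, centroids):
    """Distance to the nearest centroid; larger means more unusual."""
    return min(math.dist(point, c) for c in centroids)


# Toy data (made up for illustration): two tight groups plus one far-away point.
normal = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
outlier = (10.0, -3.0)

# Fit on the "normal" points only, then score everything.
centroids = kmeans(normal, k=2)
scores = {p: anomaly_score(p, centroids) for p in normal + [outlier]}
most_anomalous = max(scores, key=scores.get)
```

In this toy setup the far-away point receives the largest score. On real network-traffic data the features would need scaling and the number of clusters would need tuning, which this sketch does not attempt.
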
The last couple of notebooks belong to a challenge by SAFRAN: two three-hour sessions that were part of their recruitment process. They served as the ultimate test of everything learnt beforehand, since no work was allowed outside the sessions.
- Language: Python, in Jupyter Notebooks.
- Execution: run on a remote Spark cluster at EURECOM, managed by Zoe.
- Libraries: numpy, pandas, matplotlib, pyspark, thunder
- Ole Andreas Hansen @oleaha
- Alberto Ibarrondo Luis @ibarrond
The rough sketches of all the notebooks come from the course Algorithmic Machine Learning at EURECOM, with particular credit to Pietro Michiardi.
The majority of the Notebooks are based on use cases illustrated in the book Advanced Analytics with Spark, by Sandy Ryza, Uri Laserson, Sean Owen & Josh Wills.
The Notebooks are based on publicly available data.
MIT License. Free software.