Hey there! My name is Andrés, and I'm a Data Scientist and Economist. I have experience in data analytics for the Sustainable Development Goals and the infrastructure sector. I became a Data Scientist because I'm passionate about solving problems with data. Using Machine Learning models, statistical tools, and data visualization, I look for patterns in data that turn into useful insights.
- Predictive Factors of Powerlifting Competition Performance (Master's Thesis): A Logistic Regression model in R that predicts whether athletes reach the podium, with up to 79% accuracy and an AUC of 0.85. Optimized through threshold adjustment and feature engineering.
- Flight Delays project: Used a Random Forest classifier with oversampling techniques in Python to identify the variables that predict flight delays, reaching 89% accuracy.
- Lego Sets: A visual analysis project in R to identify patterns in Lego sets from 2018 to 2020. Star Wars stands out as the most prominent theme, with a strong influence on most of the variables measured.
- Oncology cases: A KNN model in R that detects patients with tumors with 85% accuracy.
- Telecom company study: Through statistical inference, determined that customer tenure explains up to 67% of customer churn.
- Data analysis of small businesses: Used SQL on Snowflake to extract data on small businesses' sales, economic values, returns, and countries of origin.
- Languages: R, Python, SQL
- ML libraries: caret, glmnet, scikit-learn
- Data preprocessing: tidyr, dplyr, numpy, pandas
- Data visualization libraries: ggplot2, matplotlib
- Other tools: Tableau, Snowflake