Team Members: Giovanni M. D'Antonio, Ethan Christian Tan, Abhay Srivastava, and Fucheng Warren Zhu
In this project, we investigated racial and socio-economic inequities in health outcomes related to the consumption of processed foods. Our team placed 3rd out of 30 selected teams in Citadel Securities and Correlation One's Summer Invitational Datathon.
- Ranked 3rd out of 30 Teams: Our project placed 3rd among the 30 teams selected for the datathon.
- Comprehensive Data Collection and Merging: We built a cross-sectional dataset and a panel dataset by fetching and merging data from multiple sources, including the USDA, the US Census Bureau, and the County Health Rankings. We ran data-quality checks throughout this process to protect the integrity and reliability of the merged data.
- Advanced Statistical Methods: We employed Bayesian Hierarchical Modeling alongside Fixed and Random Effects Regression. These methods allowed us to account for the multiple levels of structure and variability in the data, which makes our conclusions more robust.
- Detailed Reporting: In just 6 days, we produced a comprehensive 20-page report, complete with illustrative data visualizations and formatted with careful attention to detail using LaTeX.
- Programming Languages: Python, R
- Data Sources: USDA, US Census Bureau, County Health Rankings
- Statistical Methods: Bayesian Hierarchical Modeling, Fixed Effects Regression, Random Effects Regression
- Documentation: We typeset the report in LaTeX using the kaobook format, whose wide margins make it convenient to place visualizations.
- Investigative Focus: Our analysis centered on how processed food consumption disproportionately affects vulnerable communities, with particular attention to racial and socio-economic disparities.
- Data Visualization: The report includes various data visualizations that illustrate key findings and support our conclusions.
- Rigorous Analysis: We conducted thorough data-quality checks and employed sophisticated statistical models to ensure our results were both reliable and insightful.
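
To give a feel for the merging, data-quality checking, and fixed effects steps mentioned above, here is a minimal Python sketch. It is not the code from our report: the column names (`fips`, `year`, `processed_food_index`, `outcome`, `median_income`), the helper functions, and the toy numbers are all hypothetical and chosen only for illustration. The fixed effects slope is computed with the standard within (demeaning) transformation for a single regressor.

```python
import pandas as pd


def merge_county_tables(panel: pd.DataFrame, cross: pd.DataFrame) -> pd.DataFrame:
    """Merge a county-year panel with a one-row-per-county table.

    `validate` raises an error if the county table has duplicate keys, and
    `indicator` adds a `_merge` column showing where each row came from.
    """
    return panel.merge(
        cross,
        on="fips",
        how="outer",
        validate="many_to_one",
        indicator=True,
    )


def fixed_effects_slope(df: pd.DataFrame, y: str, x: str, group: str) -> float:
    """Within estimator: demean y and x by group, then run OLS through the origin."""
    y_dm = df[y] - df.groupby(group)[y].transform("mean")
    x_dm = df[x] - df.groupby(group)[x].transform("mean")
    return float((x_dm * y_dm).sum() / (x_dm ** 2).sum())


# Toy panel (hypothetical values): two counties observed over three years.
# County codes are kept as strings so leading zeros survive the merge.
panel = pd.DataFrame(
    {
        "fips": ["01001"] * 3 + ["01003"] * 3,
        "year": [2018, 2019, 2020] * 2,
        "processed_food_index": [1.0, 2.0, 3.0, 2.0, 4.0, 6.0],
        "outcome": [12.0, 14.0, 16.0, 54.0, 58.0, 62.0],
    }
)

# Toy cross-section (hypothetical values): one row per county, including
# one county that never appears in the panel.
cross = pd.DataFrame(
    {
        "fips": ["01001", "01003", "01005"],
        "median_income": [58000, 61000, 35000],
    }
)

merged = merge_county_tables(panel, cross)

# Data-quality check: count rows that failed to match on either side.
unmatched = merged[merged["_merge"] != "both"]
print(f"unmatched rows: {len(unmatched)}")

# Estimate the within-county slope on the matched rows only.
matched = merged[merged["_merge"] == "both"]
slope = fixed_effects_slope(matched, "outcome", "processed_food_index", "fips")
print(f"fixed effects slope: {slope:.2f}")
```

Demeaning by county removes anything that is constant within a county, so the slope reflects only changes over time inside each county. A real analysis would also need standard errors and additional controls, which this sketch leaves out.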
We invite you to read our detailed report to explore our findings and methodologies. The report is available in this repository.