This was the final project for the Data Science: Python (DAT-5301) course at Hult International Business School. It explored a World Bank dataset of 41 features for 217 countries. This report focuses on the Southern and Eastern Africa region of the dataset, which comprises 22 countries. The purpose of the project was to conduct a thorough data analysis and share insightful conclusions.
In this project, teams are tasked with conducting an analysis of World Bank data. Each student group receives a randomly assigned dataset on the countries of one world region and is expected to:
- Conduct an exploratory data analysis using Python
- Formulate a strategy for handling missing values and identifying potential outliers
- Develop a Jupyter Notebook on their process and findings (with ample use of markdown)
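As a rough illustration of the missing-value and outlier steps listed above, here is a minimal pandas sketch. The toy DataFrame, the column name `indicator_x`, and the choice of median imputation with a 1.5 × IQR rule are all assumptions made for this example. They are not the project's actual data or the strategy the team settled on.

```python
import pandas as pd

# Hypothetical toy data; the column name and values are illustrative only.
df = pd.DataFrame(
    {
        "country": ["A", "B", "C", "D", "E"],
        "indicator_x": [10.0, 12.0, None, 11.0, 50.0],
    }
)

# Count missing values per column before deciding how to handle them.
missing_counts = df.isna().sum()

# One possible strategy: record where the gap was, then impute the median.
df["indicator_x_missing"] = df["indicator_x"].isna()
df["indicator_x"] = df["indicator_x"].fillna(df["indicator_x"].median())

# Flag potential outliers with the 1.5 * IQR rule (an assumed threshold).
q1 = df["indicator_x"].quantile(0.25)
q3 = df["indicator_x"].quantile(0.75)
iqr = q3 - q1
df["indicator_x_outlier"] = (df["indicator_x"] < q1 - 1.5 * iqr) | (
    df["indicator_x"] > q3 + 1.5 * iqr
)

print(missing_counts)
print(df)
```

Keeping a separate `*_missing` flag column preserves the information that a value was imputed, which can matter later when judging whether a region's numbers can be trusted.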
Using 1,000 to 1,500 words of markdown:
- Introduce your region from a non-technical perspective (culture, world-famous aspects, etc.).
- Select one country from your region that you feel best represents it "on average". Include the rationale for your choice and support it with Python code.
- Identify any obscure findings in the data. In other words, "Does the data accurately reflect the region? Can your region's numbers be trusted?"
- Explain your strategy for missing values, as well as your strategy for identifying outliers.
- Select the Top 5 features (i.e. columns) of the dataset that best exemplify your region. In other words, "What makes your region unique when compared to the rest of the world?"
- Support your findings with domain knowledge (i.e. research from external sources). Make sure to cite your sources.
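One way to back the choice of an "on average" country with Python code is to standardize each feature and pick the country closest to the regional mean. The sketch below assumes that approach. The country labels, feature names, and values are hypothetical placeholders, and Euclidean distance on z-scores is only one of several reasonable definitions of "most typical".

```python
import pandas as pd

# Hypothetical toy data; labels, feature names, and values are placeholders.
df = pd.DataFrame(
    {
        "feature_1": [1.0, 2.0, 3.0, 10.0],
        "feature_2": [5.0, 6.0, 7.0, 20.0],
    },
    index=["Country A", "Country B", "Country C", "Country D"],
)

# Standardize each feature so that no single scale dominates the distance.
z_scores = (df - df.mean()) / df.std()

# After standardization the regional mean is the origin, so the Euclidean
# norm of each row is that country's distance from the "average" country.
distance_from_mean = (z_scores**2).sum(axis=1) ** 0.5

# The country with the smallest distance is the most typical under this metric.
most_typical = distance_from_mean.idxmin()

print(distance_from_mean.sort_values())
print(most_typical)
```

Because the mean is sensitive to extreme values, a variant that measures distance from the median instead may be worth comparing when the region contains strong outliers.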