This project analyzes global commercial airplane crashes to uncover insights about their frequency, causes, and impact. Using statistical methods and data visualization, it identifies trends, explores contributing factors, and highlights patterns in crash occurrences and fatalities.
- Examine the frequency and causes of airplane crashes.
- Analyze the relationship between crash fatalities, causes, and economic factors such as GDP per capita.
- Visualize key metrics to provide insights into the data.
- Source: commercialairplanecrashes.csv
- Key Features:
  - Numerical: Fatalities, Country GDP per Capita
  - Categorical: Cause of Crash, Year, Country

| Feature | Description |
|---|---|
| Cause of Crash | Cause of the crash (e.g., mechanical error). |
| Fatalities | Total number of fatalities in the crash. |
| Country | Country where the crash occurred. |
| Year | Year of the crash. |
| Country GDP per Capita | GDP per capita of the country where the crash occurred. |

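
Below is a minimal base-R sketch that loads the file and checks that the columns in the table are present. It assumes the CSV header uses exactly these feature names, spaces included. `expected_cols` and `load_crashes` are hypothetical names and do not come from the original project.

```r
# Hypothetical loader: assumes the CSV header matches the feature names
# in the table exactly, including spaces.
expected_cols <- c("Cause of Crash", "Fatalities", "Country", "Year",
                   "Country GDP per Capita")

load_crashes <- function(path) {
  # check.names = FALSE keeps names like "Cause of Crash" intact instead of
  # rewriting them to syntactic names such as "Cause.of.Crash".
  df <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  missing_cols <- setdiff(expected_cols, names(df))
  if (length(missing_cols) > 0) {
    stop("missing columns: ", paste(missing_cols, collapse = ", "))
  }
  df
}
```

For example, `crashes <- load_crashes("commercialairplanecrashes.csv")` would return a data frame whose columns can be read as `crashes[["Cause of Crash"]]`.
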
- Crash Cause Analysis:
  - Computed the frequency distribution of crash causes.
  - Visualized the most common causes using bar plots.
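
The frequency distribution and bar plot could be produced in base R with `table()` and `barplot()`, as in this sketch. The cause labels are toy values rather than rows from the dataset, and `cause_frequency` is a hypothetical helper.

```r
# Hypothetical helper: count crashes per cause, most frequent first.
cause_frequency <- function(causes) {
  sort(table(causes), decreasing = TRUE)
}

# Toy labels for illustration only; not taken from the dataset.
causes <- c("Mechanical", "Weather", "Pilot Error", "Weather",
            "Mechanical", "Weather")
freq <- cause_frequency(causes)
print(freq)

# Bar plot of the sorted counts, drawn only in an interactive session.
if (interactive()) {
  barplot(freq, las = 2, ylab = "Number of crashes", main = "Crash causes")
}
```
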
- Fatalities Analysis:
  - Analyzed fatalities by year.
  - Mapped crash locations to identify regional trends.
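
Computing fatalities by year is a grouped sum. The sketch below does this in base R with `tapply()`. It uses toy values and a hypothetical `fatalities_by_year` helper.

```r
# Hypothetical helper: total fatalities per year via a grouped sum.
fatalities_by_year <- function(year, fatalities) {
  totals <- tapply(fatalities, year, sum, na.rm = TRUE)
  data.frame(Year = as.integer(names(totals)),
             Fatalities = as.numeric(totals))
}

# Toy values for illustration only; not taken from the dataset.
yearly <- fatalities_by_year(year = c(2001, 2001, 2002, 2004),
                             fatalities = c(10, 5, 0, 120))
print(yearly)

# Line plot of the yearly totals, drawn only in an interactive session.
if (interactive()) {
  plot(yearly$Year, yearly$Fatalities, type = "b",
       xlab = "Year", ylab = "Total fatalities")
}
```

Years with no recorded crashes (2003 in the toy data) do not appear in the result. If a continuous time axis is needed, they would have to be added with a total of zero.
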
- Economic Correlation:
  - Investigated the relationship between GDP per capita and crash occurrences.
  - Found that most crashes occurred in countries with low GDP per capita:
    - 82.7% of crashes: GDP per capita below $20,000.
    - 62% of crashes: GDP per capita below $10,000.
    - 51.7% of crashes: GDP per capita below $5,000.
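
Each percentage above is the share of crash records whose country GDP per capita falls below a cutoff. The sketch below shows that computation. It assumes one GDP value per crash and excludes missing GDP values. The GDP figures are invented for illustration, and `share_below` is a hypothetical name.

```r
# Hypothetical helper: percentage of crashes whose country GDP per capita
# is strictly below each threshold. Rows with missing GDP are excluded.
share_below <- function(gdp, thresholds = c(20000, 10000, 5000)) {
  gdp <- gdp[!is.na(gdp)]
  data.frame(threshold = thresholds,
             pct_of_crashes = vapply(thresholds,
                                     function(t) 100 * mean(gdp < t),
                                     numeric(1)))
}

# Invented GDP-per-capita values (one per crash), for illustration only.
gdp <- c(1200, 3400, 4800, 7600, 9100, 15000, 26000, 41000)
print(share_below(gdp))
```

On the toy data, 6 of 8 crashes fall below $20,000 (75%), 5 of 8 below $10,000 (62.5%), and 3 of 8 below $5,000 (37.5%).
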
- Handling Missing or Ambiguous Data:
  - Ensured all columns had complete and valid data.
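
Here is one way to check for missing or ambiguous entries in base R. The README does not say which strategy was used, so this sketch makes two assumptions. Blank or whitespace-only strings count as ambiguous, and incomplete rows are dropped rather than imputed. All helper names are hypothetical.

```r
# Hypothetical helpers for missing/ambiguous data.
# Assumption: blank or whitespace-only strings are treated as missing.
blank_to_na <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) col[!is.na(col) & trimws(col) == ""] <- NA
    col
  })
  df
}

# Number of missing values in each column.
missing_report <- function(df) colSums(is.na(df))

# Keep only rows with no missing values (assumed strategy; imputation is an alternative).
drop_incomplete <- function(df) df[complete.cases(df), , drop = FALSE]

# Toy data for illustration only.
toy <- data.frame(Fatalities = c(10, NA, 3, 7),
                  Country = c("A", "B", NA, " "),
                  stringsAsFactors = FALSE)
toy <- blank_to_na(toy)
print(missing_report(toy))
clean <- drop_incomplete(toy)
```
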
- Standardizing Categorical Data:
  - Encoded categories (e.g., causes, countries) for machine learning compatibility.
- Outlier Detection:
  - Identified and flagged extreme fatality values using statistical methods.
- Bar Plot: Crash Causes
- Fatalities Over Time
- Economic Analysis
- Scatter Plot: Fatalities vs. Causes
Categorical Encoding:
- Converted crash causes and countries into numerical representations using one-hot encoding.
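
In base R, calling `model.matrix()` with the intercept removed gives one indicator column per level, which is one-hot encoding. The sketch below applies this to a single categorical vector; `one_hot` is a hypothetical helper. It assumes there are no missing values, because under the default `na.action` rows with `NA` are dropped and the result would be misaligned. It also assumes at least two distinct categories.

```r
# Hypothetical helper: one-hot encode a single categorical vector.
# Dropping the intercept ("- 1") makes model.matrix() emit one 0/1 column
# per level instead of treatment contrasts against a reference level.
# Assumes no NA values and at least two distinct categories.
one_hot <- function(x, prefix) {
  x <- factor(x)
  m <- model.matrix(~ x - 1)
  colnames(m) <- paste0(prefix, "_", levels(x))
  m
}

# Toy values for illustration only.
causes <- c("Weather", "Mechanical", "Weather")
countries <- c("A", "B", "A")
encoded <- cbind(one_hot(causes, "cause"), one_hot(countries, "country"))
print(encoded)
```
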
- Outlier Detection and Handling:
  - Removed data points with extreme fatality values using the interquartile range (IQR) method.
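
The IQR method treats values below Q1 - k*IQR or above Q3 + k*IQR as outliers. The sketch below assumes the conventional multiplier k = 1.5, since the README does not state k, and uses R's default quantile interpolation (type 7). The helper names are hypothetical.

```r
# Hypothetical helpers for the IQR rule. k = 1.5 is an assumed (conventional)
# multiplier; quantile() uses R's default type 7 interpolation.
iqr_bounds <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE, names = FALSE)
  spread <- q[2] - q[1]
  c(lower = q[1] - k * spread, upper = q[2] + k * spread)
}

# Keep only rows whose value in `col` lies within the IQR bounds;
# rows with a missing value in `col` are dropped as well.
remove_iqr_outliers <- function(df, col, k = 1.5) {
  b <- iqr_bounds(df[[col]], k)
  v <- df[[col]]
  keep <- !is.na(v) & v >= b[["lower"]] & v <= b[["upper"]]
  df[keep, , drop = FALSE]
}

# Toy fatality counts for illustration only.
toy <- data.frame(Fatalities = c(0, 2, 3, 4, 5, 6, 8, 300))
print(iqr_bounds(toy$Fatalities))
filtered <- remove_iqr_outliers(toy, "Fatalities")
```

On the toy column, Q1 = 2.75 and Q3 = 6.5, so the bounds are -2.875 and 12.125 and only the value 300 is removed.
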
- Data Normalization:
  - Standardized numerical columns (e.g., GDP per capita, fatalities) for consistent analysis.
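
This sketch assumes "standardized" means z-scores, that is, subtracting each column's mean and dividing by its standard deviation. Base R's `scale()` does this per column. `standardize` is a hypothetical helper, and the values are toy data.

```r
# Hypothetical helper: z-score standardization of selected numeric columns.
# scale() returns a one-column matrix, so as.numeric() turns it back into a vector.
standardize <- function(df, cols) {
  df[cols] <- lapply(df[cols], function(x) as.numeric(scale(x)))
  df
}

# Toy values for illustration only.
toy <- data.frame(Fatalities = c(2, 4, 6),
                  `Country GDP per Capita` = c(1000, 2000, 6000),
                  check.names = FALSE)
z <- standardize(toy, c("Fatalities", "Country GDP per Capita"))
print(z)
```

A constant column has zero standard deviation, so its standardized values would be NaN.
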

R is used for data manipulation, visualization, and numerical operations.

- Crash Causes:
  - Mechanical errors, weather conditions, and pilot errors are among the most frequent causes.
  - Weather-related crashes tend to result in more fatalities.
- Fatalities and Economic Factors:
  - Lower GDP per capita is associated with higher crash frequency, suggesting possible links to infrastructure and safety regulation.
- Yearly Trends:
  - Annual fatalities fluctuate, with peaks in specific years tied to catastrophic events.

The bar plot relating GDP per capita to crash frequency illustrates the economic disparities that influence aviation safety.

This project highlights the critical factors influencing global airplane crashes, emphasizing economic disparities and common causes. The findings can inform future safety measures and resource allocation to reduce crash frequencies and fatalities.