Disclaimer: 🚧 The task instructions and the presentation are in Polish. An English translation is provided for README documentation purposes.
This repository contains our group project for the Processing Structured Data course, academic year 2022/2023. The project analyzes airline on-time performance data to extract meaningful insights.
Data: Harvard Dataverse link
Our primary objective was to explore diverse facets of airline on-time performance using the provided dataset. This dataset spans from October 1987 to April 2008, covering crucial details such as departure and arrival times, carrier information, delays, and more.
Our exploration focused on answering a range of questions related to airline on-time performance, including but not limited to:
- What are the reasons for flight cancellations?
- Which year had the most flight cancellations?
- What were the 50 most popular flights in the years 1996-2008?
- What could have been the reasons for cascade cancellations?
- How does an aircraft's year of manufacture affect delays?
- Are newer aircraft models better at reducing departure delays?
- Is there a correlation between particular aircraft models or manufacturers and making up for delays?
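As an illustration of how some of these questions can be approached, here is a minimal sketch using only the Python standard library. It is not the code used in the project. It assumes the column names `Year`, `Cancelled`, `CancellationCode`, `DepDelay`, and `ArrDelay`, and the cancellation code meanings A = carrier, B = weather, C = NAS, D = security. The sample rows are made up for demonstration, and the function names are hypothetical. "Delay recovered" is taken here to mean departure delay minus arrival delay, which is one possible reading of the delay questions above.

```python
import csv
import io
from collections import Counter

# Assumed meaning of the CancellationCode values (an assumption, see above).
CANCELLATION_REASONS = {"A": "carrier", "B": "weather", "C": "NAS", "D": "security"}

# Tiny made-up sample in the assumed column layout; not real data.
SAMPLE_CSV = """Year,Cancelled,CancellationCode,DepDelay,ArrDelay
2006,1,B,NA,NA
2006,0,,30,12
2007,1,A,NA,NA
2007,1,B,NA,NA
2007,0,,10,25
"""


def read_rows(handle):
    """Yield one dict per flight from an open CSV file handle."""
    yield from csv.DictReader(handle)


def cancellations_per_year(rows):
    """Count cancelled flights for each year."""
    counts = Counter()
    for row in rows:
        if row["Cancelled"] == "1":
            counts[int(row["Year"])] += 1
    return counts


def cancellation_reasons(rows):
    """Count cancelled flights per reason; missing codes count as 'unknown'."""
    counts = Counter()
    for row in rows:
        if row["Cancelled"] == "1":
            reason = CANCELLATION_REASONS.get(row["CancellationCode"], "unknown")
            counts[reason] += 1
    return counts


def mean_delay_recovered(rows):
    """Average of DepDelay - ArrDelay in minutes, skipping non-numeric rows."""
    diffs = []
    for row in rows:
        try:
            diffs.append(float(row["DepDelay"]) - float(row["ArrDelay"]))
        except ValueError:
            # Values such as "NA" or "" mean the delay was not recorded.
            continue
    return sum(diffs) / len(diffs) if diffs else None


rows = list(read_rows(io.StringIO(SAMPLE_CSV)))
per_year = cancellations_per_year(rows)
worst_year, worst_count = per_year.most_common(1)[0]
reasons = cancellation_reasons(rows)
recovered = mean_delay_recovered(rows)

print(f"Most cancellations: {worst_year} ({worst_count})")
print(f"Reasons: {dict(reasons)}")
print(f"Mean delay recovered: {recovered} min")
```

To run the same functions on a real file, pass an open file handle to `read_rows` instead of the in-memory sample. Because `read_rows` streams one row at a time, each function can also consume the generator directly when the file is too large to hold in a list.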
The project was evaluated based on the following criteria:
- Inclusion of code necessary for loading datasets and generating presented results.
- Development of code generating interesting results answering research questions.
- Presentation of results in a clear and concise manner within the allocated time.
Developed by: @Michał Pytel, @Krzysztof Tkaczyk
This project is licensed under the MIT License - see the LICENSE file for details.
We extend our gratitude to the American Statistical Association for providing the dataset and the Harvard Dataverse for hosting it.