This is a personal project that extracts and compares flight prices from two websites: Decolar.com and Passagens Promo. The objective is to compare prices between the two sites after extracting, transforming, and loading the data. The project is inspired by a real business need I encountered, and it lets me practice the web scraping skills I have been studying on a real-world problem.
The project consists of 3 main .ipynb files:
- webScrappingFlightData.ipynb: This file contains the documented code, with explanations of how it works and the logic behind it.
- webScrappingFunctions.ipynb: This file contains all the functions used to generate the final result.
- searchParametersGenerator.ipynb: This file contains the function that randomly generates search parameters for the main scripts.
Additionally, the repository includes the following files:
- Fligh Data.xlsx: This file contains the final output data
- Dim_iata.xlsx: This file contains a list of IATA codes for airports used by the .ipynb files
- search_parameters.xlsx: This file contains randomly generated search parameters for the .ipynb files
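As an illustration of how random search parameters could be produced from a list of IATA codes, here is a minimal sketch. It is not the code from searchParametersGenerator.ipynb: the function name `generate_search_parameters`, its arguments, the round-trip assumption, and the output keys are all hypothetical, and the four airport codes stand in for the contents of Dim_iata.xlsx.

```python
import random
from datetime import date, timedelta


def generate_search_parameters(iata_codes, n, start, max_days_ahead=90,
                               max_stay=14, seed=None):
    """Return n random round-trip searches (hypothetical structure)."""
    rng = random.Random(seed)  # a seed makes the output reproducible
    searches = []
    for _ in range(n):
        # sample() picks two distinct airports, so origin != destination
        origin, destination = rng.sample(iata_codes, 2)
        departure = start + timedelta(days=rng.randint(1, max_days_ahead))
        return_date = departure + timedelta(days=rng.randint(1, max_stay))
        searches.append({
            "origin": origin,
            "destination": destination,
            "departure": departure.isoformat(),
            "return": return_date.isoformat(),
        })
    return searches


# Placeholder codes; the real list would come from Dim_iata.xlsx
codes = ["GRU", "GIG", "BSB", "SSA"]
params = generate_search_parameters(codes, n=5, start=date(2024, 1, 1), seed=42)
for p in params:
    print(p)
```

Passing a fixed `seed` is a design choice for reproducible runs; leaving it as `None` gives different parameters each time.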
This project requires the following dependencies:
- Python 3
- Requests
- Beautiful Soup 4
- Pandas
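Before opening the notebooks, it can help to confirm that these packages are importable. The check below is a small sketch that is not part of the repository; the helper name `missing_dependencies` is hypothetical. It relies only on the standard library, and on the fact that Beautiful Soup 4 is imported under the name `bs4`.

```python
from importlib.util import find_spec

# Display name -> import name (Beautiful Soup 4 is imported as "bs4")
REQUIRED = {
    "Requests": "requests",
    "Beautiful Soup 4": "bs4",
    "Pandas": "pandas",
}


def missing_dependencies(required):
    """Return the display names of packages that cannot be found."""
    return [name for name, module in required.items()
            if find_spec(module) is None]


missing = missing_dependencies(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All dependencies are installed.")
```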
To run this project, follow these steps:
- Clone the repository to your local machine:
  git clone https://github.com/Lacerdash/WebScrapping-Flight-Data.git
- Navigate to the repository directory:
  cd WebScrapping-Flight-Data
- Open the WebScrappingPassagens.ipynb file in a Jupyter notebook environment or your preferred IDE, and run the cells to execute the code.
- The output files will be saved in the output directory.
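Once prices from both sites are available, the comparison step might look like the sketch below. It is only an illustration in plain Python: the function `compare_prices`, the route keys, and the prices are made up, and the actual layout of the output data is not assumed here.

```python
def compare_prices(decolar, passagens_promo):
    """Pair prices for routes found on both sites and name the cheaper one."""
    rows = []
    # Only routes present in both dictionaries can be compared
    for route in sorted(decolar.keys() & passagens_promo.keys()):
        price_a = decolar[route]
        price_b = passagens_promo[route]
        if price_a < price_b:
            cheaper = "Decolar.com"
        elif price_b < price_a:
            cheaper = "Passagens Promo"
        else:
            cheaper = "tie"
        rows.append({
            "route": route,
            "decolar": price_a,
            "passagens_promo": price_b,
            "difference": round(price_a - price_b, 2),
            "cheaper": cheaper,
        })
    return rows


# Invented prices, keyed by (origin, destination), for illustration only
decolar = {("GRU", "SSA"): 850.00, ("GIG", "BSB"): 620.50}
passagens_promo = {("GRU", "SSA"): 799.90, ("GIG", "BSB"): 640.00,
                   ("BSB", "SSA"): 510.00}

rows = compare_prices(decolar, passagens_promo)
for row in rows:
    print(row)
```

Routes that appear on only one site are skipped, since there is nothing to compare them against.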