Skip to content

🚘Car price comparison app using scraping. Currently scraping Polish websites: otomoto and gratka.

Notifications You must be signed in to change notification settings

AJSTO/All-Car-Ads-Hub-Price-Comparison-Application

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

13 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

All_Car_Ads_Hub

πŸ”— Link to web app: All_Car_Ads_Hub

πŸ‘¨β€πŸ’» Built with:

πŸ“– Descripction about project:

The project is a web application built on the Django framework and deployed on CloudRun. Its main purpose is to facilitate users in finding relevant car listings by inputting specific search parameters. The application utilizes the BeautifulSoup (bs4) library to scrape data from popular online classified platforms such as otomoto.pl and gratka.pl.

Users can input various parameters like brand, model, year of production, or mileage. The application then searches through the classified pages to find listings that match the specified criteria. The discovered listings are presented on a single page along with key parameters and visualized graphs, making it easier for users to quickly compare available options.

In future iterations of the project, there are plans to expand the functionality to include scraping data from other classified platforms such as Allegro. However, due to the volume of data being collected and the time required for scraping, the initial idea of storing data in BigQuery was abandoned. Currently, the application operates in real-time, meaning that data is scraped on-the-fly and presented to the user without the need for storage in a database.

πŸ” App preview:

Project Screenshot

πŸ”ͺ Beautiful Soup:

The scraper collects data from each advertisement from automotive listings on otomoto.pl and gratka.pl and organizes it into dictionaries. The structure of each dictionary is as follows:

{
    'marka_value': 'Toyota',
    'model_value': 'Corolla',
    'cena_value': 7500.0,
    'waluta_value': 'PLN',
    'rok_produkcji_value': 2005,
    'przebieg_value': 205000.0,
    'pojemnosc_value': '1398',
    'moc_value': 97.0,
    'typ_nadwozia_value': 'Auta maΕ‚e',
    'liczba_drzwi_value': '3',
    'liczba_miejsc_value': '5',
    'kolor_value': 'Granatowy',
    'kraj_pochodzenia_value': None,
    'zarejestrowany_w_polsce_value': None,
    'stan_value': 'UΕΌywane',
    'lokalizacja_value': 'BrwinΓ³w, pruszkowski, Mazowieckie',
    'tytul_value': 'Toyota Corolla 1.4 VVT-i Terra',
    'url_value': 'https://www.otomoto.pl/osobowe/oferta/toyota-corolla-toyota-corolla-2005-1-4-ID6G3GIP.html',
    'strona_value': 'otomoto'
}

The last key, strona_value (page), indicates the source of the scraped data, either otomoto or gratka. This value is dynamic and depends on the website being scraped.

Once the script collects all possible advertisements, it compiles them into a list of dictionaries, and the data is then sent to a Django web application in JSON format. This allows the Django application to process and display the scraped automotive listings effectively.

The structure of the dictionary is designed to capture key details about each car listing, facilitating easy integration with the Django application and providing users with comprehensive information about available vehicles.

πŸ–₯️ Frontend:

Basic frontend has been developed to complement the car listings scraper. Frontend is a simple, single-page interface, due to the lack of experience in frontend development. While not flawless, it serves the purpose of interacting with the underlying scraper.

Features:

  • Search Form:
    • The default page consists of a fieldset with options to select various search parameters:
      Brand, Model, Year Range, Engine Capacity Range, Price Range, Fuel Type, Mileage Range,
      Gearbox Type, Engine Power Range, Town, Distance from Town and Voivodship.
    • Voivodship options are currently reserved for future use when data is sourced from Allegro.
  • Search Results Table:
    • Clicking the "Search" button yields a table of results.
    • Each row includes information about the source page, a link with the advertisement title, and details such as year of production, mileage, location, engine capacity, engine power, and the advertised price.
    • Sorting functionality is available by clicking the icon next to the column header.
    • Additional information about the median price and median mileage is provided.
  • Interactive Scatterplot:
    • A scatterplot visually represents the dispersion of prices against mileage.
    • Clicking on a data point corresponding to an advertisement redirects to the respective listing page.
  • Price Histogram:
    • A histogram divides prices into 10 bins, providing an overview of the distribution of prices.

Note on Voivodships:

Voivodships functionality is intended for future use, particularly when data is successfully acquired from Allegro.

While the frontend may lack sophistication, it serves its purpose in presenting and interacting with the scraped car listings data. Future improvements and refinements are anticipated as the project evolves.

🌳 Project Scructure:

.
β”œβ”€β”€ AllCarAdsHub
β”‚Β Β  β”œβ”€β”€ AllCarAdsHub
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ __init__.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ asgi.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ settings.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ urls.py
β”‚Β Β  β”‚Β Β  └── wsgi.py
β”‚Β Β  β”œβ”€β”€ allcaradshub_app
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ __init__.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ admin.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ apps.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ gratka.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ migrations
β”‚Β Β  β”‚Β Β  β”‚Β Β  └── __init__.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ models.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ otomoto.py
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ templates
β”‚Β Β  β”‚Β Β  β”‚Β Β  └── home.html
β”‚Β Β  β”‚Β Β  β”œβ”€β”€ tests.py
β”‚Β Β  β”‚Β Β  └── views.py
β”‚Β Β  β”œβ”€β”€ db.sqlite3
β”‚Β Β  └── manage.py
β”œβ”€β”€ Dockerfile
└── requirements.txt

☁️ Deploying project on CloudRun:

1. Build and Tag Docker Image:

  • Build your Docker image using the docker buildx command. For example:
docker buildx build --platform linux/amd64,linux/arm64 -t your-image-name:v1 .
  • Tag the image:
docker tag your-image-name:v1 gcr.io/your-project-id/your-image-name:v1

2. Push Docker Image to Google Cloud Registry:

  • Allow gcloud to use service account credentials to make requests:
 gcloud auth activate-service-account [ACCOUNT] --key-file=KEY_FILE
  • Authenticate Docker with Google Cloud Registry:
gcloud auth configure-docker
  • Push the Docker image to the Google Cloud Registry:
docker push gcr.io/your-project-id/your-image-name:v1

3. Deploy on Cloud Run:

  • Deploy the Docker image to Cloud Run:
gcloud run deploy your-service-name \
  --image gcr.io/your-project-id/your-image-name:v1 \
  --platform managed \
  --port 8000
  • Follow the prompts to set additional configurations, such as allowing unauthenticated access or specifying environment variables.

4. Access the Deployed Service:

Once the deployment is complete, you will receive a URL for your Cloud Run service. You can access your application by navigating to that URL in a web browser. Now, your Docker image is deployed to Google Cloud Registry, and your service is running on Cloud Run. Remember to replace placeholders like your-project-id, your-image-name, and your-service-name with your actual project ID, image name, and desired service name. Adjust the version tag (v1) and other configurations as needed for your project.

You can also deploy application locally.

πŸ“¦ Continuous Deployment with GitHub Actions

This project leverages GitHub Actions to automate the deployment workflow to Google Cloud Run. The deploy.yml file in the .github/workflows directory defines a workflow that is triggered on each push to the main branch. The workflow utilizes Google Cloud's GitHub Actions to set up the necessary environment, authenticate with Google Cloud, build and publish a Docker image, and deploy the application to Google Cloud Run.

To securely manage sensitive information, such as Google Cloud service account credentials and project details, the workflow relies on GitHub Secrets. These secrets include:

  • GCP_APPLICATION - name must use only lowercase alphanumeric characters and dashes, cannot begin or end with a dash, and cannot be longer than 63 characters.,
  • GCP_CREDENTIALS - contents of the JSON key file,
  • GCP_EMAIL - service account e-mail like: SERVICE_ACCOUNT_USERNAME@PROJECT_ID.iam.gserviceaccount.com,
  • GCP_PROJECT - your project id, which are used during the deployment process. The workflow ensures the seamless deployment of the application to Google Cloud Run, providing an efficient and automated deployment pipeline.

To customize the deployment settings, modify the workflow file (deploy.yml) and update the corresponding GitHub Secrets with your Google Cloud Project details. This automated deployment pipeline streamlines the process of deploying updates to your application, ensuring a smooth and efficient development workflow.

About

🚘Car price comparison app using scraping. Currently scraping Polish websites: otomoto and gratka.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published