This repository has been archived by the owner on Nov 8, 2024. It is now read-only.

This repository contains code snippets for a backend platform that calculates real-time taxi fares based on comfort levels. It leverages Kafka for data streaming, ElasticSearch for indexing, and BigQuery for revenue analysis, computing fares by analyzing driver-customer distances, categorized by comfort level and location.


mriusero/real-time-fare-calculator


DATASTREAM - Real-time Taxi Fare Calculation Platform

Overview

This project, developed in a learning context, is a back-end platform designed to calculate taxi fares in real time based on the selected comfort level. It leverages Kafka for streaming data, ElasticSearch for indexing and monitoring, and Google BigQuery for revenue analysis and clustering. The platform computes fares from the distance between drivers and customers, and revenue is then grouped by comfort level and geographic cluster.

Table of Contents

  1. Introduction
  2. Data Model
  3. TravelProcessor Development
  4. Data Transformation
  5. Indexing & Monitoring
  6. Data Warehouse and BigQuery
  7. Revenue Calculation by Cluster and Comfort Level
  8. Data Visualization

Introduction

This project aims to provide an accurate fare calculation for taxi trips by estimating the distance between drivers and customers. Fare determination takes into account the selected level of comfort for each trip.

intro
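The distance estimate between a driver and a customer can be illustrated with a short sketch. The README does not say which distance formula the project uses, so the haversine (great-circle) formula below is an assumption, as are the function name `haversine_km` and the sample coordinates.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical driver and customer positions as (latitude, longitude).
driver = (48.8566, 2.3522)
customer = (48.8738, 2.2950)
print(f"{haversine_km(*driver, *customer):.2f} km")
```

The haversine formula ignores the road network, so it gives a lower bound on the driving distance rather than the distance actually travelled.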

Data Model

Incoming data is streamed in real time through Kafka, using a Python KafkaProducer. This data includes customer requests, driver availability, and location information.

Note: Data, including location coordinates (longitude, latitude), is randomly generated, which may result in inconsistent map locations.

data_model
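As a minimal sketch of how randomly generated records could be published, the example below assumes the kafka-python package. The record fields, the comfort labels, the topic name `travel-requests`, and the broker address are all hypothetical and are not taken from the repository.

```python
import json
import random
import uuid

COMFORT_LEVELS = ["low", "medium", "high"]  # hypothetical labels


def random_point():
    """Random coordinates; as noted above, these may land anywhere on the map."""
    return {
        "lat": random.uniform(-90.0, 90.0),
        "lon": random.uniform(-180.0, 180.0),
    }


def make_travel_request():
    """Build one hypothetical travel request record."""
    return {
        "request_id": str(uuid.uuid4()),
        "comfort": random.choice(COMFORT_LEVELS),
        "customer": random_point(),
        "driver": random_point(),
    }


def publish(records, topic="travel-requests", servers="localhost:9092"):
    """Send records to Kafka as JSON (requires kafka-python and a running broker)."""
    # Imported here so the generator above works without kafka-python installed.
    from kafka import KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=servers,
        value_serializer=lambda record: json.dumps(record).encode("utf-8"),
    )
    for record in records:
        producer.send(topic, value=record)
    producer.flush()  # block until all buffered records are sent
    producer.close()
```

With a broker reachable at the assumed address, `publish(make_travel_request() for _ in range(100))` would send one hundred random requests.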

TravelProcessor Development

The TravelProcessor module is designed and tested based on a predefined architecture. Unit tests ensure the reliability and accuracy of distance calculations and fare estimations.

data_processor
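The kind of unit test described above might look like the following sketch. The tariff (base fare and per-kilometre rates), the comfort labels, and the function `estimate_fare` are hypothetical; the actual TravelProcessor pricing rules are not given in this README.

```python
import unittest

# Hypothetical tariff: a flat base fare plus a per-kilometre rate per comfort level.
BASE_FARE = 2.0
RATE_PER_KM = {"low": 1.0, "medium": 1.5, "high": 2.0}


def estimate_fare(distance_km, comfort):
    """Return the fare for a trip, rounded to two decimals."""
    if distance_km < 0:
        raise ValueError("distance_km must be non-negative")
    if comfort not in RATE_PER_KM:
        raise ValueError(f"unknown comfort level: {comfort!r}")
    return round(BASE_FARE + distance_km * RATE_PER_KM[comfort], 2)


class EstimateFareTest(unittest.TestCase):
    def test_zero_distance_costs_base_fare(self):
        self.assertEqual(estimate_fare(0.0, "low"), BASE_FARE)

    def test_higher_comfort_costs_more(self):
        self.assertLess(estimate_fare(10.0, "low"), estimate_fare(10.0, "high"))

    def test_unknown_comfort_is_rejected(self):
        with self.assertRaises(ValueError):
            estimate_fare(1.0, "luxury")

    def test_negative_distance_is_rejected(self):
        with self.assertRaises(ValueError):
            estimate_fare(-1.0, "low")
```

Saved as a module, the tests can be run with `python -m unittest`.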

Data Transformation

Custom transformations are applied to the incoming data, with examples of transformations defined in TravelProcessor using JoltTransformJSON.

transformation
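JoltTransformJSON is the Apache NiFi processor that applies Jolt specifications to JSON documents. The sketch below shows what a simple Jolt `shift` specification could look like, written as a Python structure so it can be serialized to JSON, together with a small helper that imitates the effect of that one operation on flat records. The field names are hypothetical, and the helper is only an illustration of the shift idea, not a Jolt implementation.

```python
import json

# Hypothetical Jolt spec: move flat input keys to nested output paths.
SHIFT_SPEC = [
    {
        "operation": "shift",
        "spec": {
            "customer_lat": "customer.location.lat",
            "customer_lon": "customer.location.lon",
            "comfort": "comfort",
        },
    }
]


def apply_flat_shift(record, spec):
    """Imitate a Jolt shift for flat input keys mapped to dotted output paths."""
    output = {}
    for source_key, target_path in spec.items():
        if source_key not in record:
            continue  # keys absent from the input produce no output
        *parents, leaf = target_path.split(".")
        node = output
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = record[source_key]
    return output


flat = {"customer_lat": 48.85, "customer_lon": 2.35, "comfort": "high", "ignored": 1}
nested = apply_flat_shift(flat, SHIFT_SPEC[0]["spec"])
print(json.dumps(SHIFT_SPEC, indent=2))  # the JSON a Jolt processor would receive
print(nested)
```

As in a Jolt shift, input keys that the spec does not mention (here `ignored`) are dropped from the output.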

Indexing & Monitoring

The platform uses ElasticSearch for:

  • Indexing & Mapping: Data indexing and mapping are defined to optimize search and retrieval of fare and distance calculations.
  • Performance Monitoring: Real-time monitoring of data streams and performance analysis.
  • Data Visualization: Kibana provides a detailed and real-time visualization of processed data.

indexing
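An index mapping for fare and distance records could be sketched as follows. The field names and the index name `travels` are hypothetical, and the helper assumes the keyword-argument API of the official `elasticsearch` Python client (version 8.x); the project's actual mapping is not shown in this README.

```python
# Hypothetical mapping: geo_point fields allow distance queries and map views in Kibana.
TRAVEL_MAPPINGS = {
    "properties": {
        "comfort": {"type": "keyword"},
        "distance_km": {"type": "float"},
        "fare": {"type": "float"},
        "customer_location": {"type": "geo_point"},
        "driver_location": {"type": "geo_point"},
        "timestamp": {"type": "date"},
    }
}


def create_travel_index(client, index_name="travels"):
    """Create the index if it is missing.

    `client` is expected to be an elasticsearch.Elasticsearch instance.
    Returns True if the index was created, False if it already existed.
    """
    if client.indices.exists(index=index_name):
        return False
    client.indices.create(index=index_name, mappings=TRAVEL_MAPPINGS)
    return True
```

Declaring `comfort` as a `keyword` rather than `text` keeps it usable for exact filtering and aggregations, which is what a per-comfort-level dashboard needs.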

Data Warehouse and BigQuery

All records (10,000+) are consolidated and stored in a data warehouse in .parquet format, along with timestamps, using Google BigQuery.

merge

Revenue Calculation by Cluster and Comfort Level

Revenue calculations are performed using a K-Means clustering model developed with BigQuery ML:

  • Clustering - Eight clusters are identified based on geographical coordinates.
  • Revenue Analysis - Revenue per cluster and comfort level is calculated to assist in understanding profitability across different areas and service types.

data warehouse calculation
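The two steps above can be sketched as follows. The SQL strings show how a K-Means model with eight clusters might be declared and queried in BigQuery ML; the dataset, table, and column names are hypothetical. The Python function reproduces the same grouping locally on already-clustered rows, purely to illustrate the aggregation.

```python
from collections import defaultdict

# Hypothetical dataset, table, and column names.
CREATE_MODEL_SQL = """
CREATE OR REPLACE MODEL `my_dataset.travel_clusters`
OPTIONS (model_type = 'kmeans', num_clusters = 8) AS
SELECT latitude, longitude
FROM `my_dataset.travels`
"""

REVENUE_SQL = """
SELECT CENTROID_ID AS cluster, comfort, SUM(fare) AS revenue
FROM ML.PREDICT(
  MODEL `my_dataset.travel_clusters`,
  (SELECT latitude, longitude, comfort, fare FROM `my_dataset.travels`)
)
GROUP BY cluster, comfort
ORDER BY cluster, comfort
"""


def revenue_by_cluster_and_comfort(rows):
    """Sum fares per (cluster, comfort) pair, mirroring the GROUP BY above."""
    totals = defaultdict(float)
    for row in rows:
        totals[(row["cluster"], row["comfort"])] += row["fare"]
    return dict(totals)


# Hypothetical rows, as they might look after cluster assignment.
sample = [
    {"cluster": 1, "comfort": "low", "fare": 10.0},
    {"cluster": 1, "comfort": "low", "fare": 5.0},
    {"cluster": 2, "comfort": "high", "fare": 30.0},
]
print(revenue_by_cluster_and_comfort(sample))
```

Training on coordinates only keeps the clusters purely geographic, so comfort level stays an independent dimension in the revenue breakdown.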

Data Visualization

The results from the clustering and revenue calculations are visualized in Looker Studio, providing an insightful view of the model outcomes and revenue distributions.

visualisation
