The leading data integration platform for ETL / ELT data pipelines from APIs, databases & files to data warehouses, data lakes & data lakehouses. Both self-hosted and Cloud-hosted.
Updated Sep 1, 2025 - Python
Postgres to Elasticsearch/OpenSearch sync
Ecommerce Realtime Data Pipeline (Data Modeling, Workflow Orchestration, Change Data Capture, Analytical Database and Dashboarding)
Cloud-native, durable state for AI agents: WAL+snapshots, watch streams, idempotency, leases, TLS/mTLS, capability tokens, Python/TS SDKs, Helm.
Repo for the blog post on CDC with Debezium
Slowly Changing Dimension type 2 in Hive query language using the exclusive join technique with ORC Hive tables, plus a performance comparison of partitioned and clustered Hive tables
Sample project that describes how you can handle schema within your Django application.
Example pipeline to stream the data changes from RDBMS to Apache Iceberg tables
Keeps an RDB table in sync with a Hive structured store, with Kafka added as a buffer between the two.
Lightweight CDC patterns for SQLite
This is a tryout I prepared to demonstrate CDC (change data capture) using MySQL, Maxwell and Kafka.
Showcases real-time data replication from RDS (MariaDB) to Kinesis using AWS DMS on LocalStack. Implements both full-load and Change Data Capture (CDC) tasks to stream database changes for analytics.
Change Data Capture (CDC) tool from any source(s) to any target
A provider-agnostic framework to evaluate ordinary CDC (Change Data Capture) features
Transactional change feeds for SQLite
The Yelp Data Pipeline processes business reviews using Python, Kafka, AWS (DynamoDB, S3, Redshift), PySpark, AWS Lambda, and Power BI. It supports real-time streaming, CDC, daily batch processing, and data visualization for insights into customer sentiment, business performance, and industry trends.
This project creates a data stream from MySQL using replication protocols and ingests it into Kafka. You can use it to build an event-driven system.
Data decoding, encoding, conversion, and translation utilities.
Showcasing CDC with the PostgreSQL pglogical plugin and custom scripts.
Distributed change data capture (CDC) framework for Google BigQuery