phamthiminhtu/kafka

Description

Streaming data with Kafka. Current projects:

CDC (change data capture)

kafka

The workflow is as follows:

  1. Stream data from Postgres to Kafka using Debezium (log-based CDC), KSQL and Kafka Connect, provided by Confluent Platform. Code: ksql/source/source__postgres__airbnb.sql
  2. Sink data from Kafka to Google Cloud Storage using Kafka Connect (data is stored with Hive-style partitioning). Code: connectors/sink/gcp/gcs-sink.json
  3. Automatically detect new topics and create them as external tables in BigQuery using Dagster. Code: kafka-dagster/kafka_dagster/airbnb__gcs_to_bigquery_asset.py
  • Example of the DAG created on Dagster:
image
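
To illustrate step 3, here is a minimal sketch of how new topics could be detected from the objects in the bucket and turned into BigQuery external table DDL. It is not the code in airbnb__gcs_to_bigquery_asset.py. It assumes the sink writes Parquet files under `topics/<topic>/year=.../month=.../day=.../`, and every name in it (function names, project, dataset, bucket, topic names) is hypothetical. It only builds the SQL strings; listing objects in GCS and running the statements in BigQuery are left out.

```python
from __future__ import annotations

import re
from typing import Iterable


def detect_topics(object_paths: Iterable[str], topics_dir: str = "topics") -> set[str]:
    """Return the topic names found as first-level folders under topics_dir."""
    prefix = topics_dir.strip("/") + "/"
    topics: set[str] = set()
    for path in object_paths:
        if not path.startswith(prefix):
            continue  # object lives outside the sink's output directory
        parts = path[len(prefix):].split("/")
        # Require at least "<topic>/<something>" so stray files are skipped.
        if len(parts) >= 2 and parts[0]:
            topics.add(parts[0])
    return topics


def table_name_for(topic: str) -> str:
    """Map a topic name to a simple table name (assumed naming rule)."""
    return re.sub(r"[^A-Za-z0-9_]", "_", topic)


def external_table_ddl(
    project: str, dataset: str, bucket: str, topic: str, topics_dir: str = "topics"
) -> str:
    """Build DDL for an external table over Hive-partitioned Parquet files."""
    uri_prefix = f"gs://{bucket}/{topics_dir}/{topic}"
    table = table_name_for(topic)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{project}.{dataset}.{table}`\n"
        "WITH PARTITION COLUMNS\n"
        "OPTIONS (\n"
        "  format = 'PARQUET',\n"
        f"  uris = ['{uri_prefix}/*'],\n"
        f"  hive_partition_uri_prefix = '{uri_prefix}'\n"
        ");"
    )


def ddl_for_new_topics(
    object_paths: Iterable[str],
    existing_tables: set[str],
    project: str,
    dataset: str,
    bucket: str,
) -> dict[str, str]:
    """Return {topic: DDL} for topics that do not have a table yet."""
    return {
        topic: external_table_ddl(project, dataset, bucket, topic)
        for topic in sorted(detect_topics(object_paths))
        if table_name_for(topic) not in existing_tables
    }


if __name__ == "__main__":
    # Hypothetical object listing; real paths depend on the sink configuration.
    sample_paths = [
        "topics/airbnb.public.listings/year=2024/month=01/day=15/part-0.parquet",
        "topics/airbnb.public.reviews/year=2024/month=01/day=15/part-0.parquet",
    ]
    statements = ddl_for_new_topics(
        sample_paths,
        existing_tables={"airbnb_public_listings"},
        project="my-project",
        dataset="raw",
        bucket="my-bucket",
    )
    for topic, ddl in statements.items():
        print(f"-- {topic}\n{ddl}\n")
```

In a Dagster asset, the object listing would typically come from the GCS client and each generated statement would be submitted through the BigQuery client. With `WITH PARTITION COLUMNS` and no explicit column list, BigQuery infers the partition keys (`year`, `month`, `day` in this assumed layout) from the paths below `hive_partition_uri_prefix`.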

Useful resources
