data-engineering

Here are 62 public repositories matching this topic...

metarank / metarank

A low-code machine learning personalized ranking service for articles, listings, search results, and recommendations that boosts user engagement. A friendly Learn-to-Rank engine.

search kubernetes data-science machine-learning scala deep-learning personalization data-engineering feature-extraction ranking neural-networks feature-engineering automl

Updated Jun 27, 2025
Scala

feathr-ai / feathr

Feathr – A scalable, unified data and AI engineering platform for enterprise

data-science machine-learning apache-spark azure artificial-intelligence data-engineering feature-engineering data-quality mlops feature-store feature-management feature-marketplace feature-governance feature-metadata feature-platform

Updated Apr 4, 2024
Scala

swoop-inc / spark-alchemy

Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive

data-science scala spark data-engineering

Updated Feb 12, 2023
Scala

SETL-Framework / setl

A simple Spark-powered ETL framework that just works 🍺

data-science machine-learning framework scala big-data spark pipeline etl data-transformation data-engineering dataset data-analysis modularization setl etl-pipeline

Updated Jul 29, 2025
Scala

starlake-ai / starlake

Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.

bigquery spark etl snowflake data-engineering hdfs data-integration redshift synapse data-pipeline

Updated Aug 27, 2025
Scala

dimajix / flowman

Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.

scala sql big-data spark apache-spark hadoop etl bigdata data-engineering flowman

Updated Aug 13, 2025
Scala

galliaproject / gallia-core

A schema-aware Scala library for data transformation

json data-science scala spark etl data-transformation data-engineering data-manipulation feature-engineering nesting

Updated Feb 23, 2024
Scala

CoxAutomotiveDataSolutions / waimak

Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.

scala spark hadoop data-engineering

Updated Apr 24, 2024
Scala

StabRise / spark-pdf

PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them

pdf data-science ocr big-data spark tesseract data-engineering data-extraction tesseract-ocr pdf-document ocr-recognition pdf-document-processor spark-datasource

Updated Apr 27, 2025
Scala

mattlianje / etl4s

Powerful, whiteboard-style ETL

streaming big-data etl functional-programming data-engineering

Updated Aug 24, 2025
Scala

CoxAutomotiveDataSolutions / spark-distcp

A re-implementation of Hadoop DistCP in Apache Spark

spark apache-spark hadoop data-engineering distcp

Updated Dec 20, 2023
Scala

vitaliihonta / scala-ql

Data manipulation and reporting for Scala.

json functional scala csv functional-programming dsl excel data-engineering data-manipulation scala3

Updated May 4, 2023
Scala

opensnowcat / opensnowcat-collector

OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)

analytics snowplow data-engineering event-pipeline data-pipeline

Updated Aug 21, 2025
Scala

opensnowcat / opensnowcat-enrich

OpenSnowcat Enricher (Apache 2.0 License)

analytics snowplow data-engineering event-pipeline data-pipeline

Updated Aug 22, 2025
Scala

AhmetFurkanDEMIR / Flink-Example

Flink Example

docker scala kafka ubuntu apache data-engineering apache-flink flink kafka-streams debezium flink-stream-processing data-stream-processing flink-streaming flink-sql debeziumkafkaconnector debezium-connector debezium-client scala2

Updated Nov 19, 2023
Scala

dataintoresults / data-brewery

Data Brewery is an ETL (Extract-Transform-Load) program that connects to many data sources (cloud services, databases, ...) and manages data warehouse workflows.

etl data-warehouse data-engineering elt datawarehouse

Updated Jan 21, 2021
Scala

innFactory / akka-lift-ml

Akka HTTP service for serving Spark machine learning models

machine-learning scala akka akka-http spark data-engineering fast-data

Updated Aug 11, 2017
Scala

HuemulSolutions / huemul-bigdatagovernance

Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It supports implementing a corporate single-source-of-data strategy based on data governance best practices. It lets you implement tables with Primary Key and Foreign Key checks when inserting and updating data through the library, null validation, …

Updated Apr 21, 2023
Scala

JHLeeeMe / fake-data-pipeline

Data Generators -> Kafka -> Spark Streaming -> PostgreSQL -> Grafana

docker scala kafka spark docker-compose grafana postgresql data-engineering data-pipeline

Updated Jan 31, 2023
Scala

david-siqi-liu / sparklyclean

Optimal distributed data deduplication and supervised learning pipeline using Apache Spark

distributed-systems data-science spark hadoop data-deduplication data-engineering data-cleaning deduplication

Updated Aug 19, 2020
Scala
