A low-code machine learning personalized ranking service for articles, listings, search results, and recommendations that boosts user engagement. A friendly learn-to-rank engine.
Updated Jun 27, 2025 - Scala
Feathr – A scalable, unified data and AI engineering platform for enterprise
Collection of open-source Spark tools & frameworks that have made the data engineering and data science teams at Swoop highly productive
A simple Spark-powered ETL framework that just works 🍺
Declarative, text-based tool for data analysts and engineers to extract, load, transform, and orchestrate their data pipelines.
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
A schema-aware Scala library for data transformation
Waimak is an open-source framework that makes it easier to create complex data flows in Apache Spark.
PDF DataSource for Apache Spark that reads PDF files directly into a DataFrame and runs OCR on them
Powerful, whiteboard-style ETL
A re-implementation of Hadoop DistCP in Apache Spark
Data manipulation and reporting for Scala.
OpenSnowcat Collector, an open source fork of Snowplow (Apache 2.0 License)
OpenSnowcat Enricher (Apache 2.0 License)
Flink Example
Data Brewery is an ETL (Extract-Transform-Load) program that connects to many data sources (cloud services, databases, ...) and manages data warehouse workflows.
Akka HTTP service for serving Spark machine learning models
Huemul BigDataGovernance is a framework that runs on Spark, Hive, and HDFS. It supports implementing a corporate single-source-of-data strategy based on data governance best practices. It lets you implement tables with primary key and foreign key control when inserting and updating data through the library, null validation, …
Data Generators -> Kafka -> Spark Streaming -> PostgreSQL -> Grafana
Optimal distributed data deduplication and supervised learning pipeline using Apache Spark