Byzer (formerly MLSQL): A low-code open-source programming language for data pipelines, analytics and AI.
Updated May 29, 2024 - Scala
Apache Livy is an open source REST interface for interacting with Apache Spark from anywhere.
Lightweight real-time big data streaming engine over Akka
Read and write Parquet in Scala. Use Scala classes as schema. No need to start a cluster.
Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs
C4E, a JVM-friendly library written in Scala for both local and distributed (Spark) clustering.
Splash, a flexible Spark shuffle manager that supports user-defined storage backends for shuffle data storage and exchange
A free, open-source, web-based self-service BI tool tailor-made for ClickHouse, Google BigQuery, MySQL, PostgreSQL, and Vertica
Flowman is an ETL framework powered by Apache Spark. With its declarative approach, Flowman simplifies the development of complex data pipelines.
Apache Spark Course Material
Type-class-based data cleansing library for Apache Spark SQL
Apache Spark 3 - Structured Streaming Course Material
Write ETL using your favorite SQL dialects
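A comprehensive walkthrough of the basic algorithms in Spark MLlib, the machine learning library of the Spark big data framework, with a complete set of test files included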