Apache Spark
Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
Here are 1,718 public repositories matching this topic...
🧙 Build, run, and manage data pipelines for integrating and transforming data.
Updated Nov 1, 2024 - Python
Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started and learn and highly extensible.
Updated Nov 2, 2024 - Python
Server for the ListenBrainz project, including the front-end (JavaScript/React) code that it serves and all of the data processing components that ListenBrainz uses.
Updated Nov 1, 2024 - Python
Prometheus-monitoring-exporter
Updated Nov 1, 2024 - Python
Personal code snippets for learning new programming skills.
Updated Nov 1, 2024 - Python
SageWorks: An easy-to-use Python API for creating and deploying AWS SageMaker models.
Updated Nov 1, 2024 - Python
SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.
Updated Nov 1, 2024 - Python
Data engineering demo projects
Updated Nov 1, 2024 - Python
Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.
Updated Nov 1, 2024 - Python
Gateway into the John Snow Labs Ecosystem
Updated Oct 31, 2024 - Python
Example of local pyspark setup including DeltaLake for unit-testing
Updated Oct 31, 2024 - Python
Apache Paimon Python: the Python implementation of Apache Paimon.
Updated Oct 31, 2024 - Python
A data solution modeled on 'League of Legends': an API generates player action events so that data flows smoothly in real time.
Updated Oct 30, 2024 - Python
Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends
Updated Oct 30, 2024 - Python
Created by Matei Zaharia
Released May 26, 2014
- Followers: 422
- Repository: apache/spark
- Website: spark.apache.org
- Wikipedia