Apache Spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 1,718 public repositories matching this topic...

tobymao / sqlglot

Python SQL Parser and Transpiler

Updated Nov 2, 2024
Python

marcoaureliomenezes / case_web_logs_analytics

Case Santander

python airflow spark iceberg dremio nessie

Updated Nov 2, 2024
Python

mage-ai / mage-ai

🧙 Build, run, and manage data pipelines for integrating and transforming data.

python data-science data machine-learning sql spark pipeline etl pipelines orchestration artificial-intelligence data-engineering data-integration dbt elt transformation data-pipelines reverse-etl

Updated Nov 1, 2024
Python

KarolSekscinski / realtime-streaming

The system will process financial data in the form of real-time streaming data.

streaming real-time kafka spark cassandra grafana

Updated Nov 1, 2024
Python

flyteorg / flytekit

Extensible Python SDK for developing Flyte tasks and workflows. Simple to get started with, easy to learn, and highly extensible.

python data-science data automation sdk spark pypi extensible workflows hacktoberfest flyte mlops flyte-tasks

Updated Nov 2, 2024
Python

metabrainz / listenbrainz-server

Server for the ListenBrainz project, including the front-end (javascript/react) code that it serves and all of the data processing components that LB uses.

react python music typescript database web big-data spark listenbrainz-server

Updated Nov 1, 2024
Python

euiyounghwang / Prometheus-monitoring-exporter

Prometheus-monitoring-exporter

elasticsearch spark jupyter-notebook prometheus python3 prometheus-exporter kafka-connect grafana-dashboard prometheus-client-library prometheus-client prometheus-metrics apache-airflow grafana-loki streamlit streamlit-webapp python-operators poetry-python grafana-promtail

Updated Nov 1, 2024
Python

jazzwang / snippet

Some personal code snippets for learning new programming skills

javascript java scala spark gradle

Updated Nov 1, 2024
Python

SuperCowPowers / sageworks

SageWorks: An easy to use Python API for creating and deploying AWS SageMaker Models

python aws machine-learning big-data spark pandas data-engineering

Updated Nov 1, 2024
Python

mrpowers-io / tsumugi-spark

SparkConnect Server plugin and protobuf messages for the Amazon Deequ Data Quality Engine.

spark pyspark data-quality deequ

Updated Nov 1, 2024
Python

jaehyeon-kim / general-demos

Data engineering demo projects

aws kafka spark dbt opensearch dataengineering serverlessapplicationmodel kafkaconnect

Updated Nov 1, 2024
Python

getredash / redash

Make Your Company Data Driven. Connect to any data source, easily visualize, dashboard and share your data.

visualization javascript mysql python bigquery bi spark dashboard athena analytics postgresql business-intelligence redash redshift databricks hacktoberfest spark-sql

Updated Nov 1, 2024
Python

marcoaureliomenezes / case_ab_inbev

Web-server-logs-analytics case built on a stack of MinIO, Iceberg, Nessie, Dremio, Spark, Airflow, and Python, with my library rand-engine as the data generator.

airflow spark iceberg dremio nessie

Updated Oct 31, 2024
Python

JohnSnowLabs / johnsnowlabs

Gateway into the John Snow Labs Ecosystem

python nlp machine-learning natural-language-processing spark seq2seq gpt databricks bert t5

Updated Oct 31, 2024
Python

frizzleqq / pyspark-deltalake

Example of local pyspark setup including DeltaLake for unit-testing

spark pytest pyspark delta-lake

Updated Oct 31, 2024
Python

apache / paimon-python

Apache Paimon Python: the Python implementation of Apache Paimon.

big-data spark flink real-time-analytics data-ingestion table-store paimon streaming-datalake

Updated Oct 31, 2024
Python

MobileTeleSystems / onetl

One ETL tool to rule them all

spark etl etl-pipeline etl-components hwm

Updated Oct 31, 2024
Python

capitalone / datacompy

Pandas, Polars, Spark, and Snowpark DataFrame comparison for humans and more!

python data-science data spark numpy snowflake pandas pyspark compare dask dataframes fugue snowpark polars

Updated Oct 30, 2024
Python

f-lab-edu / league-of-legends-data-solution

A data solution, modeled on League of Legends, that uses an API generating player-action events to keep data flowing smoothly in real time.

airflow spark dataengineering

Updated Oct 30, 2024
Python

moj-analytical-services / splink

Fast, accurate and scalable probabilistic data linkage with support for multiple SQL backends

data-science spark record-linkage entity-resolution fuzzy-matching deduplication em-algorithm data-matching deduplicate-data duckdb uk-gov-data-science

Updated Oct 30, 2024
Python

Created by Matei Zaharia

Released May 26, 2014

Followers: 422
Repository: apache/spark
Website: spark.apache.org

Related Topics