# apache-spark

Apache Spark is an open-source, distributed, general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.

Here are 229 public repositories matching this topic.

OryxProject / oryx

Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning

java machine-learning kafka apache-spark cloudera apache-kafka lambda-architecture oryx

Updated Aug 16, 2021
Java

mahmoudparsian / data-algorithms-book

MapReduce, Spark, Java, and Scala for Data Algorithms Book

python java machine-learning scala apache-spark distributed-computing design-patterns pyspark mapreduce reducers partitioning hadoop-mapreduce distributed-algorithms mappers data-algorithms apache-hadoop

Updated Oct 14, 2024
Java

openscoring / openscoring

REST web service for true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models

api real-time r apache-spark scikit-learn xgboost lightgbm pmml

Updated Sep 3, 2024
Java

Mellanox / SparkRDMA

This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx

java scala big-data spark apache-spark hadoop bigdata rdma infiniband roce shuffle mellanox disni

Updated May 13, 2019
Java

whylabs / whylogs-java

Profile and monitor your ML data pipeline end-to-end

java statistics spark apache-spark dataset data-quality calculate-statistics aiops mlops ai-pipelines approximate-statistics statistical-properties whylogs

Updated Sep 28, 2021
Java

radanalyticsio / spark-operator

Operator for managing Spark clusters on Kubernetes and OpenShift.

kubernetes spark apache-spark openshift kubernetes-operator

Updated Nov 18, 2021
Java

BitwiseInc / Hydrograph

A visual ETL development and debugging tool for big data

big-data apache-spark etl cascading etl-framework

Updated Dec 5, 2022
Java

igor-suhorukov / openstreetmap_h3

High-performance data loader for OSM planet dumps. Transforms an OpenStreetMap World/Region PBF dump into a PostGIS pgsnapshot (lossless) OSM schema representation partitioned by H3 regions, and/or into ArrowIPC/Parquet dumps.

java converter world apache-spark arrow openstreetmap geospatial postgresql postgis parquet column-store pbf geometry-processing parquet-files apache-arrow citusdb duckdb pbf-format apach-sedona

Updated Sep 2, 2024
Java

adrianulbona / osm-parquetizer

A converter from OSM PBF files to Parquet files

converter apache-spark openstreetmap pbf parquet-files

Updated Sep 1, 2023
Java

exacaster / lighter

REST API for Apache Spark on K8S or YARN

spark apache-spark yarn jupyter k8s livy sparkmagic

Updated Jan 10, 2025
Java

seznam / euphoria

Euphoria is an open-source Java API for creating unified big-data processing flows. It provides an engine-independent programming model that can express both batch and stream transformations.

kafka big-data apache-spark hadoop hdfs java-api apache-flink batch-processing streaming-data unified-bigdata-processing

Updated Nov 15, 2022
Java

alipay / jpmml-sparkml-lightgbm

JPMML-SparkML plugin for converting LightGBM-Spark models to PMML

machine-learning apache-spark sparkml lightgbm pmml

Updated Oct 23, 2021
Java

mincloud1501 / BigData

Coding practice and research on the component technologies of big data pipelines

elasticsearch kibana kafka apache-spark pipeline splunk grafana bigdata apache druid hdfs lucene apache-kafka zeppelin

Updated Jan 8, 2020
Java

melphi / spark-examples

Spark examples

spark apache-spark spark-java

Updated May 7, 2024
Java

flipkart-incubator / spark-transformers

Spark-Transformers: a library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.

java export machine-learning scala spark apache-spark machine-learning-algorithms transformers mllib machine-learning-library data-pipelines

Updated Dec 15, 2017
Java

raycad / stream-processing

Stream processing guidelines and examples using Apache Flink and Apache Spark

streaming apache-spark data-analysis apache-flink batch-processing

Updated Apr 21, 2023
Java

qwshen / spark-flight-connector

A Spark connector that reads data from and writes data to Arrow Flight endpoints using Arrow Flight and Flight SQL

sql apache-spark arrow dremio apache-arrow arrow-flight flight-sql apache-flight spark-connector data-source-api

Updated Sep 21, 2024
Java

aamargajbhiye / big-data-projects

This project contains customizations, such as custom data sources and plugins, written for distributed systems like Apache Spark, Apache Ignite, etc.

apache-spark spark-java apache-ignite apache-spark-cluster igfs

Updated Oct 6, 2023
Java

spoddutur / cloud-based-sql-engine-using-spark

Cloud-based SQL engine using Spark, where data is accessible as a JDBC/ODBC data source via the Spark Thrift Server.

apache-spark jdbc sparksql thrift-server sql-engine beeline hadoop-framework spark-thrift-server

Updated Jul 12, 2017
Java

GMAP / DSPBench

A suite of benchmark applications for distributed data stream processing systems

big-data apache-spark storm data-stream bigdata evaluation stream-processing spark-streaming apache-storm apache-flink experiments big-data-analytics

Updated Sep 27, 2024
Java

Created by Matei Zaharia

Released May 26, 2014

Followers: 426
Repository: apache/spark
Website: spark.apache.org
