Oryx 2: Lambda architecture on Apache Spark, Apache Kafka for real-time large scale machine learning (Java, updated Aug 16, 2021)
Apache Spark is an open source distributed general-purpose cluster-computing framework. It provides an interface for programming entire clusters with implicit data parallelism and fault tolerance.
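To illustrate what "implicit data parallelism" means, here is a single-machine analogy in plain Java. It deliberately does not use Spark's API. A word count is written as a chain of flatMap-style and group-and-count transformations on a parallel stream. The runtime, not the programmer, decides how to split the input across threads. The class name `WordCountSketch` is hypothetical. In Spark, the same style of transformations (for example `flatMap` and `reduceByKey` on an RDD) is distributed across the machines of a cluster instead of local threads.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical example class: a local analogy, not Spark code.
public class WordCountSketch {

    // Counts words across all lines. parallelStream() lets the runtime
    // partition the input across worker threads with no explicit
    // thread management in this code.
    public static Map<String, Long> countWords(List<String> lines) {
        return lines.parallelStream()
                // "flatMap" step: split each line into words
                .flatMap(line -> Arrays.stream(line.split("\\s+")))
                // drop the empty token produced by leading whitespace
                .filter(word -> !word.isEmpty())
                // "reduce by key" step: group identical words and count them
                .collect(Collectors.groupingByConcurrent(
                        Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> lines = List.of("to be or not to be", "that is the question");
        Map<String, Long> counts = countWords(lines);
        System.out.println(counts.get("to") + " " + counts.get("be")); // prints "2 2"
    }
}
```

`groupingByConcurrent` lets parallel threads accumulate into one shared concurrent map. Spark's `reduceByKey` instead combines values per key within each partition before shuffling partial results between machines.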
MapReduce, Spark, Java, and Scala code for the Data Algorithms book
REST web service for the true real-time scoring (<1 ms) of Scikit-Learn, R and Apache Spark models
This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx
Profile and monitor your ML data pipeline end-to-end
Operator for managing Spark clusters on Kubernetes and OpenShift.
A visual ETL development and debugging tool for big data
OSM planet dump high-performance data loader. Transforms OpenStreetMap World/Region PBF dumps into a lossless PostGIS pgsnapshot OSM schema representation partitioned by H3 region, and/or into ArrowIPC/Parquet dumps
A converter from OSM PBF files to Parquet files
REST API for Apache Spark on K8S or YARN
Euphoria is an open-source Java API for creating unified big-data processing flows. It provides an engine-independent programming model that can express both batch and stream transformations.
JPMML-SparkML plugin for converting LightGBM-Spark models to PMML
Coding practice and research on the component technologies of big data pipelines
Spark-Transformers: Library for exporting Apache Spark MLlib models for use in any Java application with no other dependencies.
Stream processing guidelines and examples using Apache Flink and Apache Spark
A Spark connector that reads data from and writes data to Arrow Flight endpoints using Arrow Flight and Flight SQL
This project contains customizations such as custom data sources and plugins written for distributed systems like Apache Spark and Apache Ignite.
Cloud-based SQL engine built on Spark, where data is accessible as a JDBC/ODBC data source via the Spark ThriftServer.
A suite of benchmark applications for distributed data stream processing systems
Created by Matei Zaharia
Released May 26, 2014