Chronix Spark

An Apache Spark RDD implementation for time series processing - based on Chronix.

Usage Guide

A ChronixRDD is a collection of univariate time series. Each series has its own vector of timestamps; the series are not aligned on one common vector of timestamps.
Time series are multi-dimensional. Each time series is associated with one or more dimensions, and its identity is the combination of some of its dimension values.
ChronixRDD has its own storage engine based on Solr Cloud and the Chronix format. The time series data is therefore stored space-efficiently, is sharded, and can be accessed with low-level queries that perform predicate pushdown.
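
The data model described above can be illustrated with a small, self-contained Java sketch. It is not the Chronix Spark API: the class `TimeSeriesModelSketch`, the nested `Series` type, and the dimension names `host`, `metric` and `unit` are hypothetical and chosen only for illustration. The sketch shows two series that each carry their own timestamp vector, and an identity built from a chosen subset of dimension values.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class TimeSeriesModelSketch {

    // Hypothetical model, NOT a Chronix class: one univariate series with
    // its own timestamp vector and its dimension values.
    static final class Series {
        final Map<String, String> dimensions;
        final long[] timestamps; // epoch milliseconds, private to this series
        final double[] values;   // one value per timestamp

        Series(Map<String, String> dimensions, long[] timestamps, double[] values) {
            if (timestamps.length != values.length) {
                throw new IllegalArgumentException("timestamps and values must have equal length");
            }
            this.dimensions = new TreeMap<>(dimensions);
            this.timestamps = timestamps.clone();
            this.values = values.clone();
        }

        // The identity is the combination of the selected dimension values,
        // joined in the order in which the dimensions are listed.
        String identity(List<String> identityDimensions) {
            return identityDimensions.stream()
                    .map(d -> d + "=" + dimensions.get(d))
                    .collect(Collectors.joining(","));
        }

        // A simple per-series aggregation; NaN for an empty series.
        double mean() {
            return Arrays.stream(values).average().orElse(Double.NaN);
        }
    }

    public static void main(String[] args) {
        // The two series have different lengths and different timestamps.
        Series cpu = new Series(
                Map.of("host", "web-1", "metric", "cpu", "unit", "percent"),
                new long[] {1000L, 2000L, 3500L},
                new double[] {10.0, 20.0, 30.0});
        Series heap = new Series(
                Map.of("host", "web-1", "metric", "heap", "unit", "MB"),
                new long[] {1200L, 4000L},
                new double[] {512.0, 640.0});

        // Only some of the dimensions take part in the identity.
        List<String> identityDimensions = List.of("host", "metric");
        System.out.println(cpu.identity(identityDimensions) + " mean=" + cpu.mean());
        System.out.println(heap.identity(identityDimensions) + " mean=" + heap.mean());
    }
}
```

Here `unit` is a dimension of both series but is left out of the identity, which matches the statement that the identity uses only some of the dimension values.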

How does Chronix Spark compare to Spark-TS?

Spark-TS provides no dedicated time series storage; it uses the Spark persistence mechanisms instead. This leads to less efficient storage usage and fewer opportunities for performance optimization via predicate pushdown.
In contrast to Spark-TS, Chronix does not align all time series values on one vector of timestamps. This allows greater flexibility in time series aggregation.
Chronix provides multi-dimensional time series, which is very useful for data warehousing and APM.
Chronix supports Datasets, as this will be an important Spark API in the near future. However, Chronix currently does not support an IndexedRowMatrix for SparkML.
Chronix is written purely in Java. There is no explicit support for Python or Scala yet.
Chronix does not support a ZonedTime, as this would make things considerably more complicated.
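
To make the point about unaligned timestamps more concrete, the following Java sketch aggregates a single series into fixed-width time windows using only that series' own timestamps, with no resampling onto a shared time axis. It is a minimal illustration under stated assumptions, not Chronix Spark code: the class `UnalignedAggregationSketch` and the method `windowedMean` are hypothetical names, and timestamps are assumed to be epoch milliseconds.

```java
import java.util.Map;
import java.util.TreeMap;

public class UnalignedAggregationSketch {

    // Hypothetical helper, NOT part of Chronix: mean value per fixed-width
    // window, keyed by the window start time. Works on one series at a time.
    static Map<Long, Double> windowedMean(long[] timestamps, double[] values, long windowMillis) {
        if (timestamps.length != values.length) {
            throw new IllegalArgumentException("timestamps and values must have equal length");
        }
        // window start -> {sum, count}
        Map<Long, double[]> accumulators = new TreeMap<>();
        for (int i = 0; i < timestamps.length; i++) {
            long windowStart = Math.floorDiv(timestamps[i], windowMillis) * windowMillis;
            double[] acc = accumulators.computeIfAbsent(windowStart, k -> new double[2]);
            acc[0] += values[i];
            acc[1] += 1;
        }
        Map<Long, Double> means = new TreeMap<>();
        accumulators.forEach((start, acc) -> means.put(start, acc[0] / acc[1]));
        return means;
    }

    public static void main(String[] args) {
        // Two series sampled at different, irregular points in time.
        long[] tsA = {0L, 400L, 1000L, 1900L, 2500L};
        double[] valA = {1.0, 3.0, 10.0, 20.0, 7.0};
        long[] tsB = {250L, 2750L};
        double[] valB = {5.0, 9.0};

        // Each series is aggregated on its own; windows without samples
        // simply do not appear in the result.
        System.out.println("A: " + windowedMean(tsA, valA, 1000L));
        System.out.println("B: " + windowedMean(tsB, valB, 1000L));
    }
}
```

Because every series is aggregated independently, series with different sampling rates or gaps need no padding or interpolation before aggregation.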
