# A Spark metrics sink that pushes to InfluxDB
Collecting diagnostic metrics from Apache Spark can be difficult because of Spark's distributed nature. Polling Spark executor processes or scraping their logs becomes tedious when executors run on an arbitrary number of remote hosts. This package instead "pushes" metrics to a central host running InfluxDB, where they can be analyzed in one place.
- Run `./gradlew build`
- Copy the JAR that is output to a path where Spark can read it, and add it to Spark's `extraClassPath`, along with `izettle/metrics-influxdb` (available on Maven)
- Add your new sink to Spark's `conf/metrics.properties`
Example `metrics.properties` snippet:

```properties
*.sink.influx.class=org.apache.spark.metrics.sink.InfluxDbSink
*.sink.influx.protocol=https
*.sink.influx.host=localhost
*.sink.influx.port=8086
*.sink.influx.database=my_metrics
*.sink.influx.auth=metric_client:PASSWORD
*.sink.influx.tags=product:my_product,parent:my_service
```
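The setup steps above can be sketched end-to-end. The JAR locations and `spark-defaults.conf` entries below are illustrative assumptions, not paths this project prescribes:

```shell
# Build the sink JAR (output lands under build/libs/ with Gradle defaults).
./gradlew build

# Copy the sink JAR, plus the izettle metrics-influxdb JAR from Maven,
# to a directory readable by every Spark driver and executor host.
cp build/libs/*.jar /opt/spark/extra-jars/            # hypothetical location
cp metrics-influxdb-*.jar /opt/spark/extra-jars/      # hypothetical location

# Then, in conf/spark-defaults.conf, add both JARs to the classpath:
#   spark.driver.extraClassPath    /opt/spark/extra-jars/*
#   spark.executor.extraClassPath  /opt/spark/extra-jars/*
```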
- This takes a dependency on the Apache2-licensed `com.izettle.dropwizard-metrics-influxdb` library, an improved version of Dropwizard's upstream InfluxDB support, which exists only in the Dropwizard Metrics 4.0 branch.
- This code lives in the `org.apache.spark.metrics.sink` package, which is necessary because Spark makes its `Sink` interface package-private.
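Because `Sink` is package-private, a custom sink has to declare itself inside `org.apache.spark.metrics.sink`. A minimal Scala sketch of the shape such a sink takes follows; it is an illustration under assumptions, not the exact code in this repository, and the constructor arity Spark invokes reflectively varies across Spark versions (some versions also pass a `SecurityManager`):

```scala
package org.apache.spark.metrics.sink

import java.util.Properties
import com.codahale.metrics.MetricRegistry

// Illustrative sketch: Spark's MetricsSystem constructs sinks reflectively,
// passing the properties parsed from metrics.properties and a MetricRegistry.
private[spark] class InfluxDbSink(
    val property: Properties,
    val registry: MetricRegistry)
  extends Sink {

  // Settings supplied via metrics.properties, e.g. *.sink.influx.host
  private val host = property.getProperty("host", "localhost")
  private val port = property.getProperty("port", "8086").toInt

  // In the real implementation, an izettle InfluxDB reporter would be
  // built from these settings and started/stopped here.
  override def start(): Unit = { /* reporter.start(...) */ }
  override def stop(): Unit = { /* reporter.stop() */ }
  override def report(): Unit = { /* reporter.report() */ }
}
```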
This project is made available under the Apache 2.0 License.