Update documentation of PushGatewaySink
LucaCanali committed Jun 6, 2024
1 parent 6f712cf commit 4d61265
Showing 2 changed files with 16 additions and 11 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -263,7 +263,7 @@ SparkMeasure is one tool for many different use cases, languages, and environmen
SparkMeasure in flight recorder will collect metrics transparently, without any need for you
to change your code.
* Metrics can be saved to a file, locally, or to a Hadoop-compliant filesystem
* or you can write metrics in near-realtime to the followingsinks: InfluxDB, Apache Kafka, Prometheus PushPushgateway
* or you can write metrics in near-realtime to the following sinks: InfluxDB, Apache Kafka, Prometheus Pushgateway
* More details:
- **[Flight Recorder mode with file sink](docs/Flight_recorder_mode_FileSink.md)**
- **[Flight Recorder mode with InfluxDB sink](docs/Flight_recorder_mode_InfluxDBSink.md)**
25 changes: 15 additions & 10 deletions docs/Flight_recorder_mode_PrometheusPushgatewaySink.md
@@ -6,19 +6,24 @@ This describes how to sink Spark metrics to a Prometheus Gateway.

## PushGatewaySink

**PushGatewaySink** is a class that extends the SparkListener infrastructure.
It collects and writes Spark metrics and application info in near real-time to a Prometheus Gateway instance.
provided by the user. Use this mode to monitor Spark execution workload.
Notes, the amount of data generated is relatively small in most applications: O(number_of_stages)
**PushGatewaySink** is a class that extends the Spark listener infrastructure to collect Spark metrics
and application info and write them in near real-time to a Prometheus Pushgateway instance provided
by the user. Use this mode to monitor Spark execution workloads.
Notes:
- Currently, PushGatewaySink collects data at the Stage level (StageMetrics).
Task-level metrics are not collected.
- The amount of data generated is relatively small in most applications; it is O(number_of_stages)

How to use: attach the PushGatewaySink to a Spark Context using the listener infrastructure. Example:
- `--conf spark.extraListeners=ch.cern.sparkmeasure.PushGatewaySink`
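
For example, attaching the sink at submission time could look like the following sketch. Note: the `spark.sparkmeasure.pushgateway.*` parameter names, the package version, and the application file are illustrative assumptions; check the "Configuration - PushGatewaySink parameters" section below for the exact names supported by your sparkmeasure version.

```shell
# Sketch of a spark-submit invocation attaching PushGatewaySink.
# The spark.sparkmeasure.pushgateway.* parameter names below are assumptions
# for illustration; verify them against the parameter list in this document.
spark-submit \
  --packages ch.cern.sparkmeasure:spark-measure_2.12:0.24 \
  --conf spark.extraListeners=ch.cern.sparkmeasure.PushGatewaySink \
  --conf spark.sparkmeasure.pushgateway.serverIPnPort=localhost:9091 \
  --conf spark.sparkmeasure.pushgateway.jobName=myapp \
  my_spark_app.py
```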

Configuration for the is handled with Spark configuration parameters.
Note: you can add configuration using --config option when using spark-submit
use the .config method when allocating the Spark Session in Scala/Python/Java).
Configurations:
```
Configuration for PushGatewaySink is handled using Spark configuration parameters.
Note: you can add configuration using `--conf` options when using spark-submit,
or use the `.config` method when allocating the Spark Session in Scala/Python/Java, as usual.

**Configurations:**
```
Option 1 (recommended) Start the listener for PushGatewaySink:
--conf spark.extraListeners=ch.cern.sparkmeasure.PushGatewaySink
@@ -32,7 +37,7 @@ Configuration - PushGatewaySink parameters:

## Use case

- The use case for this sink it to extend Spark monitoring, by writing execution metrics into Prometheus via the Pushgateway,
- The use case for PushGatewaySink is to extend Spark monitoring by writing execution metrics into Prometheus via the Pushgateway,
as Prometheus has a pull-based architecture. You'll need to configure Prometheus to pull metrics from the Pushgateway.
You'll also need to set up a performance dashboard from the metrics collected by Prometheus.
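
Since Prometheus pulls metrics rather than receiving pushes, its server configuration needs a scrape job pointing at the Pushgateway. A minimal sketch of a `prometheus.yml` fragment, assuming the Pushgateway listens on its default port 9091 on localhost:

```
scrape_configs:
  - job_name: 'pushgateway'
    # honor_labels keeps the job/instance labels set by the pushing client
    # instead of overwriting them with the scrape target's labels
    honor_labels: true
    static_configs:
      - targets: ['localhost:9091']
```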

