[SPARK-46095][DOCS] Document REST API for Spark Standalone Cluster
### What changes were proposed in this pull request?

This PR aims to document the `REST API` for Spark Standalone Cluster.

### Why are the changes needed?

To help users understand Apache Spark features.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

Manual review. The `REST API` section is newly added.

**AFTER**

<img width="704" alt="Screenshot 2023-11-24 at 4 13 53 PM" src="https://github.com/apache/spark/assets/9700541/a4e09d94-d216-4629-8b37-9d350365a428">

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes apache#44007 from dongjoon-hyun/SPARK-46095.

Authored-by: Dongjoon Hyun <dhyun@apple.com>
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
(cherry picked from commit 132c1a1)
Signed-off-by: Dongjoon Hyun <dhyun@apple.com>
dongjoon-hyun committed Nov 25, 2023
1 parent a53c16a commit 03dac18
Showing 1 changed file with 68 additions and 0 deletions.
68 changes: 68 additions & 0 deletions docs/spark-standalone.md
@@ -444,6 +444,8 @@ Spark applications supports the following configuration properties specific to s

# Launching Spark Applications

## Spark Protocol

The [`spark-submit` script](submitting-applications.html) provides the most straightforward way to
submit a compiled Spark application to the cluster. For standalone clusters, Spark currently
supports two deploy modes. In `client` mode, the driver is launched in the same process as the
@@ -466,6 +468,72 @@ failing repeatedly, you may do so through:

You can find the driver ID through the standalone Master web UI at `http://<master url>:8080`.
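For example, a kill request takes the form below. This is a sketch; the master URL and driver ID are placeholders, to be replaced with your own values.

```bash
# Kill a running driver on a standalone cluster by its driver ID.
./bin/spark-class org.apache.spark.deploy.Client kill spark://master:7077 driver-20231124153531-0000
```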

## REST API

If `spark.master.rest.enabled` is enabled, the Spark master provides an additional REST API
via <code>http://[host:port]/[version]/submissions/[action]</code>, where
<code>host</code> is the master host,
<code>port</code> is the port number specified by `spark.master.rest.port` (default: 6066),
<code>version</code> is a protocol version (<code>v1</code> as of today), and
<code>action</code> is one of the following supported actions.

<table class="table table-striped">
<thead><tr><th style="width:21%">Command</th><th>Description</th><th>HTTP METHOD</th><th>Since Version</th></tr></thead>
<tr>
<td><code>create</code></td>
<td>Create a Spark driver via <code>cluster</code> mode.</td>
<td>POST</td>
<td>1.3.0</td>
</tr>
<tr>
<td><code>kill</code></td>
<td>Kill a single Spark driver.</td>
<td>POST</td>
<td>1.3.0</td>
</tr>
<tr>
<td><code>status</code></td>
<td>Check the status of a Spark job.</td>
<td>GET</td>
<td>1.3.0</td>
</tr>
</table>
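Note that the REST API is disabled by default. As a minimal sketch, assuming the master reads `conf/spark-defaults.conf` at startup, it can be enabled like this before starting the master:

```bash
# Enable the REST submission server (it listens on spark.master.rest.port, default 6066).
echo "spark.master.rest.enabled true" >> conf/spark-defaults.conf
./sbin/start-master.sh
```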

The following is a <code>curl</code> CLI command example that submits the `pi.py` example application via the REST API.

```bash
$ curl -XPOST http://IP:PORT/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
    "appResource": "",
    "sparkProperties": {
      "spark.master": "spark://master:7077",
      "spark.app.name": "Spark Pi",
      "spark.driver.memory": "1g",
      "spark.driver.cores": "1",
      "spark.jars": ""
    },
    "clientSparkVersion": "",
    "mainClass": "org.apache.spark.deploy.SparkSubmit",
    "environmentVariables": { },
    "action": "CreateSubmissionRequest",
    "appArgs": [ "/opt/spark/examples/src/main/python/pi.py", "10" ]
  }'
```

The following is the response from the REST API for the above <code>create</code> request.

```json
{
  "action" : "CreateSubmissionResponse",
  "message" : "Driver successfully submitted as driver-20231124153531-0000",
  "serverSparkVersion" : "3.4.2",
  "submissionId" : "driver-20231124153531-0000",
  "success" : true
}
```
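The <code>status</code> and <code>kill</code> actions take the submission ID as a trailing path segment. As a brief sketch (not part of the original example; it reuses the driver ID returned in the response above):

```bash
# Check the status of the submitted driver.
$ curl http://IP:PORT/v1/submissions/status/driver-20231124153531-0000

# Kill the submitted driver.
$ curl -XPOST http://IP:PORT/v1/submissions/kill/driver-20231124153531-0000
```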


# Resource Scheduling

The standalone cluster mode currently only supports a simple FIFO scheduler across applications.
