
Commit

Fix typos and descriptions of existing sections.
wanglijie95 committed Feb 16, 2022
1 parent a577d96 commit 633a085
Showing 2 changed files with 42 additions and 33 deletions.
38 changes: 21 additions & 17 deletions docs/content.zh/docs/deployment/adaptive_batch_scheduler.md
@@ -25,39 +25,43 @@ under the License.

## Adaptive Batch Scheduler

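Adaptive Batch Scheduler is a scheduler that can automatically derive the parallelism of each node. If a node has no parallelism set, the scheduler derives its parallelism from the size of the data it consumes. This brings several benefits:
Adaptive Batch Scheduler is a scheduler for batch jobs that can automatically derive the parallelism of each operator. If an operator has no parallelism set, the scheduler derives its parallelism from the size of the data it consumes. This brings several benefits:
- Batch job users are relieved from parallelism tuning
- Deriving parallelism automatically from the data volume adapts better to data volumes that change from day to day
- Nodes in SQL jobs can also be assigned different parallelisms
- Operators in SQL jobs can also be assigned different parallelisms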

### Usage
### Usage

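To automatically derive the parallelism of job nodes with Adaptive Batch Scheduler, you need to:
To automatically derive the parallelism of operators with Adaptive Batch Scheduler, you need to:
- Enable Adaptive Batch Scheduler
- Set the parallelism of nodes to `-1`
- Set the parallelism of operators to `-1`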

#### Enable Adaptive Batch Scheduler
To enable Adaptive Batch Scheduler, you need to set [`jobmanager.scheduler`]({{< ref "docs/deployment/config" >}}#jobmanager-scheduler) to `AdpaptiveBatch`. In addition, the following options can optionally be configured when using Adaptive Batch Scheduler:
- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): the minimum parallelism allowed to be set
- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): the maximum parallelism allowed to be set
To enable Adaptive Batch Scheduler, you need to:
- Set [`jobmanager.scheduler`]({{< ref "docs/deployment/config" >}}#jobmanager-scheduler) to `AdaptiveBatch`
- Because of ["ALL-EXCHANGES-BLOCKING jobs only"](#限制), set [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) to `ALL-EXCHANGES-BLOCKING` (the default value).

In addition, the following related options can be adjusted when using Adaptive Batch Scheduler:
- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): the minimum parallelism that may be set automatically
- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): the maximum parallelism that may be set automatically
- [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): the amount of data each task is expected to process
- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): the default parallelism of source nodes
- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): the default parallelism of source operators
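To see how the minimum, maximum, and per-task data volume options interact, here is a simplified model. It assumes that the parallelism is the consumed data volume divided by `data-volume-per-task`, rounded up, and then clamped to `[min-parallelism, max-parallelism]`. The helper `decide_parallelism` is hypothetical. Flink's actual decision logic may apply further rules, so see FLIP-187 for the authoritative description.

```python
# Simplified model of parallelism derivation (an assumption, not Flink's code).
MiB = 1024 * 1024
GiB = 1024 * MiB


def decide_parallelism(consumed_bytes: int,
                       data_volume_per_task: int,
                       min_parallelism: int,
                       max_parallelism: int) -> int:
    """Hypothetical helper: ceil(consumed / volume-per-task), clamped to bounds."""
    if data_volume_per_task <= 0:
        raise ValueError("data_volume_per_task must be positive")
    if not 1 <= min_parallelism <= max_parallelism:
        raise ValueError("require 1 <= min_parallelism <= max_parallelism")
    # Integer ceiling division: enough tasks that an even split stays within
    # the target volume per task.
    wanted = -(-consumed_bytes // data_volume_per_task)
    # Keep the result within [min_parallelism, max_parallelism].
    return max(min_parallelism, min(max_parallelism, wanted))


if __name__ == "__main__":
    print(decide_parallelism(10 * GiB, 1 * GiB, 1, 128))   # -> 10
    print(decide_parallelism(1 * MiB, 1 * GiB, 1, 128))    # -> 1 (raised to min)
    print(decide_parallelism(500 * GiB, 1 * GiB, 1, 128))  # -> 128 (capped at max)
```

The values 1 GiB, 1 and 128 are illustrative inputs only. Under this model, a day with ten times more input gets roughly ten times more tasks until the upper bound is reached.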

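#### Set the parallelism of nodes to `-1`
Adaptive Batch Scheduler only derives parallelism for job nodes whose parallelism is not specified by the user (parallelism is `-1`). So if you want the parallelism of nodes to be derived automatically, configure the following:
#### Set the parallelism of operators to `-1`
Adaptive Batch Scheduler only derives parallelism for operators whose parallelism is not specified by the user (parallelism is `-1`). So if you want the parallelism of operators to be derived automatically, configure the following:
- Set `parallelism.default` to `-1`
- For SQL jobs, set `table.exec.resource.default-parallelism` to `-1`
- For DataStream jobs, do not specify the parallelism through the `setParallelism()` method of operators
- For DataStream/DataSet jobs, do not specify the parallelism through the `setParallelism()` method of operators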
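The rule that only operators with parallelism `-1` are derived can be illustrated with a small model. The `Operator` class and the `resolve_parallelism` helper are hypothetical. The derived value is passed in rather than computed, so the example focuses on which operators the scheduler is allowed to change.

```python
from dataclasses import dataclass

UNSET = -1  # parallelism value meaning "let the scheduler decide"


@dataclass
class Operator:
    """Hypothetical stand-in for an operator in a batch job."""
    name: str
    parallelism: int = UNSET  # stays -1 unless setParallelism() was called


def resolve_parallelism(op: Operator, derived: int) -> int:
    """Keep a user-specified parallelism; use the derived value only for -1."""
    if op.parallelism == UNSET:
        return derived
    return op.parallelism


if __name__ == "__main__":
    ops = [Operator("join"), Operator("map", 4), Operator("aggregate")]
    for op in ops:
        print(op.name, resolve_parallelism(op, derived=10))
```

In this model `map` keeps 4 because its parallelism was set explicitly. This is why `setParallelism()` should not be called on operators whose parallelism you want the scheduler to derive.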

### Performance tuning

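1. It is recommended to use `Sort Shuffle` and set [`taskmanager.network.memory.buffers-per-channel`]({{< ref "docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to `0`. This decouples network memory usage from parallelism, and for large-scale jobs it reduces the likelihood of "Insufficient number of network buffers" errors.
2. It is not recommended to configure an overly large value for [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism), otherwise performance suffers. This option affects the number of subpartitions produced by upstream tasks. Too many subpartitions may degrade the performance of hash shuffle, or degrade network transmission performance because of small packets.
1. It is recommended to use [Sort Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) and set [`taskmanager.network.memory.buffers-per-channel`]({{< ref "docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to `0`. This decouples network memory usage from parallelism, so for large-scale jobs "Insufficient number of network buffers" errors become less likely.
2. It is recommended to set [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to the parallelism you expect to need in the worst case. An overly large value is not recommended because it may hurt performance. This option affects the number of subpartitions produced by upstream tasks. Too many subpartitions may degrade the performance of hash shuffle, or degrade network transmission performance because of small packets.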
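The following arithmetic shows why an oversized `max-parallelism` can hurt. It makes a simplifying assumption: each upstream task writes one subpartition per possible downstream task, which is `max-parallelism` subpartitions, and spreads its output evenly across them. The helper name is hypothetical, and the numbers are illustrative rather than measurements.

```python
MiB = 1024 * 1024


def subpartition_stats(upstream_parallelism: int,
                       max_parallelism: int,
                       bytes_per_upstream_task: int) -> tuple[int, float]:
    """Return (total subpartitions, average bytes per subpartition).

    Hypothetical model: every upstream task produces max_parallelism
    subpartitions and spreads its output evenly across them.
    """
    total_subpartitions = upstream_parallelism * max_parallelism
    avg_bytes = bytes_per_upstream_task / max_parallelism
    return total_subpartitions, avg_bytes


if __name__ == "__main__":
    for max_p in (128, 4096):
        total, avg = subpartition_stats(100, max_p, 64 * MiB)
        print(f"max-parallelism={max_p}: {total} subpartitions, "
              f"~{avg / 1024:.0f} KiB each")
```

With 100 upstream tasks writing 64 MiB each, raising the bound from 128 to 4096 changes the result from 12800 subpartitions of about 512 KiB each to 409600 subpartitions of about 16 KiB each. That is the "many small pieces" effect described above.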

### Limitations

- **ALL-EDGES-BLOCKING batch jobs only**: Currently, Adaptive Batch Scheduler only supports ALL-EDGES-BLOCKING batch jobs
- **Inconsistent broadcast results metrics on WebUI**: When using Adaptive Batch Scheduler, for broadcast edges, the amount of data sent by the upstream node and the amount received by the downstream node may differ, which can confuse users when displayed. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler) for details.
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
- **ALL-EXCHANGES-BLOCKING jobs only**: Currently, Adaptive Batch Scheduler only supports jobs whose [shuffle mode]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) is ALL-EXCHANGES-BLOCKING
- **Inconsistent broadcast results metrics on WebUI**: When using Adaptive Batch Scheduler, for broadcast edges, the amount of data sent by the upstream operator and the amount received by the downstream operator may differ, which may confuse users when displayed on the Web UI. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler) for details.


{{< top >}}
37 changes: 21 additions & 16 deletions docs/content/docs/deployment/adaptive_batch_scheduler.md
@@ -25,39 +25,44 @@ under the License.

## Adaptive Batch Scheduler

The Adaptive Batch Scheduler can automatically decide parallelisms of job vertices for batch jobs. If a job vertex is not set with a parallelism, the scheduler will decide parallelism for the job vertex according to the size of its consumed datasets. This can bring many benefits:
The Adaptive Batch Scheduler can automatically decide parallelisms of operators for batch jobs. If an operator's parallelism is not set, the scheduler will decide its parallelism according to the size of the datasets it consumes. This can bring many benefits:
- Batch job users can be relieved from parallelism tuning
- Automatically tuned parallelisms can be vertex level and can better fit consumed datasets which have a varying volume size every day
- Vertices from SQL batch jobs can be assigned with different parallelisms which are automatically tuned
- Automatically tuned parallelisms can better fit consumed datasets whose volume varies from day to day
- Operators from SQL batch jobs can be assigned different parallelisms, which are tuned automatically

### Usage

To automatically decide parallelisms for job vertices through Adaptive Batch Scheduler, you need to:
To automatically decide parallelisms for operators with Adaptive Batch Scheduler, you need to:
- Configure to use Adaptive Batch Scheduler.
- Set the parallelism of job vertices to `-1`.
- Set the parallelism of operators to `-1`.

#### Configure to use Adaptive Batch Scheduler
To use Adaptive Batch Scheduler, you need to set the [`jobmanager.scheduler`]({{< ref "docs/deployment/config" >}}#jobmanager-scheduler) to `AdpaptiveBatch`. In addition, there are several optional config options that might need adjustment when using Adaptive Batch Scheduler:
To use Adaptive Batch Scheduler, you need to:
- Set the [`jobmanager.scheduler`]({{< ref "docs/deployment/config" >}}#jobmanager-scheduler) to `AdaptiveBatch`
- Set the [`execution.batch-shuffle-mode`]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) to `ALL-EXCHANGES-BLOCKING` (the default value) due to ["ALL-EXCHANGES-BLOCKING jobs only"](#Limitations).
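You can check the required settings before submitting a job. The sketch below treats the Flink configuration as a plain dictionary that maps option keys to string values and compares the values as exact strings. The `check_adaptive_batch_config` function is hypothetical and is not part of Flink. It also checks that `parallelism.default` is `-1`, which is required further down this page.

```python
def check_adaptive_batch_config(conf: dict[str, str]) -> list[str]:
    """Hypothetical pre-flight check; an empty list means no problems found."""
    problems = []
    if conf.get("jobmanager.scheduler") != "AdaptiveBatch":
        problems.append("jobmanager.scheduler must be 'AdaptiveBatch'")
    # ALL-EXCHANGES-BLOCKING is the default, so a missing key is acceptable.
    shuffle = conf.get("execution.batch-shuffle-mode", "ALL-EXCHANGES-BLOCKING")
    if shuffle != "ALL-EXCHANGES-BLOCKING":
        problems.append(
            "execution.batch-shuffle-mode must be 'ALL-EXCHANGES-BLOCKING'")
    # Operators only get a derived parallelism if theirs stays at -1.
    if conf.get("parallelism.default") != "-1":
        problems.append("parallelism.default should be '-1'")
    return problems


if __name__ == "__main__":
    conf = {
        "jobmanager.scheduler": "AdpaptiveBatch",  # misspelled value
        "parallelism.default": "-1",
    }
    print(check_adaptive_batch_config(conf))  # one problem: the scheduler value
    conf["jobmanager.scheduler"] = "AdaptiveBatch"
    print(check_adaptive_batch_config(conf))  # []
```

The misspelled `AdpaptiveBatch` in the usage example shows the kind of mistake such a check catches.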

In addition, there are several related configuration options that may need adjustment when using Adaptive Batch Scheduler:
- [`jobmanager.adaptive-batch-scheduler.min-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-min-parallelism): The lower bound of allowed parallelism to set adaptively
- [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism): The upper bound of allowed parallelism to set adaptively
- [`jobmanager.adaptive-batch-scheduler.data-volume-per-task`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-data-volume-per-task): The size of data volume to expect each task instance to process
- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of source vertices
- [`jobmanager.adaptive-batch-scheduler.default-source-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-default-source-parallelism): The default parallelism of data source.

#### Set the parallelism of job vertices to `-1`
Adaptive Batch Scheduler will only decide parallelism for job vertices whose parallelism is not specified by users (parallelism is `-1`). So if you want the parallelism of vertices can be decided automatically, you should configure as follows:
- Set `paralleims.default` to `-1`
- Set `table.exec.resource.default-parallelism` to -1 in SQL jobs.
- Don't call `setParallelism()` for operators in datastream jobs.
#### Set the parallelism of operators to `-1`
Adaptive Batch Scheduler will only decide parallelism for operators whose parallelism is not specified by users (parallelism is `-1`). So if you want the parallelism of operators to be decided automatically, you should configure as follows:
- Set [`parallelism.default`]({{< ref "docs/deployment/config" >}}#parallelism-default) to `-1`
- Set [`table.exec.resource.default-parallelism`]({{< ref "docs/deployment/config" >}}#table-exec-resource-default-parallelism) to `-1` in SQL jobs.
- Don't call `setParallelism()` for operators in DataStream/DataSet jobs.

### Performance tuning

1. It's recommended to use `Sort Shuffle` and set [`taskmanager.network.memory.buffers-per-channel`]({{< ref "docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to `0`. This can decouple the network memory consumption from parallelism, so for large scale jobs, the possibility of "Insufficient number of network buffers" error can be decreased.
2. It's not recommended to configure an excessive value for [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism), otherwise it will affect the performance. Because this option can affect the number of subpartitions produced by upstream tasks, excessive number of subpartitions may degrade the performance of hash shuffle and the performance of network transmission due to small packets.
1. It's recommended to use [Sort Shuffle](https://flink.apache.org/2021/10/26/sort-shuffle-part1.html) and set [`taskmanager.network.memory.buffers-per-channel`]({{< ref "docs/deployment/config" >}}#taskmanager-network-memory-buffers-per-channel) to `0`. This can decouple the required network memory from parallelism, so that for large scale jobs, the "Insufficient number of network buffers" errors are less likely to happen.
2. It's recommended to set [`jobmanager.adaptive-batch-scheduler.max-parallelism`]({{< ref "docs/deployment/config" >}}#jobmanager-adaptive-batch-scheduler-max-parallelism) to the parallelism you expect to need in the worst case. Larger values are not recommended, because an excessive value may affect performance: this option affects the number of subpartitions produced by upstream tasks, and a large number of subpartitions may degrade the performance of hash shuffle and the performance of network transmission due to small packets.

### Limitations

- **ALL-EDGES-BLOCKING batch jobs only**: The first version of Adaptive Batch Scheduler only supports ALL-EDGES-BLOCKING batch jobs only.
- **Inconsistent broadcast results metrics on WebUI**: In Adaptive Batch Scheduler, for broadcast results, the number of bytes/records sent by the upstream vertex counted by metric is not equal to the number of bytes/records received by the downstream vertex, which may confuse users when displayed on the Web UI. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler) for details.
- **Batch jobs only**: Adaptive Batch Scheduler only supports batch jobs.
- **ALL-EXCHANGES-BLOCKING jobs only**: At the moment, Adaptive Batch Scheduler only supports jobs whose [shuffle mode]({{< ref "docs/deployment/config" >}}#execution-batch-shuffle-mode) is `ALL-EXCHANGES-BLOCKING` (upstream and downstream tasks run sequentially in such jobs).
- **Inconsistent broadcast results metrics on WebUI**: In Adaptive Batch Scheduler, for broadcast results, the number of bytes/records sent by the upstream task counted by metric is not equal to the number of bytes/records received by the downstream task, which may confuse users when displayed on the Web UI. See [FLIP-187](https://cwiki.apache.org/confluence/display/FLINK/FLIP-187%3A+Adaptive+Batch+Job+Scheduler) for details.
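The blocking-only restriction follows from when the decision is made. A vertex's parallelism can only be derived from its input size once that input has been fully produced, which means its upstream tasks must have finished first. The toy model below illustrates that ordering and is not Flink's scheduler. It walks a small DAG in topological order (using `graphlib`, Python 3.9+), sums the bytes produced by already-finished upstream vertices, and derives each downstream parallelism from that total. The vertex names, output sizes, and the `plan` helper are all hypothetical.

```python
from graphlib import TopologicalSorter

GiB = 1024 ** 3


def plan(edges: dict[str, set[str]],
         output_bytes: dict[str, int],
         source_parallelism: int,
         volume_per_task: int,
         max_parallelism: int) -> dict[str, int]:
    """Toy model: derive each vertex's parallelism after its inputs finished.

    edges maps a vertex to the set of upstream vertices it consumes.
    output_bytes is the (made-up) amount of data each vertex produces; in a
    blocking job this size is only known once the vertex has finished.
    """
    decided: dict[str, int] = {}
    for vertex in TopologicalSorter(edges).static_order():
        inputs = edges.get(vertex, set())
        if not inputs:
            # Sources have no consumed data to measure; use a fixed default.
            decided[vertex] = source_parallelism
            continue
        # Upstream vertices come earlier in topological order, so in this
        # model their outputs are complete and their sizes can be summed.
        consumed = sum(output_bytes[u] for u in inputs)
        wanted = -(-consumed // volume_per_task)  # ceiling division
        decided[vertex] = max(1, min(max_parallelism, wanted))
    return decided


if __name__ == "__main__":
    edges = {"join": {"orders", "customers"}, "agg": {"join"}}
    sizes = {"orders": 6 * GiB, "customers": 2 * GiB, "join": 3 * GiB}
    result = plan(edges, sizes, source_parallelism=4,
                  volume_per_task=GiB, max_parallelism=128)
    for name, parallelism in sorted(result.items()):
        print(name, parallelism)
```

With pipelined exchanges, `join` would start while `orders` and `customers` were still producing data, so the total that this model sums would not yet exist.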


{{< top >}}
