Commit

update document
liunaijie committed Jun 24, 2024
1 parent e3addf7 commit 01cb36d
Showing 26 changed files with 105 additions and 33 deletions.
2 changes: 1 addition & 1 deletion README.md
@@ -61,7 +61,7 @@ SeaTunnel addresses common data integration challenges:

## SeaTunnel Workflow

![SeaTunnel Workflow](docs/en/images/architecture_diagram.png)
![SeaTunnel Workflow](docs/images/architecture_diagram.png)

Configure jobs, select execution engines, and parallelize data using Source Connectors. Easily develop and extend connectors to meet your needs.

2 changes: 1 addition & 1 deletion docs/en/about.md
@@ -34,7 +34,7 @@ SeaTunnel focuses on data integration and data synchronization, and is mainly de

## SeaTunnel work flowchart

![SeaTunnel work flowchart](images/architecture_diagram.png)
![SeaTunnel work flowchart](../images/architecture_diagram.png)

The runtime process of SeaTunnel is shown in the figure above.
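
The image links in this commit change because the shared images now live in `docs/images`, so each Markdown file needs a different relative prefix depending on its depth. A minimal sketch of that resolution, assuming the renderer resolves a relative link against the directory of the file that contains it (the class and method names are hypothetical, for illustration only):

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class ImageLinkSketch {

    /** Resolves a relative image link against the directory of the Markdown file containing it. */
    static Path resolve(String markdownFile, String link) {
        Path dir = Paths.get(markdownFile).getParent();
        if (dir == null) {
            // File sits at the repository root, so the link is already root-relative.
            return Paths.get(link).normalize();
        }
        // normalize() collapses "en/.." style segments.
        return dir.resolve(link).normalize();
    }

    public static void main(String[] args) {
        Path diagram = Paths.get("docs/images/architecture_diagram.png");
        // All three links taken from this commit should end up under docs/images.
        System.out.println(resolve("README.md", "docs/images/architecture_diagram.png").equals(diagram));
        System.out.println(resolve("docs/en/about.md", "../images/architecture_diagram.png").equals(diagram));
        System.out.println(resolve("docs/en/seatunnel-engine/resource-manager.md",
                "../../images/resource_tag.png").equals(Paths.get("docs/images/resource_tag.png")));
    }
}
```

Each extra directory level below `docs` adds one `../` to the link, which is why `about.md` uses `../images/` while `seatunnel-engine/resource-manager.md` uses `../../images/`.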

6 changes: 3 additions & 3 deletions docs/en/faq.md
@@ -65,9 +65,9 @@ Refer to: [lightbend/config#456](https://github.com/lightbend/config/issues/456)

Of course! See the screenshot below:

![workflow.png](images/workflow.png)
![workflow.png](../images/workflow.png)

![azkaban.png](images/azkaban.png)
![azkaban.png](../images/azkaban.png)

## Does SeaTunnel have a case for configuring multiple sources, such as configuring elasticsearch and hdfs in source at the same time?

@@ -184,7 +184,7 @@ The following conclusions can be drawn:

3. In general, both M and N are determined, and the conclusion can be drawn from 2: The size of `spark.streaming.kafka.maxRatePerPartition` is positively correlated with the size of `spark.executor.cores` * `spark.executor.instances`, and it can be increased while increasing the resource `maxRatePerPartition` to speed up consumption.

![kafka](images/kafka.png)
![kafka](../images/kafka.png)

## How can I solve the Error `Exception in thread "main" java.lang.NoSuchFieldError: INSTANCE`?

20 changes: 11 additions & 9 deletions docs/en/seatunnel-engine/resource-manager.md
@@ -3,8 +3,11 @@
sidebar_position: 9
-------------------

In version 2.3.6. SeaTunnel can add tag to worker node, when you submit job you can specify the tag you want to run.
update the config in `hazelcast.yaml`,
After version 2.3.6, SeaTunnel can add a `tag` to each worker node; when you submit a job, you can use `tag_filter` to select the nodes that should run it.

# How to achieve this

1. Update the config in `hazelcast.yaml`:

```yaml
hazelcast:
@@ -40,13 +43,14 @@ hazelcast:
```

In this config, we specify the tags with `member-attributes`; the node has the tags `group=platform` and `team=team1`.
Then, when we use this job config to submit job, we can assign the task to this node.

2. Add `tag_filter` to your job config:

```hacon
env {
parallelism = 1
job.mode = "BATCH"
tag {
tag_filter {
group = "platform"
team = "team1"
}
@@ -72,10 +76,8 @@ sink {
```

**Notice:**
- If not set this tag in config, it will choose the node in all active nodes.
- In you input a not exist tag, like `group=platform, team=team2`, you will get `NoEnoughResourceException` exception.
- if you special multiple tag, it needs all tag exist and value match, you can add multiple tags to node, but only use few tag to filter node.
like only use `group=platform`
- If `tag_filter` is not set in the job config, a node is chosen at random from all active nodes.
- When `tag_filter` contains multiple tags, a node must have every key with a matching value. If no node matches, a `NoEnoughResourceException` is thrown.

![img.png](resource_tag.png)
![img.png](../../images/resource_tag.png)
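
The matching rule in the notes can be sketched in a few lines of Java. This is an illustration under stated assumptions, not SeaTunnel's implementation: worker attributes and the filter are modeled as plain `Map<String, String>`, the class and method names are hypothetical, and `IllegalStateException` stands in for `NoEnoughResourceException`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class TagFilterSketch {

    /** True when the worker carries every key in tagFilter with an equal value. */
    static boolean matches(Map<String, String> workerAttr, Map<String, String> tagFilter) {
        for (Map.Entry<String, String> entry : tagFilter.entrySet()) {
            // get() returns null for a missing key, so equals() is false in that case.
            if (!entry.getValue().equals(workerAttr.get(entry.getKey()))) {
                return false;
            }
        }
        return true;
    }

    /** Returns the workers eligible for a job; an empty filter keeps every worker. */
    static List<Map<String, String>> eligible(
            List<Map<String, String>> workers, Map<String, String> tagFilter) {
        List<Map<String, String>> result = new ArrayList<>();
        for (Map<String, String> worker : workers) {
            if (matches(worker, tagFilter)) {
                result.add(worker);
            }
        }
        if (result.isEmpty()) {
            // Stand-in for the NoEnoughResourceException named in the notes.
            throw new IllegalStateException("no worker matches " + tagFilter);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Map<String, String>> workers = List.of(
                Map.of("group", "platform", "team", "team1"),
                Map.of("group", "platform", "team", "team2"));
        // Filtering on a subset of a node's tags is enough to select it.
        System.out.println(eligible(workers, Map.of("group", "platform")).size());
        System.out.println(eligible(workers, Map.of("group", "platform", "team", "team1")).size());
        // No filter: every active worker stays a candidate.
        System.out.println(eligible(workers, Map.of()).size());
    }
}
```

A node may carry more tags than the filter names; only the keys listed in `tag_filter` are checked.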

Binary file removed docs/en/seatunnel-engine/resource_tag.png
Binary file not shown.
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Binary file added docs/images/resource_tag.png
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
3 changes: 2 additions & 1 deletion docs/sidebars.js
@@ -178,7 +178,8 @@ const sidebars = {
"seatunnel-engine/checkpoint-storage",
"seatunnel-engine/rest-api",
"seatunnel-engine/tcp",
"seatunnel-engine/engine-jar-storage-mode"
"seatunnel-engine/engine-jar-storage-mode",
"seatunnel-engine/resource-manager",
]
},
{
Binary file removed docs/zh/images/architecture_diagram.png
Binary file not shown.
Binary file removed docs/zh/images/azkaban.png
Binary file not shown.
Binary file removed docs/zh/images/checkstyle.png
Binary file not shown.
Binary file removed docs/zh/images/kafka.png
Binary file not shown.
4 changes: 0 additions & 4 deletions docs/zh/images/seatunnel-workflow.svg

This file was deleted.

Binary file removed docs/zh/images/seatunnel_architecture.png
Binary file not shown.
Binary file removed docs/zh/images/seatunnel_starter.png
Binary file not shown.
Binary file removed docs/zh/images/workflow.png
Binary file not shown.
83 changes: 83 additions & 0 deletions docs/zh/seatunnel-engine/resource-manager.md
@@ -0,0 +1,83 @@
---

sidebar_position: 9
-------------------

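After version 2.3.6, SeaTunnel supports adding a `tag` to each instance; when submitting a job, you can use `tag_filter` in the config file to select the nodes the job will run on.

# How to implement this feature

1. Update the `hazelcast.yaml` file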

```yaml
hazelcast:
  cluster-name: seatunnel
  network:
    rest-api:
      enabled: true
      endpoint-groups:
        CLUSTER_WRITE:
          enabled: true
        DATA:
          enabled: true
    join:
      tcp-ip:
        enabled: true
        member-list:
          - localhost
    port:
      auto-increment: false
      port: 5801
  properties:
    hazelcast.invocation.max.retry.count: 20
    hazelcast.tcp.join.port.try.count: 30
    hazelcast.logging.type: log4j2
    hazelcast.operation.generic.thread.count: 50
  member-attributes:
    group:
      type: string
      value: platform
    team:
      type: string
      value: team1
```

In this config, we use `member-attributes` to set two `tag`s: `group=platform` and `team=team1`.

2. Add `tag_filter` to the job config to select the nodes that should run the job

```hacon
env {
  parallelism = 1
  job.mode = "BATCH"
  tag_filter {
    group = "platform"
    team = "team1"
  }
}
source {
  FakeSource {
    result_table_name = "fake"
    parallelism = 1
    schema = {
      fields {
        name = "string"
      }
    }
  }
}
transform {
}
sink {
  console {
    source_table_name="fake"
  }
}
```

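**Note:**
- When `tag_filter` is not set in the job config, a node is chosen at random from all nodes to run the job.
- When `tag_filter` contains multiple conditions, only nodes where every key exists and its value is equal are matched. If no matching node is found, a `NoEnoughResourceException` is thrown.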

![img.png](../../images/resource_tag.png)

@@ -38,11 +38,4 @@ hazelcast:
hazelcast.invocation.max.retry.count: 20
hazelcast.tcp.join.port.try.count: 30
hazelcast.logging.type: log4j2
hazelcast.operation.generic.thread.count: 50
member-attributes:
group:
type: string
value: platform
team:
type: string
value: team1
hazelcast.operation.generic.thread.count: 50
@@ -150,14 +150,11 @@ public CompletableFuture<List<SlotProfile>> applyResources(
boolean match = true;
for (Map.Entry<String, String> entry :
tagFilter.entrySet()) {
if (workerAttr.containsKey(entry.getKey())
&& workerAttr
if (!workerAttr.containsKey(entry.getKey())
|| !workerAttr
.get(entry.getKey())
.equals(entry.getValue())) {
// need all tag match
} else {
match = false;
break;
return false;
}
}
return match;
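
Because the +/- markers are missing from this hunk, the before and after shapes of the check are easier to read side by side. The sketch below reconstructs both from the lines above, under the assumption that `workerAttr` and `tagFilter` are `Map<String, String>`; the class and method names are hypothetical, and in the real code the `return false` presumably exits an enclosing predicate rather than a named method.

```java
import java.util.List;
import java.util.Map;

final class TagMatchRewrite {

    /** Earlier shape: keep a flag and break on the first mismatch. */
    static boolean matchWithFlag(Map<String, String> workerAttr, Map<String, String> tagFilter) {
        boolean match = true;
        for (Map.Entry<String, String> entry : tagFilter.entrySet()) {
            if (workerAttr.containsKey(entry.getKey())
                    && workerAttr.get(entry.getKey()).equals(entry.getValue())) {
                // this tag matches; keep checking the rest
            } else {
                match = false;
                break;
            }
        }
        return match;
    }

    /** Rewritten shape: negate the condition (De Morgan) and return on the first mismatch. */
    static boolean matchWithEarlyReturn(
            Map<String, String> workerAttr, Map<String, String> tagFilter) {
        for (Map.Entry<String, String> entry : tagFilter.entrySet()) {
            if (!workerAttr.containsKey(entry.getKey())
                    || !workerAttr.get(entry.getKey()).equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, String> worker = Map.of("group", "platform", "team", "team1");
        List<Map<String, String>> filters = List.of(
                Map.of("group", "platform"),
                Map.of("group", "platform", "team", "team2"),
                Map.of("zone", "a"),
                Map.of());
        for (Map<String, String> filter : filters) {
            // Both shapes should agree on every filter.
            System.out.println(filter + " -> " + matchWithFlag(worker, filter)
                    + " / " + matchWithEarlyReturn(worker, filter));
        }
    }
}
```

The two shapes return the same result for every input; the rewrite only removes the empty `if` branch and the `break`.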