[hotfix][connector-v2-hbase]fix and optimize hbase source problem (apache#7148)

* [hotfix][improve][doc]optimize connector hbase source
* [doc]add dependent document
* [doc]update dependent document
* [improve]improve static use
* [hotfix]add test case
* [hotfix]add test case

---------

Co-authored-by: Jia Fan <fanjiaeminem@qq.com>
1 parent 9df557c, commit 34a6b8e

Showing 11 changed files with 455 additions and 90 deletions.
@@ -0,0 +1,96 @@
# Hbase

> Hbase source connector

## Description

Reads data from Apache Hbase.

## Key features

- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [x] [schema](../../concept/connector-v2-features.md)
- [x] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

## Options

| Name               | Type    | Required | Default |
|--------------------|---------|----------|---------|
| zookeeper_quorum   | string  | Yes      | -       |
| table              | string  | Yes      | -       |
| schema             | config  | Yes      | -       |
| hbase_extra_config | config  | No       | -       |
| caching            | int     | No       | -1      |
| batch              | int     | No       | -1      |
| cache_blocks       | boolean | No       | false   |
| common-options     |         | No       | -       |

### zookeeper_quorum [string]

The ZooKeeper quorum hosts of the Hbase cluster, for example: "hadoop001:2181,hadoop002:2181,hadoop003:2181"

### table [string]

The name of the table to read from, for example: "seatunnel"

### schema [config]

Hbase stores data as byte arrays, so you need to configure a data type for each column in the table. For more information, see the [guide](../../concept/schema-feature.md#how-to-declare-type-supported).

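As a rough illustration of the per-column typing described above, the fragment below declares a row key and two typed columns. The `family:qualifier` column naming follows the example at the end of this document; the table, column family, and column names (`user_profile`, `info:age`, `info:score`) are hypothetical.

```bash
source {
  Hbase {
    zookeeper_quorum = "hadoop001:2181"
    # Hypothetical table name, for illustration only
    table = "user_profile"
    schema = {
      columns = [
        {
          # The row key is read as a string
          name = "rowkey"
          type = string
        },
        {
          # Bytes stored under family "info", qualifier "age", decoded as int
          name = "info:age"
          type = int
        },
        {
          # Bytes stored under family "info", qualifier "score", decoded as double
          name = "info:score"
          type = double
        }
      ]
    }
  }
}
```

The declared type should match how the bytes were written; a column written as a double but declared as `int` would likely decode to a meaningless value.
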
### hbase_extra_config [config]

Extra configuration for Hbase.

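A minimal sketch of how extra configuration might be passed, assuming `hbase_extra_config` accepts a map of HBase client property names to values. The two properties shown (`hbase.client.retries.number`, `hbase.rpc.timeout`) are standard HBase client settings chosen for illustration; whether the connector forwards every property is not confirmed by this document.

```bash
source {
  Hbase {
    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
    table = "seatunnel_test"
    # Assumption: keys are plain HBase client property names.
    # They are quoted so that HOCON keeps each dotted name as a single key
    # instead of expanding it into nested objects.
    hbase_extra_config = {
      "hbase.client.retries.number" = 5
      "hbase.rpc.timeout" = 60000
    }
    schema = {
      columns = [
        {
          name = "rowkey"
          type = string
        }
      ]
    }
  }
}
```
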
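### caching [int]

The caching parameter sets the number of rows fetched from the server at a time during a scan. Fetching more rows per request reduces the number of round trips between client and server, which improves scan efficiency. Default: -1
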
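To make the round-trip trade-off concrete, here is a sketch under assumed numbers: scanning roughly 1,000,000 rows with `caching = 1000` needs on the order of 1,000,000 / 1000 = 1000 fetches, versus about 10,000 fetches with `caching = 100`. The row count is hypothetical, and server-side result size limits can return fewer rows per fetch than the configured value.

```bash
source {
  Hbase {
    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
    table = "seatunnel_test"
    # Hypothetical sizing: ~1,000,000 rows / 1000 rows per fetch ≈ 1000 round trips.
    # Larger values mean fewer round trips but more client memory per fetch.
    caching = 1000
    schema = {
      columns = [
        {
          name = "rowkey"
          type = string
        }
      ]
    }
  }
}
```
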
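### batch [int]

The batch parameter sets the maximum number of columns returned per call during a scan. This is particularly useful for rows with many columns: it avoids returning too much data at once, which saves memory and improves performance. Default: -1
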
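As a worked illustration with hypothetical numbers: with `batch = 100`, a row holding 250 columns comes back in three pieces of at most 100, 100, and 50 columns rather than as one 250-column result. How the connector assembles those pieces into output rows is not described in this document, so treat the fragment as a sketch of the setting only.

```bash
source {
  Hbase {
    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
    table = "seatunnel_test"
    # Hypothetical wide row: 250 columns / 100 per call -> pieces of 100, 100, 50
    batch = 100
    schema = {
      columns = [
        {
          name = "rowkey"
          type = string
        },
        {
          name = "columnFamily1:column1"
          type = boolean
        }
      ]
    }
  }
}
```
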
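### cache_blocks [boolean]

The cache_blocks parameter controls whether data blocks are cached during a scan. By default, HBase caches data blocks in the block cache when scanning. When set to false, data blocks are not cached during the scan, which reduces memory usage. In SeaTunnel the default is false.
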
### common options

Common parameters of source plugins; see [Source Common Options](common-options.md) for details.

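For illustration, the common options can sit alongside the Hbase-specific ones. In this sketch the registered table name `hbase_rows` and the parallelism value are hypothetical; `result_table_name`, `parallelism`, and `source_table_name` are the options described in the Source Common Options document.

```bash
source {
  Hbase {
    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
    table = "seatunnel_test"
    # Common options: register the output under a name and override env parallelism
    result_table_name = "hbase_rows"
    parallelism = 2
    schema = {
      columns = [
        {
          name = "rowkey"
          type = string
        }
      ]
    }
  }
}

sink {
  Console {
    # Reads the dataset registered by the source above
    source_table_name = "hbase_rows"
  }
}
```
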
## Example

```bash
source {
  Hbase {
    zookeeper_quorum = "hadoop001:2181,hadoop002:2181,hadoop003:2181"
    table = "seatunnel_test"
    caching = 1000
    batch = 100
    cache_blocks = false
    schema = {
      columns = [
        {
          name = "rowkey"
          type = string
        },
        {
          name = "columnFamily1:column1"
          type = boolean
        },
        {
          name = "columnFamily1:column2"
          type = double
        },
        {
          name = "columnFamily2:column1"
          type = bigint
        }
      ]
    }
  }
}
```

@@ -0,0 +1,81 @@
# Source Common Options

> Common parameters of source connectors

| Name              | Type   | Required | Default | Description |
|-------------------|--------|----------|---------|-------------|
| result_table_name | String | No       | -       | When `result_table_name` is not specified, the data processed by this plugin is not registered as a dataset `(dataStream/dataset)`, also called a temporary table `(table)`, that other plugins can access directly.<br/>When `result_table_name` is specified, the data processed by this plugin is registered as a dataset `(dataStream/dataset)`, also called a temporary table `(table)`, that other plugins can access directly. The dataset `(dataStream/dataset)` registered here can be accessed directly by other plugins by specifying `source_table_name`. |
| parallelism       | Int    | No       | -       | When `parallelism` is not specified, the `parallelism` from the environment is used by default.<br/>When `parallelism` is specified, it overrides the `parallelism` setting in the environment. |

# Important note

When a job configuration uses `result_table_name`, the `source_table_name` parameter must also be set in the plugin that consumes that table.

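The pairing described above might look like the following sketch, which registers a source's output as `fake_table` and has a sink read it by the same name. `FakeSource` and `Console` are the plugins used in the examples in this document; the single-field schema and row count are placeholders.

```bash
source {
  FakeSource {
    # Register this source's output under the name "fake_table"
    result_table_name = "fake_table"
    row.num = 10
    schema = {
      fields {
        id = "int"
      }
    }
  }
}

sink {
  Console {
    # Must match the result_table_name registered above
    source_table_name = "fake_table"
  }
}
```
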
## Task example

### Simple example

> Register a streaming or batch data source and return the table name `fake_table` at registration

```bash
source {
  FakeSourceStream {
    result_table_name = "fake_table"
  }
}
```

### Complex example

> This transforms the data from the Fake source and writes it to two different sinks

```bash
env {
  job.mode = "BATCH"
}

source {
  FakeSource {
    result_table_name = "fake"
    row.num = 100
    schema = {
      fields {
        id = "int"
        name = "string"
        age = "int"
        c_timestamp = "timestamp"
        c_date = "date"
        c_map = "map<string, string>"
        c_array = "array<int>"
        c_decimal = "decimal(30, 8)"
        c_row = {
          c_row = {
            c_int = int
          }
        }
      }
    }
  }
}

transform {
  Sql {
    source_table_name = "fake"
    result_table_name = "fake1"
    # The table name in the query must match the 'source_table_name' field
    query = "select id, regexp_replace(name, '.+', 'b') as name, age+1 as age, pi() as pi, c_timestamp, c_date, c_map, c_array, c_decimal, c_row from fake"
  }
  # The SQL transform supports basic functions and conditional operations,
  # but not complex SQL operations such as multi-source table/row JOINs and aggregations
}

sink {
  Console {
    source_table_name = "fake1"
  }
  Console {
    source_table_name = "fake"
  }
}
```
