
Merge branch 'dev' into sentry
yangshengjie committed Sep 6, 2022
2 parents 7b83ae5 + 9288948 commit b75714c
Showing 181 changed files with 3,267 additions and 1,308 deletions.
7 changes: 4 additions & 3 deletions .github/workflows/backend.yml
@@ -111,7 +111,8 @@ jobs:
-D"maven.wagon.httpconnectionManager.ttlSeconds"=120
dependency-license:
if: github.repository == 'apache/incubator-seatunnel'
# This job still has some TODOs and is not a blocker for the release.
if: "contains(toJSON(github.event.commits.*.message), '[ci-auto-license]')"
name: Dependency licenses
needs: [ sanity-check ]
runs-on: ubuntu-latest
@@ -144,7 +145,7 @@ jobs:
matrix:
java: [ '8', '11' ]
os: [ 'ubuntu-latest', 'windows-latest' ]
timeout-minutes: 50
timeout-minutes: 90
steps:
- uses: actions/checkout@v2
- name: Set up JDK ${{ matrix.java }}
@@ -167,7 +168,7 @@ jobs:
matrix:
java: [ '8', '11' ]
os: [ 'ubuntu-latest' ]
timeout-minutes: 50
timeout-minutes: 90
steps:
- uses: actions/checkout@v2
- name: Set up JDK ${{ matrix.java }}
65 changes: 65 additions & 0 deletions docs/en/concept/connector-v2-features.md
@@ -0,0 +1,65 @@
# Intro To Connector V2 Features

## Differences Between Connector V2 And Connector V1

Since [issue #1608](https://github.com/apache/incubator-seatunnel/issues/1608), we have added Connector V2 features.
Connector V2 is a connector defined based on the SeaTunnel Connector API interface. Unlike Connector V1, Connector V2 supports the following features:

* **Multi Engine Support** The SeaTunnel Connector API is an engine-independent API. Connectors developed against this API can run in multiple engines. Currently Flink and Spark are supported, and we will support other engines in the future (see the sketch after this list).
* **Multi Engine Version Support** Decoupling the connector from the engine through a translation layer solves the problem that most connectors had to modify their code in order to support a new version of the underlying engine.
* **Unified Batch And Stream** Connector V2 can perform either batch or streaming processing. We do not need to develop separate connectors for batch and stream.
* **Multiplexing JDBC/Log connection.** Connector V2 supports reusing JDBC connections and sharing database-log parsing.
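
To make the engine-independence idea concrete, here is a minimal, hypothetical Java sketch (it is not the real SeaTunnel Connector API): the connector is written once against an engine-agnostic interface, and per-engine translation layers adapt it to Flink or Spark at runtime.

```java
// Hypothetical, simplified sketch of "write once, run on any engine";
// all names here are illustrative, not the actual SeaTunnel classes.
public class EngineIndependentConnectorSketch {

    /** Engine-agnostic sink contract the connector author implements once. */
    interface AgnosticSink<T> {
        void write(T record) throws Exception;
    }

    /** A connector implemented only against the agnostic interface. */
    static class ConsoleSink implements AgnosticSink<String> {
        @Override
        public void write(String record) {
            System.out.println("sink <- " + record);
        }
    }

    /**
     * Stand-in for a per-engine translation layer: in the real project this role is
     * played by engine-specific adapters, so the connector code does not change when
     * the underlying engine (or engine version) changes.
     */
    static class FakeEngineRuntime {
        static <T> void run(AgnosticSink<T> sink, Iterable<T> data) throws Exception {
            for (T record : data) {
                sink.write(record);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        FakeEngineRuntime.run(new ConsoleSink(), java.util.List.of("a", "b", "c"));
    }
}
```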

## Source Connector Features

Source connectors have some common core features, and each source connector supports them to varying degrees.

### exactly-once

If each piece of data in the data source is sent downstream by the source only once, we consider that the source connector supports exactly-once.

In SeaTunnel, the read **Split** and its **offset** (the position of the data read within the split at that time, such as a line number, byte count, or other offset) can be saved as a **StateSnapshot** at each checkpoint. If the task restarts, the last **StateSnapshot** is restored, the **Split** and **offset** that were read last time are located, and the source continues sending data downstream from that point.

For example, `File` and `Kafka`.
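
A minimal, hypothetical sketch of this split/offset checkpointing idea (the class and method names are made up for illustration, not the real SeaTunnel reader API):

```java
// On checkpoint the reader saves the current split id and offset; on restart it
// resumes from the saved state instead of re-reading from the beginning, so every
// record is emitted downstream once.
import java.util.List;

public class ExactlyOnceSourceSketch {

    /** The state persisted at each checkpoint. */
    record StateSnapshot(String splitId, long offset) {}

    static class FileSplitReader {
        private final List<String> lines;   // pretend content of one split (e.g. one file)
        private final String splitId;
        private long offset;                // index of the next record to emit

        FileSplitReader(String splitId, List<String> lines, StateSnapshot restored) {
            this.splitId = splitId;
            this.lines = lines;
            // Resume from the restored offset if this split was partially read before.
            this.offset = (restored != null && restored.splitId().equals(splitId))
                    ? restored.offset() : 0L;
        }

        /** Emit the next record downstream, or null when the split is exhausted. */
        String pollNext() {
            return offset < lines.size() ? lines.get((int) offset++) : null;
        }

        /** Called on checkpoint: capture where we are in the split. */
        StateSnapshot snapshotState() {
            return new StateSnapshot(splitId, offset);
        }
    }

    public static void main(String[] args) {
        List<String> data = List.of("r1", "r2", "r3", "r4");

        // First run: read two records, take a checkpoint, then "crash".
        FileSplitReader first = new FileSplitReader("split-0", data, null);
        System.out.println(first.pollNext());
        System.out.println(first.pollNext());
        StateSnapshot checkpoint = first.snapshotState();   // offset = 2

        // Restart: restore from the checkpoint and continue, so r1/r2 are not re-sent.
        FileSplitReader restarted = new FileSplitReader("split-0", data, checkpoint);
        String record;
        while ((record = restarted.pollNext()) != null) {
            System.out.println(record);
        }
    }
}
```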

### schema projection

If the source connector supports reading only selected columns, redefining the column order, or defining the data format to read through the `schema` parameter, we consider that it supports schema projection.

For example, `JDBCSource` can use SQL to define the columns to read, and `KafkaSource` can use the `schema` parameter to define the read schema.
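
As a rough illustration (the column names and helper below are made up, not connector code), schema projection boils down to keeping only the configured columns, in the configured order:

```java
// Hypothetical illustration: in the real connectors this happens via SQL column
// lists (JDBC) or the `schema` option (Kafka), not via a helper like this one.
import java.util.List;
import java.util.Map;

public class SchemaProjectionSketch {

    /** Keep only the configured columns, in the configured order. */
    static List<Object> project(Map<String, Object> fullRow, List<String> configuredColumns) {
        return configuredColumns.stream()
                .map(fullRow::get)
                .toList();
    }

    public static void main(String[] args) {
        Map<String, Object> fullRow = Map.of("id", 1, "name", "tyrantlucifer", "age", 12);
        // The user's configuration asks for just two columns, reordered.
        System.out.println(project(fullRow, List.of("name", "id")));   // [tyrantlucifer, 1]
    }
}
```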

### batch

Batch job mode: the data read is bounded, and the job stops once all data has been read.

### stream

Streaming job mode: the data read is unbounded, and the job never stops.

### parallelism

A parallelism source connector supports the `parallelism` configuration; each unit of parallelism creates a task to read the data.
In a **Parallelism Source Connector**, the source is split into multiple splits, and the enumerator then allocates the splits to the SourceReaders for processing.
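
A simplified, hypothetical sketch of the enumerator/reader relationship (not the real SeaTunnel classes): the source is cut into splits, and the enumerator hands them out to `parallelism` readers, each of which reads its own splits as an independent task.

```java
// Illustrative only: a round-robin allocation rule standing in for the enumerator.
import java.util.ArrayList;
import java.util.List;

public class ParallelSourceSketch {

    record Split(String id) {}

    public static void main(String[] args) {
        int parallelism = 3;                       // value of the `parallelism` option
        List<Split> splits = List.of(
                new Split("s0"), new Split("s1"), new Split("s2"),
                new Split("s3"), new Split("s4"));

        // Enumerator: allocate splits to readers, here with a simple round-robin rule.
        List<List<Split>> assignment = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) {
            assignment.add(new ArrayList<>());
        }
        for (int i = 0; i < splits.size(); i++) {
            assignment.get(i % parallelism).add(splits.get(i));
        }

        // Each reader (subtask) would then read only its own splits.
        for (int reader = 0; reader < parallelism; reader++) {
            System.out.println("reader-" + reader + " -> " + assignment.get(reader));
        }
    }
}
```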

### support user-defined split

Users can configure the split rule.

## Sink Connector Features

Sink connectors have some common core features, and each sink connector supports them to varying degrees.

### exactly-once

For any piece of data flowing into a distributed system, if the system processes it exactly once throughout the whole pipeline and the processing result is correct, the system is considered to satisfy exactly-once consistency.

For a sink connector, it supports exactly-once if any piece of data is written into the target only once. There are generally two ways to achieve this (a minimal sketch of the second approach follows this list):

* The target database supports key deduplication. For example, `MySQL` and `Kudu`.
* The target supports **XA Transactions** (a transaction that can be used across sessions; even if the program that created the transaction has ended, a newly started program only needs to know the ID of the last transaction in order to commit or roll it back). We can then use **Two-Phase Commit** to ensure **exactly-once**. For example, `File` and `MySQL`.
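
Here is a highly simplified, hypothetical sketch of the two-phase commit pattern (not the real SeaTunnel sink API): every writer must prepare successfully before any of them commits, otherwise all of them roll back.

```java
// Illustrative only: an in-memory stand-in for a target that supports staged writes.
import java.util.ArrayList;
import java.util.List;

public class TwoPhaseCommitSketch {

    interface TransactionalWriter {
        void write(String record);
        boolean prepare();   // phase 1: make the staged data durable, report success/failure
        void commit();       // phase 2: atomically publish the staged data
        void rollback();     // abort: discard the staged data
    }

    static class InMemoryWriter implements TransactionalWriter {
        final List<String> staged = new ArrayList<>();
        final List<String> committed = new ArrayList<>();

        public void write(String record) { staged.add(record); }
        public boolean prepare()         { return true; }          // always succeeds here
        public void commit()             { committed.addAll(staged); staged.clear(); }
        public void rollback()           { staged.clear(); }
    }

    static void checkpoint(List<? extends TransactionalWriter> writers) {
        // Phase 1: every participant must prepare successfully.
        boolean allPrepared = writers.stream().allMatch(TransactionalWriter::prepare);
        // Phase 2: commit everywhere, or roll back everywhere.
        if (allPrepared) {
            writers.forEach(TransactionalWriter::commit);
        } else {
            writers.forEach(TransactionalWriter::rollback);
        }
    }

    public static void main(String[] args) {
        List<InMemoryWriter> writers = List.of(new InMemoryWriter(), new InMemoryWriter());
        writers.get(0).write("a");
        writers.get(1).write("b");
        checkpoint(writers);                                // commits on both writers
        System.out.println(writers.get(0).committed);       // [a]
        System.out.println(writers.get(1).committed);       // [b]
    }
}
```

Combined with replaying from the source checkpoint, this keeps each record from being published to the target more than once.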

### schema projection

If a sink connector supports writing only the fields (and their types) defined in the configuration, or redefining the column order, we consider that it supports schema projection.
5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/Assert.md
@@ -6,6 +6,11 @@

A Flink sink plugin which can assert illegal data by user-defined rules

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [x] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
10 changes: 9 additions & 1 deletion docs/en/connector-v2/sink/Clickhouse.md
@@ -4,7 +4,15 @@
## Description

Used to write data to Clickhouse. Supports Batch and Streaming mode.
Used to write data to Clickhouse.

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)

The ClickHouse sink plugin can achieve exactly-once by implementing idempotent writing, and it needs to work with AggregatingMergeTree or other table engines that support deduplication.

- [ ] [schema projection](../../concept/connector-v2-features.md)

:::tip

5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/ClickhouseFile.md
@@ -8,6 +8,11 @@
Generate the ClickHouse data file with the clickhouse-local program, and then send it to the ClickHouse server (also called bulk load). This connector only supports ClickHouse tables whose engine is 'Distributed', and the `internal_replication` option should be `true`. Supports Batch and Streaming mode.

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

:::tip

Writing data to ClickHouse can also be done using JDBC
5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/Datahub.md
@@ -6,6 +6,11 @@

A sink plugin which sends messages to DataHub

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/Elasticsearch.md
@@ -4,6 +4,11 @@

Output data to `Elasticsearch`.

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

:::tip

Engine Supported
@@ -8,6 +8,11 @@ Send the data as a file to email.

The tested email version is 1.5.6.

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
4 changes: 4 additions & 0 deletions docs/en/connector-v2/sink/Enterprise-WeChat.md
@@ -13,6 +13,10 @@ A sink plugin which use Enterprise WeChat robot send message
> ```
**Tips: WeChat sink only supports `string` webhook and the data from the source will be treated as body content in the web hook.**

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/Feishu.md
@@ -10,6 +10,11 @@ Used to launch feishu web hooks using data.
**Tips: Feishu sink only supports `post json` webhook and the data from the source will be treated as body content in the web hook.**

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/FtpFile.md
@@ -6,7 +6,12 @@

Output data to FTP.

## Key features

- [x] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
|----------------------------------|---------|----------|-----------------------------------------------------------|
5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/Greenplum.md
@@ -6,6 +6,11 @@

Write data to Greenplum using [Jdbc connector](Jdbc.md).

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

:::tip

Does not support exactly-once semantics (XA transactions are not yet supported in the Greenplum database).
44 changes: 29 additions & 15 deletions docs/en/connector-v2/sink/HdfsFile.md
@@ -4,26 +4,40 @@
## Description

Output data to hdfs file. Support bounded and unbounded job.
Output data to hdfs file

## Key features

- [x] [exactly-once](../../concept/connector-v2-features.md)

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] csv
- [x] parquet
- [x] orc
- [x] json

## Options

In order to use this connector, you must ensure your Spark/Flink cluster has already integrated Hadoop. The tested Hadoop version is 2.x.

| name | type | required | default value |
| --------------------------------- | ------ | -------- | ------------------------------------------------------------- |
| path | string | yes | - |
| file_name_expression | string | no | "${transactionId}" |
| file_format | string | no | "text" |
| filename_time_format | string | no | "yyyy.MM.dd" |
| field_delimiter | string | no | '\001' |
| row_delimiter | string | no | "\n" |
| partition_by | array | no | - |
| partition_dir_expression | string | no | "\${k0}=\${v0}\/\${k1}=\${v1}\/...\/\${kn}=\${vn}\/" |
| is_partition_field_write_in_file | boolean| no | false |
| sink_columns | array | no | When this parameter is empty, all fields are sink columns |
| is_enable_transaction | boolean| no | true |
| save_mode | string | no | "error" |
| name | type | required | default value |
| --------------------------------- | ------ | -------- |---------------------------------------------------------|
| path | string | yes | - |
| file_name_expression | string | no | "${transactionId}" |
| file_format | string | no | "text" |
| filename_time_format | string | no | "yyyy.MM.dd" |
| field_delimiter | string | no | '\001' |
| row_delimiter | string | no | "\n" |
| partition_by | array | no | - |
| partition_dir_expression | string | no | "${k0}=${v0}/${k1}=${v1}/.../${kn}=${vn}/" |
| is_partition_field_write_in_file | boolean| no | false |
| sink_columns | array | no | When this parameter is empty, all fields are sink columns |
| is_enable_transaction | boolean| no | true |
| save_mode | string | no | "error" |

### path [string]

12 changes: 12 additions & 0 deletions docs/en/connector-v2/sink/Hive.md
@@ -8,6 +8,18 @@ Write data to Hive.

In order to use this connector, you must ensure your Spark/Flink cluster has already integrated Hive. The tested Hive version is 2.3.9.

## Key features

- [x] [exactly-once](../../concept/connector-v2-features.md)

By default, we use 2PC commit to ensure `exactly-once`

- [ ] [schema projection](../../concept/connector-v2-features.md)
- [x] file format
- [x] text
- [x] parquet
- [x] orc

## Options

| name | type | required | default value |
7 changes: 6 additions & 1 deletion docs/en/connector-v2/sink/Http.md
@@ -4,12 +4,17 @@
## Description

Used to launch web hooks using data. Both support streaming and batch mode.
Used to launch web hooks using data.

> For example, if the data from upstream is [`age: 12, name: tyrantlucifer`], the body content is the following: `{"age": 12, "name": "tyrantlucifer"}`
**Tips: Http sink only supports `post json` webhook and the data from the source will be treated as body content in the web hook.**

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
11 changes: 10 additions & 1 deletion docs/en/connector-v2/sink/IoTDB.md
@@ -4,7 +4,16 @@
## Description

Used to write data to IoTDB. Supports Batch and Streaming mode.
Used to write data to IoTDB.

## Key features

- [x] [exactly-once](../../concept/connector-v2-features.md)

IoTDB supports the `exactly-once` feature through idempotent writing. If two pieces of data have the same `key` and `timestamp`, the new data overwrites the old one (see the sketch after this feature list).

- [ ] [schema projection](../../concept/connector-v2-features.md)
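
A hypothetical sketch of idempotent writing (this is not IoTDB client code; the key and timestamp names are illustrative): replaying the same record after a failure overwrites rather than duplicates.

```java
// Stand-in for a store that deduplicates on (key, timestamp).
import java.util.HashMap;
import java.util.Map;

public class IdempotentWriteSketch {

    record RecordKey(String key, long timestamp) {}

    private final Map<RecordKey, Double> store = new HashMap<>();

    void write(String key, long timestamp, double value) {
        store.put(new RecordKey(key, timestamp), value);   // overwrite on replay
    }

    public static void main(String[] args) {
        IdempotentWriteSketch sink = new IdempotentWriteSketch();
        sink.write("device1.temperature", 1000L, 21.5);
        // Replayed after a failure: same key and timestamp, so no duplicate appears.
        sink.write("device1.temperature", 1000L, 21.5);
        System.out.println(sink.store.size());   // 1
    }
}
```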

:::tip

9 changes: 9 additions & 0 deletions docs/en/connector-v2/sink/Jdbc.md
@@ -4,6 +4,15 @@
## Description
Write data through JDBC. Supports Batch mode and Streaming mode, supports concurrent writing, and supports exactly-once semantics (using XA transaction guarantees).

## Key features

- [x] [exactly-once](../../concept/connector-v2-features.md)

Uses `XA transactions` to ensure `exactly-once`. Therefore, `exactly-once` is only supported for databases that support `XA transactions`. You can set `is_exactly_once=true` to enable it.

- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/Kudu.md
@@ -8,6 +8,11 @@ Write data to Kudu.

The tested kudu version is 1.11.1.

## Key features

- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [schema projection](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |