Commit

Merge branch 'dev' into add-raw-format

lightzhao authored Aug 14, 2023
2 parents 84cadd2 + 6511f12 commit e0221fd
Showing 75 changed files with 1,928 additions and 556 deletions.
2 changes: 0 additions & 2 deletions .github/workflows/backend.yml
@@ -22,8 +22,6 @@ on:
branches:
- dev
paths-ignore:
- 'docs/**'
- '**/*.md'
- 'seatunnel-ui/**'

concurrency:
2 changes: 1 addition & 1 deletion DISCLAIMER
@@ -1,4 +1,4 @@
Apache SeaTunnel (incubating) is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Apache SeaTunnel is an effort undergoing incubation at The Apache Software Foundation (ASF), sponsored by the Apache Incubator PMC.
Incubation is required of all newly accepted projects until a further review indicates that the infrastructure,
communications, and decision making process have stabilized in a manner consistent with other successful ASF projects.
While incubation status is not necessarily a reflection of the completeness or stability of the code,
3 changes: 1 addition & 2 deletions config/seatunnel.yaml
@@ -17,6 +17,7 @@

seatunnel:
engine:
history-job-expire-minutes: 1440
backup-count: 1
queue-type: blockingqueue
print-execution-info-interval: 60
@@ -26,8 +27,6 @@ seatunnel:
checkpoint:
interval: 10000
timeout: 60000
max-concurrent: 1
tolerable-failure: 2
storage:
type: hdfs
max-retained: 3
47 changes: 47 additions & 0 deletions docs/en/connector-v2/formats/kafka-compatible-kafkaconnect-json.md
@@ -0,0 +1,47 @@
# Kafka source compatible with kafka-connect-json

The SeaTunnel Kafka connector supports parsing data extracted through a Kafka Connect source, especially data extracted from Kafka Connect JDBC and Kafka Connect Debezium.

# How to use

## Kafka output to MySQL

```hocon
env {
  execution.parallelism = 1
  job.mode = "BATCH"
}

source {
  Kafka {
    bootstrap.servers = "localhost:9092"
    topic = "jdbc_source_record"
    result_table_name = "kafka_table"
    start_mode = earliest
    schema = {
      fields {
        id = "int"
        name = "string"
        description = "string"
        weight = "string"
      }
    }
    format = COMPATIBLE_KAFKA_CONNECT_JSON
  }
}

sink {
  Jdbc {
    driver = com.mysql.cj.jdbc.Driver
    url = "jdbc:mysql://localhost:3306/seatunnel"
    user = st_user
    password = seatunnel
    generate_sink_sql = true
    database = seatunnel
    table = jdbc_sink
    primary_keys = ["id"]
  }
}
```
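
The same `COMPATIBLE_KAFKA_CONNECT_JSON` format should also apply to topics populated by Kafka Connect Debezium, per the description above. Below is a minimal sketch of just the source block under that assumption; the topic name is hypothetical and the schema fields are copied from the example above:

```hocon
source {
  Kafka {
    bootstrap.servers = "localhost:9092"
    # hypothetical topic fed by a Kafka Connect Debezium source
    topic = "debezium_source_record"
    result_table_name = "kafka_table"
    start_mode = earliest
    format = COMPATIBLE_KAFKA_CONNECT_JSON
    schema = {
      fields {
        id = "int"
        name = "string"
        description = "string"
        weight = "string"
      }
    }
  }
}
```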

5 changes: 5 additions & 0 deletions docs/en/connector-v2/sink/Redis.md
Expand Up @@ -23,6 +23,7 @@ Used to write data to Redis.
| mode | string | no | single |
| nodes | list | yes when mode=cluster | - |
| format | string | no | json |
| expire | long | no | -1 |
| common-options | | no | - |

### host [string]
@@ -120,6 +121,10 @@ Connector will generate data as the following and write it to redis:

```

### expire [long]

Sets the Redis key expiration time in seconds. The default value is -1, which means keys do not expire automatically.
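
For example, a minimal sketch of a Redis sink block that sets `expire` (the host, port, and key values here are hypothetical placeholders):

```hocon
sink {
  Redis {
    host = "localhost"   # hypothetical Redis host
    port = 6379
    key = "user_key"     # hypothetical key to write
    data_type = key
    format = json
    expire = 3600        # keys written by this sink expire after one hour
  }
}
```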

### common options

Sink plugin common parameters, please refer to [Sink Common Options](common-options.md) for details
Expand Down
241 changes: 176 additions & 65 deletions docs/en/connector-v2/sink/S3File.md

Large diffs are not rendered by default.

215 changes: 101 additions & 114 deletions docs/en/connector-v2/source/MyHours.md
@@ -2,11 +2,13 @@

> My Hours source connector
## Description
## Support Those Engines

Used to read data from My Hours.
> Spark<br/>
> Flink<br/>
> SeaTunnel Zeta<br/>
## Key features
## Key Features

- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
@@ -15,71 +17,103 @@ Used to read data from My Hours.
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

## Options

| name | type | required | default value |
|-----------------------------|---------|----------|---------------|
| url | String | Yes | - |
| email | String | Yes | - |
| password | String | Yes | - |
| method | String | No | get |
| schema | Config | No | - |
| schema.fields | Config | No | - |
| format | String | No | json |
| params | Map | No | - |
| body | String | No | - |
| json_field | Config | No | - |
| content_json | String | No | - |
| poll_interval_ms | int | No | - |
| retry | int | No | - |
| retry_backoff_multiplier_ms | int | No | 100 |
| retry_backoff_max_ms | int | No | 10000 |
| enable_multi_lines | boolean | No | false |
| common-options | config | No | - |

### url [String]

http request url

### email [String]

email for login

### password [String]

password for login

### method [String]

http request method, only supports GET, POST method

### params [Map]

http params

### body [String]

http body

### poll_interval_ms [int]
## Description

request http api interval(millis) in stream mode
Used to read data from My Hours.

### retry [int]
## Key features

The max retry times if request http return to `IOException`
- [x] [batch](../../concept/connector-v2-features.md)
- [ ] [stream](../../concept/connector-v2-features.md)
- [ ] [exactly-once](../../concept/connector-v2-features.md)
- [ ] [column projection](../../concept/connector-v2-features.md)
- [ ] [parallelism](../../concept/connector-v2-features.md)
- [ ] [support user-defined split](../../concept/connector-v2-features.md)

### retry_backoff_multiplier_ms [int]
## Supported DataSource Info

In order to use the My Hours connector, the following dependencies are required.
They can be downloaded via install-plugin.sh or from the Maven central repository.

| Datasource | Supported Versions | Dependency |
|------------|--------------------|---------------------------------------------------------------------------------------------|
| My Hours | universal | [Download](https://mvnrepository.com/artifact/org.apache.seatunnel/seatunnel-connectors-v2) |

## Source Options

| Name | Type | Required | Default | Description |
|-----------------------------|---------|----------|---------|--------------------------------------------------------------------------------------------------------------------------------------|
| url | String | Yes | - | Http request url. |
| email | String | Yes | - | My hours login email address. |
| password | String | Yes | - | My hours login password. |
| schema | Config | No | - | Http and seatunnel data structure mapping |
| schema.fields | Config | No | - | The schema fields of upstream data |
| json_field                  | Config  | No       | -       | This parameter helps you configure the schema, so it must be used together with the schema option.                                    |
| content_json                | String  | No       | -       | This parameter can extract some json data. If you only need the data in the 'book' section, configure `content_field = "$.store.book.*"`. |
| format                      | String  | No       | json    | The format of upstream data; currently only `json` and `text` are supported, default `json`.                                          |
| method | String | No | get | Http request method, only supports GET, POST method. |
| headers | Map | No | - | Http headers. |
| params | Map | No | - | Http params. |
| body | String | No | - | Http body. |
| poll_interval_ms            | Int     | No       | -       | Request http api interval (millis) in stream mode.                                                                                    |
| retry                       | Int     | No       | -       | The max retry times if the http request returns an `IOException` (see the sketch after this table).                                   |
| retry_backoff_multiplier_ms | Int     | No       | 100     | The retry-backoff time (millis) multiplier if the http request failed.                                                                |
| retry_backoff_max_ms        | Int     | No       | 10000   | The maximum retry-backoff time (millis) if the http request failed.                                                                   |
| enable_multi_lines | Boolean | No | false | |
| common-options | | No | - | Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details |
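
As a worked example of the retry-related options, here is a hedged sketch of a MyHours source block; the credentials and numeric values are illustrative only:

```hocon
source {
  MyHours {
    url = "https://api2.myhours.com/api/Projects/getAll"
    email = "user@example.com"          # hypothetical login email
    password = "example-password"       # hypothetical password
    retry = 3                           # give up after 3 failed attempts on IOException
    retry_backoff_multiplier_ms = 200   # retry-backoff multiplier in millis
    retry_backoff_max_ms = 5000         # upper bound on the retry backoff in millis
  }
}
```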

The retry-backoff times(millis) multiplier if request http failed

### retry_backoff_max_ms [int]

The maximum retry-backoff times(millis) if request http failed

## How to Create a My Hours Data Synchronization Job

```hocon
env {
  execution.parallelism = 1
  job.mode = "BATCH"
}

MyHours {
  url = "https://api2.myhours.com/api/Projects/getAll"
  email = "seatunnel@test.com"
  password = "seatunnel"
  schema {
    fields {
      name = string
      archived = boolean
      dateArchived = string
      dateCreated = string
      clientName = string
      budgetAlertPercent = string
      budgetType = int
      totalTimeLogged = double
      budgetValue = double
      totalAmount = double
      totalExpense = double
      laborCost = double
      totalCost = double
      billableTimeLogged = double
      totalBillableAmount = double
      billable = boolean
      roundType = int
      roundInterval = int
      budgetSpentPercentage = double
      budgetTarget = int
      budgetPeriodType = string
      budgetSpent = string
      id = string
    }
  }
}

# Console printing of the read data
sink {
  Console {
    parallelism = 1
  }
}
```

### format [String]
## Parameter Interpretation

the format of upstream data, now only support `json` `text`, default `json`.
### format

When you assign format as `json`, you should also assign the schema option, for example:

@@ -98,11 +132,11 @@ you should assign schema as the following:
```hocon
schema {
fields {
code = int
data = string
success = boolean
}
fields {
code = int
data = string
success = boolean
}
}
```
@@ -131,13 +165,7 @@ connector will generate data as the following:
|----------------------------------------------------------|
| {"code": 200, "data": "get success", "success": true} |

### schema [Config]

#### fields [Config]

the schema fields of upstream data

### content_json [String]
### content_json

This parameter can extract some json data. If you only need the data in the 'book' section, configure `content_field = "$.store.book.*"`.

@@ -212,14 +240,14 @@ Here is an example:
- Test data can be found at this link [mockserver-config.json](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/mockserver-config.json)
- See this link for task configuration [http_contentjson_to_assert.conf](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/http_contentjson_to_assert.conf).

### json_field [Config]
### json_field

This parameter helps you configure the schema, so it must be used together with the schema option.

If your data looks something like this:

```json
{
{
"store": {
"book": [
{
@@ -273,47 +301,6 @@ source {
- Test data can be found at this link [mockserver-config.json](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/mockserver-config.json)
- See this link for task configuration [http_jsonpath_to_assert.conf](../../../../seatunnel-e2e/seatunnel-connector-v2-e2e/connector-http-e2e/src/test/resources/http_jsonpath_to_assert.conf).

### common options

Source plugin common parameters, please refer to [Source Common Options](common-options.md) for details

## Example

```hocon
MyHours {
  url = "https://api2.myhours.com/api/Projects/getAll"
  email = "seatunnel@test.com"
  password = "seatunnel"
  schema {
    fields {
      name = string
      archived = boolean
      dateArchived = string
      dateCreated = string
      clientName = string
      budgetAlertPercent = string
      budgetType = int
      totalTimeLogged = double
      budgetValue = double
      totalAmount = double
      totalExpense = double
      laborCost = double
      totalCost = double
      billableTimeLogged = double
      totalBillableAmount = double
      billable = boolean
      roundType = int
      roundInterval = int
      budgetSpentPercentage = double
      budgetTarget = int
      budgetPeriodType = string
      budgetSpent = string
      id = string
    }
  }
}
```

## Changelog

### next version