Hudi source connector
Spark
Flink
SeaTunnel Zeta
Used to read data from Hudi. Currently, only Hudi COW tables and Snapshot Query with Batch Mode are supported.
To use this connector, you must ensure your Spark/Flink cluster is already integrated with Hive. The tested Hive version is 2.3.9.
:::tip
- Currently, only Hudi COW tables and Snapshot Query with Batch Mode are supported
:::
Hudi Data Type | SeaTunnel Data Type |
---|---|
ALL TYPE | STRING |
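Because every Hudi column arrives as STRING, numeric or temporal fields usually need to be cast back to their intended types downstream. A minimal sketch using the SQL transform plugin (the table name `source_table` and the columns `id` and `price` are hypothetical, not part of this connector):

```hocon
transform {
  Sql {
    # Hypothetical columns: cast the STRING values produced by the Hudi
    # source back to typed columns before they reach the sink
    query = "select cast(id as int) as id, cast(price as double) as price from source_table"
  }
}
```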
Name | Type | Required | Default | Description |
---|---|---|---|---|
table.path | String | Yes | - | The HDFS root path of the Hudi table, such as 'hdfs://nameservice/data/hudi/hudi_table/'. |
table.type | String | Yes | - | The type of the Hudi table. Currently only 'cow' is supported; 'mor' is not supported yet. |
conf.files | String | Yes | - | The environment conf file path list (local paths), used to initialize the HDFS client that reads the Hudi table files. For example: '/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml'. |
use.kerberos | Boolean | No | false | Whether to enable Kerberos authentication. |
kerberos.principal | String | Yes when use.kerberos = true | - | When Kerberos is enabled, set the Kerberos principal, such as 'test_user@xxx'. |
kerberos.principal.file | String | Yes when use.kerberos = true | - | When Kerberos is enabled, set the path of the Kerberos keytab file, such as '/home/test/test_user.keytab'. |
common-options | config | No | - | Source plugin common parameters, please refer to Source Common Options for details. |
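When Kerberos is disabled, only the three required options are needed. A minimal source block (the paths below are placeholders, not real cluster paths):

```hocon
source {
  Hudi {
    # Required: HDFS root path of the Hudi table (placeholder path)
    table.path = "hdfs://nameservice/data/hudi/hudi_table/"
    # Required: only 'cow' is supported at the moment
    table.type = "cow"
    # Required: local Hadoop conf files used to initialize the HDFS client
    conf.files = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
  }
}
```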
This example reads from a Hudi COW table with Kerberos enabled and prints the rows to the console.
```hocon
# Defining the runtime environment
env {
  # You can set flink configuration here
  execution.parallelism = 2
  job.mode = "BATCH"
}

source {
  Hudi {
    table.path = "hdfs://nameservice/data/hudi/hudi_table/"
    table.type = "cow"
    conf.files = "/home/test/hdfs-site.xml;/home/test/core-site.xml;/home/test/yarn-site.xml"
    use.kerberos = true
    kerberos.principal = "test_user@xxx"
    kerberos.principal.file = "/home/test/test_user.keytab"
  }
}

transform {
  # If you would like to get more information about how to configure seatunnel and see full list of transform plugins,
  # please go to https://seatunnel.apache.org/docs/transform-v2/sql/
}

sink {
  Console {}
}
```
- Add Hudi Source Connector