# druid-sink.properties inside config package is not working #161
@mhshimul Try this. Note that this sink hasn't been properly tested, so let us know how you get on. Contributions are also welcome!
Now, after setting this property, I am getting the following exception:
lanbotdeployer pushed a commit that referenced this issue on Nov 29, 2024:
After processing files from the S3/GCP Storage source, this enables the feature of deleting or moving the files after they've been committed.

# New KCQL Configuration Options for Datalake Cloud Connectors

The following configuration options introduce post-processing capabilities for the AWS S3, GCP Storage, and (coming soon) Azure Datalake Gen 2 **source connectors**. These options allow the connector to manage source files after they are successfully processed, either by deleting the file or moving it to a new location in cloud storage.

In Kafka Connect, post-processing is triggered when the framework calls the `commitRecord` method after a source record is successfully processed. The configured action then determines how the source file is handled. If no `post.process.action` is configured, **no post-processing will occur**, and the file will remain in its original location.

---

## KCQL Configuration Options

### 1. `post.process.action`

- **Description**: Defines the action to perform on a file after it has been processed.
- **Options**:
  - `DELETE` – Removes the file after processing.
  - `MOVE` – Relocates the file to a new location after processing.

### 2. `post.process.action.bucket`

- **Description**: Specifies the target bucket for files when using the `MOVE` action.
- **Applicability**: Only applies to the `MOVE` action.
- **Notes**: This field is **mandatory** when `post.process.action` is set to `MOVE`.

### 3. `post.process.action.prefix`

- **Description**: Specifies a new prefix to replace the existing one for the file's location when using the `MOVE` action. The file's path remains unchanged except for the prefix.
- **Applicability**: Only applies to the `MOVE` action.
- **Notes**: This field is **mandatory** when `post.process.action` is set to `MOVE`.

---

## Key Use Cases

- **DELETE**: Automatically removes source files to free up storage space and prevent redundant data from remaining in the bucket.
- **MOVE**: Organizes processed source files by relocating them to a different bucket or prefix, which is useful for archiving, categorizing, or preparing files for other workflows.

---

## Examples

### Example 1: Deleting Files After Processing

To configure the source connector to delete files after processing, use the following KCQL:

```kcql
INSERT INTO `my-bucket`
SELECT * FROM `my-topic`
PROPERTIES (
  'post.process.action'='DELETE'
)
```

### Example 2: Moving Files After Processing

To configure the source connector to move files to a different bucket and prefix, use the following KCQL:

```kcql
INSERT INTO `my-bucket:archive/`
SELECT * FROM `my-topic`
PROPERTIES (
  'post.process.action'='MOVE',
  'post.process.action.bucket'='archive-bucket',
  'post.process.action.prefix'='archive/'
)
```

In this example:

* The file is moved to `archive-bucket`.
* The prefix `archive/` is applied to the file's path while keeping the rest of the path unchanged.

## Important Considerations

* Both `post.process.action.bucket` and `post.process.action.prefix` are mandatory when using the `MOVE` action.
* For the `DELETE` action, no additional configuration is required.
* If no `post.process.action` is configured, no post-processing will be applied, and the file will remain in its original location.

Commits in this change:

* Configuration for Burn-After-Reading
* Implementing actions and storage interfaces. Needs added tests. The file move logic needs testing where it resolves the path - is this even the best configuration?
* Storage interface tests
* Address comment from review referencing this page on moving items in GCP: https://cloud.google.com/storage/docs/samples/storage-move-file
* Adding temporary logging, fixing a bug with the Map equality not enabling prefixes to map to each other
* Fix Move action
* Fix prefix replace behaviour
* Changes to ensure error handling approach is correct
* Review fixes - remove S3 references
* Avoid variable shadowing
* Avoid variable shadowing
* add documentation
* CopyObjectResponse
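The post-processing dispatch described above (no action configured, `DELETE`, or `MOVE` with a prefix swap) can be sketched roughly as follows. This is an illustrative model only, not the connector's real code: `post_process`, `moved_key`, the `storage` client methods, and the exact prefix-replacement rule are all assumptions for the sake of the example.

```python
# Illustrative sketch of the post-processing actions described above.
# All names (post_process, moved_key, storage.copy/delete) are hypothetical;
# the prefix rule assumed here: the object key's leading prefix is swapped
# for the configured one, and the rest of the path is left unchanged.

def moved_key(source_key: str, old_prefix: str, new_prefix: str) -> str:
    """Replace the leading prefix of an object key, keeping the remainder."""
    if source_key.startswith(old_prefix):
        return new_prefix + source_key[len(old_prefix):]
    # Assumption: if the old prefix is absent, simply prepend the new one.
    return new_prefix + source_key

def post_process(action, bucket, key,
                 target_bucket=None, target_prefix=None,
                 old_prefix="", storage=None):
    if action is None:
        return  # no post.process.action configured: file stays in place
    if action == "DELETE":
        storage.delete(bucket, key)
    elif action == "MOVE":
        # Both target bucket and target prefix are mandatory for MOVE.
        if not (target_bucket and target_prefix):
            raise ValueError("MOVE requires bucket and prefix")
        storage.copy(bucket, key,
                     target_bucket, moved_key(key, old_prefix, target_prefix))
        storage.delete(bucket, key)
```

For example, with an old prefix of `in/` and a configured prefix of `archive/`, an object at `in/2024/file.json` would be relocated to `archive/2024/file.json` in the target bucket.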
I believe something has changed in the code behind druid-sink.properties, as I can see there is a new property, `connect.druid.sink.kcql`, inside the DruidSinkConfig file. A working sample properties file is needed.
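For reference, a minimal sketch of what such a properties file might look like. This is an assumption, not a verified configuration: the connector class and the topic/datasource names are placeholders, and the exact KCQL shape accepted by `connect.druid.sink.kcql` should be checked against the DruidSinkConfig source.

```properties
# Hypothetical druid-sink.properties sketch - all values are placeholders.
name=druid-sink
connector.class=<fully-qualified Druid sink connector class>
tasks.max=1
topics=my-topic
# New KCQL property referenced in DruidSinkConfig (exact syntax unverified):
connect.druid.sink.kcql=INSERT INTO my-datasource SELECT * FROM my-topic
```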