Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/kafka properties for source #618

Merged
merged 11 commits into from
Mar 21, 2024

Conversation

zliang-min
Copy link
Collaborator

PR checklist:

  • Did you run ClangFormat ?
  • Did you separate headers to a different section in existing community code base ?
  • Did you surround proton: starts/ends for new code in existing community code base ?

Please write user-readable short description of the changes:

This PR has completely rewritten how producers and consumers are managed in Kafka external streams.

First, all librdkafka resource management code (for external stream) have been moved to the RdKafka namespace. In Kafka external stream implementaion, it no longer uses the code from the klog namespace (directly). The goal is that, eventually, the Kafka external stream implementation and klog will all use RdKafka for communicating with Kafka via the librdkafka library. And klog will only be used the Proton internal Kafka log stream.

With this PR, resources from librdkafka will be shared by multiple sources/sinks of the same external stream as much as possible. This will reduce resource consumption compared to the current implementation. Esp. when there are a lot of topics and topics have a lot partitions. refers #544

This also fixed the resource leakage issue with the current pooling implementation. closes #236

This also fixed external stream won't pick up new paritions when used as a source. closes #291 ( Note that, a running MV or query that selects from an external stream won't automatically updated to pick up the new paritions, it needs to be re-created or re-run ).

This also fixed the issue that the properties setting does not work for external stream source. closes #543

This also fixed the issue that an external stream still gets created even some settings are not correct. closes #617

This also fixed an issue when using ProtobufSingle to write multiple rows of data to an external stream. (no ticket)

@yokofly
Copy link
Collaborator

yokofly commented Mar 20, 2024

failed smoke

2024-03-20T10:01:56.2008603Z =========================== short test summary info ============================
2024-03-20T10:01:56.2009692Z FAILED test_stream_smoke/test_query_smoke.py::test_run[external_stream-3-test external stream multi_shard]
2024-03-20T10:01:56.2011119Z FAILED test_stream_smoke/test_query_smoke.py::test_run[external_stream-4-test external stream multi_shard and seek_to]
2024-03-20T10:01:56.2014460Z FAILED test_stream_smoke/test_query_smoke.py::test_run[external_stream-5-test external stream multi_shard and seek_to, now the test framework doesn't support validate the data read from kafkaseek_to result.So we insert data into external stream and read from external stream to validate the result.The data with the same shard_expr result should be divided into the same shard.]
2024-03-20T10:01:56.2017349Z ================== 3 failed, 844 passed in 859.17s (0:14:19) ===================

src/Storages/ExternalStream/Kafka/Kafka.cpp Outdated Show resolved Hide resolved
src/Storages/ExternalStream/Kafka/Kafka.cpp Outdated Show resolved Hide resolved
src/Storages/ExternalStream/Kafka/Topic.h Outdated Show resolved Hide resolved
src/Storages/ExternalStream/Kafka/Kafka.cpp Outdated Show resolved Hide resolved
src/Storages/ExternalStream/Kafka/Kafka.cpp Outdated Show resolved Hide resolved
src/Storages/ExternalStream/Kafka/Kafka.cpp Outdated Show resolved Hide resolved
@zliang-min zliang-min force-pushed the feature/kafka-properties-for-source branch from ca6dc4f to 242f15c Compare March 20, 2024 21:30
@zliang-min zliang-min requested a review from yl-lisen March 20, 2024 21:31
Copy link
Collaborator

@yl-lisen yl-lisen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@zliang-min zliang-min merged commit a69a446 into develop Mar 21, 2024
21 checks passed
@zliang-min zliang-min deleted the feature/kafka-properties-for-source branch March 21, 2024 07:05
@jovezhong
Copy link
Contributor

(Jove Github Bot) added it to the current sprint.

@jovezhong
Copy link
Contributor

(Jove Github Bot) moved this ticket out of the GitHub project(up to 1200 tickets for one project).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
4 participants