
Add option to limit batch size by the number of change records in Open Protocol #1055

Closed
overvenus opened this issue Nov 4, 2020 · 4 comments
Labels
component/open-protocol Open TiCDC protocol component. component/sink Sink component.

Comments

@overvenus
Member

overvenus commented Nov 4, 2020

Feature Request

Is your feature request related to a problem? Please describe:

In certain situations, it is inconvenient to separate a message into multiple change records in the consumer, as the system on which the consumer is built may not support flat-mapping a stream in an easy way.
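
For context, the flat-mapping step a consumer has to perform today could look roughly like the sketch below. It assumes a simplified framing in which each record in a message is preceded by an 8-byte big-endian length; `splitBatch` and `appendFrame` are hypothetical helpers written for this illustration and are not the actual Open Protocol wire format or any TiCDC API.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// splitBatch splits a payload made of repeated
// [8-byte big-endian length][record bytes] frames into individual records.
// The framing is an assumption made for this sketch.
func splitBatch(payload []byte) ([][]byte, error) {
	var records [][]byte
	for len(payload) > 0 {
		if len(payload) < 8 {
			return nil, errors.New("truncated length prefix")
		}
		n := binary.BigEndian.Uint64(payload[:8])
		payload = payload[8:]
		if n > uint64(len(payload)) {
			return nil, errors.New("truncated record")
		}
		records = append(records, payload[:n])
		payload = payload[n:]
	}
	return records, nil
}

// appendFrame appends one length-prefixed record to dst. It exists only so
// that the sketch can build sample input for splitBatch.
func appendFrame(dst, record []byte) []byte {
	var hdr [8]byte
	binary.BigEndian.PutUint64(hdr[:], uint64(len(record)))
	dst = append(dst, hdr[:]...)
	return append(dst, record...)
}
```

A consumer framework without a flat-map operator has nowhere to put the loop inside `splitBatch`, which is the inconvenience described above.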

Describe the feature you'd like:

The ability to limit batching by the number of records in Open Protocol (Kafka/Pulsar sink).
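
A minimal sketch of what count-limited batching could look like on the producing side. `Record`, `Batcher`, and `maxBatchSize` are hypothetical names chosen for this illustration; this is not the TiCDC encoder.

```go
package main

// Record is a hypothetical stand-in for one encoded change record.
type Record []byte

// Batcher groups records into messages that hold at most maxBatchSize records.
type Batcher struct {
	maxBatchSize int
	current      []Record
	messages     [][]Record
}

// NewBatcher returns a Batcher; values below 1 are treated as 1.
func NewBatcher(maxBatchSize int) *Batcher {
	if maxBatchSize < 1 {
		maxBatchSize = 1
	}
	return &Batcher{maxBatchSize: maxBatchSize}
}

// Append adds a record and closes the current message once it is full.
func (b *Batcher) Append(r Record) {
	b.current = append(b.current, r)
	if len(b.current) >= b.maxBatchSize {
		b.flush()
	}
}

// flush moves the current (possibly partial) message to the finished list.
func (b *Batcher) flush() {
	if len(b.current) == 0 {
		return
	}
	b.messages = append(b.messages, b.current)
	b.current = nil
}

// Build flushes any partial message, returns all finished messages,
// and resets the Batcher.
func (b *Batcher) Build() [][]Record {
	b.flush()
	out := b.messages
	b.messages = nil
	return out
}
```

With a limit of 1, every message carries exactly one change record, so the consumer no longer needs to split messages.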

Describe alternatives you've considered:

If the user uses Flink, they can write a Flink job that transforms the message stream and writes it back to another topic in Kafka, or feeds it to other stream systems.

@overvenus overvenus changed the title Add option to disable batching in Open Protocol Add option to limit batching size/count in Kafka/Pulsar sink Nov 4, 2020
@liuzix
Contributor

liuzix commented Nov 4, 2020

This is easy for Open Protocol, but may be challenging in other protocols where the final length of the message is unknown before building it. One way to solve this problem is to rebuild a complete message upon receiving a row event, and record its size.

Edited: This comment is no longer relevant given the topic has changed. See a related issue for more information #1059

@liuzix liuzix changed the title Add option to limit batching size/count in Kafka/Pulsar sink Add option to limit batch size by the number of change records in Kafka/Pulsar sink Nov 5, 2020
@liuzix liuzix changed the title Add option to limit batch size by the number of change records in Kafka/Pulsar sink Add option to limit batch size by the number of change records in Open Protocol Nov 5, 2020
@liuzix liuzix added the component/open-protocol Open TiCDC protocol component. label Nov 5, 2020
@dreamcatchernick

> This is easy for Open Protocol, but may be challenging in other protocols where the final length of the message is unknown before building it. One way to solve this problem is to rebuild a complete message upon receiving a row event, and record its size.

Would it be easier to support row-based events?
For instance, given 10 insertions in one commit, output 10 messages.

@liuzix
Contributor

liuzix commented Nov 5, 2020

@dreamcatchernick Hi, the Open Protocol currently does batching only to increase performance, but not to represent transactions. Batches and Transactions do not correspond to each other. What you are proposing is already implemented in other protocols, such as Avro and CanalJson. We have received a feature request asking for a single-record version of Open Protocol, but the merit is still being determined.

If you have any ideas or comments related to my previous comment, feel free to comment in #1059

@amyangfei
Contributor

Since #1079 has added max-batch-size to control the number of records in a single message produced by Open Protocol, should we close this issue now? @overvenus @liuzix
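
As an illustration of how the option might be applied, the sketch below attaches `max-batch-size` to a sink URI as a query parameter. This thread does not say where the option is configured, so the query-parameter placement, the `withMaxBatchSize` helper, and the example address and topic are all assumptions.

```go
package main

import (
	"net/url"
	"strconv"
)

// withMaxBatchSize returns sinkURI with a max-batch-size query parameter set.
// Passing the option through the sink URI is an assumption of this sketch.
func withMaxBatchSize(sinkURI string, maxBatchSize int) (string, error) {
	u, err := url.Parse(sinkURI)
	if err != nil {
		return "", err
	}
	q := u.Query()
	q.Set("max-batch-size", strconv.Itoa(maxBatchSize))
	u.RawQuery = q.Encode()
	return u.String(), nil
}
```

For example, `withMaxBatchSize("kafka://127.0.0.1:9092/cdc-test", 1)` would request one change record per message, which covers the use case in the original request.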
