Tracking issue for splitting big transaction #5231

Closed
8 of 9 tasks
CharlesCheung96 opened this issue Apr 21, 2022 · 1 comment
Labels
affects-6.1 area/ticdc Issues or PRs related to TiCDC. type/enhancement The issue or PR belongs to an enhancement.

Comments

@CharlesCheung96
Contributor

CharlesCheung96 commented Apr 21, 2022

Is your feature request related to a problem?

For CDC, a large transaction is a single transaction that contains a large number of row-level KV changes. In the current data flow (puller --> sorter --> mounter --> sink), the sorter acts as an unbounded reservoir, and the events it outputs for one transaction can be written as
pre_resolved_ts, rawkv_1, rawkv_2, rawkv_3, ..., rawkv_n, resolved_ts
When n is large, the following problems may occur.

  1. OOM problem: The sink flushes data downstream only when it receives a resolved_ts, so the change events that precede it pile up in memory. In addition, the mounter decodes each rawkv event into a RowChangedEvent, which further increases memory consumption.
  2. Sink latency problem: In the current implementation, MysqlSink regroups the change events above into one transaction based on the StartTs of each RowChangedEvent. To preserve the ACID properties of single-table transactions, MysqlSink uses only one worker to write a given transaction downstream, which limits the throughput of the sink module in large-transaction scenarios.
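
To make the memory problem concrete, here is a minimal Go sketch of a sink that buffers rows until it sees a resolved_ts. It is only a toy model, not TiCDC code: `Event`, `BufferingSink`, and `BigTxn` are hypothetical names introduced for illustration, and the timestamps are arbitrary. For one transaction with n rows, the peak number of buffered rows is n.

```go
package main

// Event is a hypothetical, simplified stand-in for one sorter output event:
// either a row-level change or a resolved_ts marker.
type Event struct {
	Resolved bool   // true for a resolved_ts marker
	Ts       uint64 // resolved ts for a marker, StartTs for a row change
	Seq      int    // position of the row inside its transaction
}

// BufferingSink models a sink that keeps rows in memory until a
// resolved_ts marker arrives, then flushes them all at once.
type BufferingSink struct {
	pending  []Event
	PeakRows int // largest number of rows held in memory at once
	Flushes  int // number of non-empty flushes
}

// Emit feeds one event into the sink.
func (s *BufferingSink) Emit(e Event) {
	if !e.Resolved {
		s.pending = append(s.pending, e)
		if len(s.pending) > s.PeakRows {
			s.PeakRows = len(s.pending)
		}
		return
	}
	if len(s.pending) > 0 {
		s.Flushes++
		s.pending = s.pending[:0] // a real sink would write downstream here
	}
}

// BigTxn builds the stream pre_resolved_ts, rawkv_1 ... rawkv_n, resolved_ts
// for a single transaction with n row changes.
func BigTxn(n int) []Event {
	events := make([]Event, 0, n+2)
	events = append(events, Event{Resolved: true, Ts: 99})
	for i := 1; i <= n; i++ {
		events = append(events, Event{Ts: 100, Seq: i})
	}
	return append(events, Event{Resolved: true, Ts: 101})
}
```

Feeding `BigTxn(n)` through `Emit` leaves `PeakRows` equal to n and `Flushes` equal to 1, so memory use in this model grows linearly with the size of the transaction.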

Describe the feature you'd like

Solve the OOM and latency problems by splitting large transactions and writing them downstream in multiple batches.
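
A rough sketch of the splitting idea in Go, under stated assumptions: `Row` and `SplitTxn` are hypothetical names, and the fixed `batchSize` threshold is an illustrative choice, not the design adopted in TiCDC. The function cuts one transaction's rows into consecutive batches so that no more than `batchSize` rows need to be written (or held) at a time. How downstream atomicity is handled is outside the scope of this sketch.

```go
package main

// Row is a hypothetical, simplified stand-in for a decoded row change.
type Row struct {
	StartTs uint64 // StartTs of the transaction the row belongs to
	Seq     int    // position of the row inside the transaction
}

// SplitTxn cuts the rows of one transaction into consecutive batches of at
// most batchSize rows, preserving row order. A non-positive batchSize
// returns the whole transaction as a single batch.
func SplitTxn(rows []Row, batchSize int) [][]Row {
	if len(rows) == 0 {
		return nil
	}
	if batchSize <= 0 {
		return [][]Row{rows}
	}
	batches := make([][]Row, 0, (len(rows)+batchSize-1)/batchSize)
	for start := 0; start < len(rows); start += batchSize {
		end := start + batchSize
		if end > len(rows) {
			end = len(rows)
		}
		// Each batch is a sub-slice, so no row data is copied.
		batches = append(batches, rows[start:end])
	}
	return batches
}
```

For example, a transaction of 10 rows with `batchSize` 4 yields three batches of 4, 4, and 2 rows, in the original order.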

Related Issues

@CharlesCheung96 CharlesCheung96 added the type/enhancement The issue or PR belongs to an enhancement. label Apr 21, 2022
@CharlesCheung96 CharlesCheung96 added the area/ticdc Issues or PRs related to TiCDC. label Apr 21, 2022
@nongfushanquan
Contributor

/label affects-6.1

3 participants