Subtask tracking issue for redo log based replication to implement eventual consistency #2351

Closed
16 of 30 tasks
amyangfei opened this issue Jul 22, 2021 · 3 comments
Labels
component/redolog subject/new-feature Denotes an issue or pull request adding a new feature.

Comments

@amyangfei
Contributor

amyangfei commented Jul 22, 2021

Basic Design

This solution introduces a redo log mechanism in which redo logs are persisted before the sink module. The new workflow is as follows:

  • Before each sink replicates row data downstream, it saves the sorted and mounted row data to persistent storage.
  • Each sink reports the resolved ts of its persisted row data as a watermark, and the TiCDC owner aggregates the watermarks to generate a global watermark.
  • Each sink can then replicate row data whose commit-ts is less than the global watermark.
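
The watermark step above can be sketched in Go as follows. This is only an illustration, not TiCDC's actual code: the names `SinkWatermark`, `GlobalWatermark`, `Row`, and `Replicable` are hypothetical, and the sketch assumes that "aggregating" means taking the minimum resolved ts across sinks, which the issue does not state explicitly.

```go
package main

import "math"

// SinkWatermark is a hypothetical report from one sink: the resolved ts up to
// which that sink's row data has been persisted to the redo log.
type SinkWatermark struct {
	SinkID     string
	ResolvedTs uint64
}

// GlobalWatermark aggregates the per-sink watermarks. Assumption: the
// aggregate is the minimum resolved ts, so the result never exceeds what any
// sink has persisted. The second return value is false when no sink has
// reported yet.
func GlobalWatermark(reports []SinkWatermark) (uint64, bool) {
	if len(reports) == 0 {
		return 0, false
	}
	global := uint64(math.MaxUint64)
	for _, r := range reports {
		if r.ResolvedTs < global {
			global = r.ResolvedTs
		}
	}
	return global, true
}

// Row is a hypothetical sorted and mounted row change.
type Row struct {
	CommitTs uint64
	Key      string
}

// Replicable returns the leading rows whose commit-ts is less than the global
// watermark. It expects rows to be sorted by CommitTs in ascending order.
func Replicable(rows []Row, global uint64) []Row {
	n := 0
	for n < len(rows) && rows[n].CommitTs < global {
		n++
	}
	return rows[:n]
}
```

Under these assumptions, a sink would call `GlobalWatermark` on the latest reports it receives from the owner and pass the result to `Replicable` to decide which persisted rows are safe to write downstream.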

When a disaster happens in the upstream, TiCDC can switch to an offline mode, which means it only consumes data in local persistent storage.

Task breakdown

19 weeks in total, excluding optional and won't-do features.

Note: we will split the subtasks into several stages.

  1. Implementation stage, in which we implement the basic feature and provide a basic usable version with eventually consistent replication. In this stage we will finish subtasks 3, 4, and 5.
  2. Test stage, in which we will finish subtask 6.
  3. Improvement stage, in which we will finish subtasks 1 and 2.

The detailed subtasks are as follows:

  1. Separate etcd from PD, adding a standalone meta store. 3 weeks (Won't do)

    • Add start-up parameters and an embedded etcd server in TiCDC. 1 week
    • Add legacy-mode to provide backward compatibility and refine the upgrade process. 1 week
    • Refactor internal PD/etcd client usage and refine session management. 1 week
    • Implement changefeed migration from PD to the embedded etcd cluster. (optional)
  2. TiCDC offline mode. 4 weeks (Won't do)

    • Add a new upstream resource manager, including the PD client and TiKV store.
      • When the upstream is abnormal, the upstream resource manager can report an error. 1 week
      • Add an upstream liveness checker to the upstream resource manager. 1 week
      • The TiCDC server maintains the upstream manager; add an offline mode switch feature (also works during rolling upgrades or network jitter). 1 week
    • Support consuming the redo log in offline mode. 1 week
    • Provide a redo log reader API in the redo storage layer for consumption in offline mode #2581
  3. Binary-based redo log consumer. 2 weeks

  4. Redo log mechanism. 4 weeks

  5. Redo log watermark coordinator. 2 weeks

  6. Overall test and benchmark. 4 weeks

    • Combine all the features and run integration tests. 1 week
      • Scenario tests. 1 week
        • Simulate two data centers, one of which suffers a disaster.
        • Simulate chaos in the upstream TiDB.
        • Simulate chaos in the TiCDC cluster.
      • Compatibility tests
        • etcd-related compatibility tests. 1 week
      • Benchmark tests
        • Compare replication performance with the eventually consistent feature enabled and disabled. 1 week
  7. Other tasks, mainly utilities, configuration-related code, or test code.

@amyangfei amyangfei added the subject/new-feature Denotes an issue or pull request adding a new feature. label Jul 22, 2021
@Rustin170506
Member

Rustin170506 commented Jul 27, 2021

I'm working on the first part of things, splitting etcd.

After an offline discussion, we decided not to add it for now because it might increase the complexity of TiCDC.

@amyangfei
Contributor Author

amyangfei commented Feb 22, 2022

The expected feature has reached GA.

@coderplay

A general question: what is a redo log? How does it work?


3 participants