[WIP][docs] Documentation for logical replication with PG connector #23065

Merged
merged 111 commits into from
Jul 30, 2024
Changes from 38 commits
23c0d58
initial commit for logical replication docs
vaibhav-yb Jul 1, 2024
c9b69ab
title changes
vaibhav-yb Jul 1, 2024
8388b68
changes to view table
vaibhav-yb Jul 1, 2024
07af18c
fixed line break
vaibhav-yb Jul 1, 2024
a7560a0
fixed line break
vaibhav-yb Jul 1, 2024
5dd6fa1
added content for delete and update
vaibhav-yb Jul 1, 2024
68e8117
added more content
vaibhav-yb Jul 1, 2024
001c673
replaced hyperlink todos with reminders
vaibhav-yb Jul 1, 2024
e7de8e1
added snapshot metrics
vaibhav-yb Jul 1, 2024
be6bb39
added more content
vaibhav-yb Jul 1, 2024
6427881
added more config properties to docs
vaibhav-yb Jul 2, 2024
0b9717e
added more config properties to docs
vaibhav-yb Jul 2, 2024
28f7bdd
added more config properties to docs
vaibhav-yb Jul 2, 2024
c650193
replaced postgresql instances with yugabytedb
vaibhav-yb Jul 3, 2024
04e9388
added properties
vaibhav-yb Jul 3, 2024
34fd2b3
added complete properties
vaibhav-yb Jul 3, 2024
686dfec
changed postgresql to yugabytedb
vaibhav-yb Jul 3, 2024
33c08cb
added example for all record types
vaibhav-yb Jul 3, 2024
d50dc7c
fixed highlighting of table header
vaibhav-yb Jul 3, 2024
921fc23
added type representations
vaibhav-yb Jul 3, 2024
0ee0380
added type representations
vaibhav-yb Jul 4, 2024
a8dfc6b
full content in now;
vaibhav-yb Jul 4, 2024
45ee2db
full content in now;
vaibhav-yb Jul 4, 2024
7d9f3a3
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 8, 2024
3ad01f5
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 9, 2024
66dec5d
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 11, 2024
9ad710f
changed postgres references appropriately
vaibhav-yb Jul 11, 2024
38d344b
added a missing keyword
vaibhav-yb Jul 11, 2024
998a141
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 12, 2024
f8b199a
changed name
vaibhav-yb Jul 15, 2024
17219c4
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 15, 2024
6e6b35e
self review comments
vaibhav-yb Jul 15, 2024
4da3203
self review comments
vaibhav-yb Jul 15, 2024
d3ca696
added section for logical replication
vaibhav-yb Jul 16, 2024
42f443a
added section for logical replication
vaibhav-yb Jul 16, 2024
11aecfa
modified content for monitor page
vaibhav-yb Jul 16, 2024
681938e
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 16, 2024
54a9bba
added content for monitoring
vaibhav-yb Jul 16, 2024
1ab7f86
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 18, 2024
43db471
rebased to master;
vaibhav-yb Jul 18, 2024
e048426
CDC logical replication overview (#3)
siddharth2411 Jul 18, 2024
0260daa
advanced-topic (#5)
siddharth2411 Jul 18, 2024
ea3a192
removed references to incremental and ad-hoc snapshots
vaibhav-yb Jul 19, 2024
d0cc51b
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 19, 2024
ab93c86
replaced index page with an empty one
vaibhav-yb Jul 19, 2024
aff0649
addressed review comments
vaibhav-yb Jul 19, 2024
8051845
added getting started section
vaibhav-yb Jul 19, 2024
7816e56
added section for get started
vaibhav-yb Jul 19, 2024
ac48d3d
self review comments
vaibhav-yb Jul 22, 2024
96f5691
self review comments
vaibhav-yb Jul 22, 2024
7210629
group review comments
vaibhav-yb Jul 22, 2024
9bc214b
added hstore and domain type docs
vaibhav-yb Jul 22, 2024
3c94ab8
Advance configurations for CDC using logical replication (#2)
siddharth2411 Jul 22, 2024
49af239
Fix overview section (#7)
siddharth2411 Jul 22, 2024
addd1af
Monitor section (#4)
siddharth2411 Jul 22, 2024
2c86f88
Initial Snapshot content (#6)
siddharth2411 Jul 22, 2024
4a98ff4
Add getting started (#1)
siddharth2411 Jul 22, 2024
03b46ce
Fix for broken note (#9)
siddharth2411 Jul 23, 2024
334358b
Fix the issue yaml parsing
Vars-07 Jul 19, 2024
6ac1ba9
[PLAT-14534]Add regex match for GCP Instance template
asharma-yb Jul 12, 2024
9e2ef4b
update diagram (#23245)
ddhodge Jul 19, 2024
7eb99fb
[/PLAT-14708] Fix JSON field name in TaskInfo query
nkhogen Jul 19, 2024
f7553ae
[#23173] DocDB: Allow large bytes to be passed to RateLimiter
hari90 Jul 19, 2024
23bdfb6
[#23179] CDCSDK: Support data types with dynamically alloted oids in CDC
Sumukh-Phalgaonkar Jul 19, 2024
ff799a8
[PLAT-14710] Do not return apiToken in response to getSessionInfo
subramanian-neelakantan Jul 19, 2024
f1329ce
[docs] updates to CVE table status column (#23225)
aishwarya24 Jul 22, 2024
0280778
[docs] Fix load balance keyword in drivers page (#23253)
ddorian Jul 22, 2024
39aa198
fixed compilation
vaibhav-yb Jul 23, 2024
58b706f
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 23, 2024
8f7ea90
fix link, format
ddhodge Jul 23, 2024
775783d
format, links
ddhodge Jul 23, 2024
0d8fe60
links, format
ddhodge Jul 23, 2024
b9719cd
format
ddhodge Jul 23, 2024
9a32af6
format
ddhodge Jul 24, 2024
1034941
minor edit
ddhodge Jul 24, 2024
6449c91
best practice (#8)
siddharth2411 Jul 24, 2024
551882a
moved sections
vaibhav-yb Jul 24, 2024
860fc55
moved pages
vaibhav-yb Jul 24, 2024
ceda9b2
added key concepts page
vaibhav-yb Jul 24, 2024
98bc2c1
added link to getting started
vaibhav-yb Jul 24, 2024
6c0fa1f
Dynamic table doc changes (#11)
Sumukh-Phalgaonkar Jul 24, 2024
43a5f3c
icons
ddhodge Jul 24, 2024
bbcc26d
added box for lead link
vaibhav-yb Jul 25, 2024
269b4fa
revert ybclient change
vaibhav-yb Jul 25, 2024
bc6c825
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 25, 2024
2dc50a8
revert accidental change
vaibhav-yb Jul 25, 2024
1f0a9c4
revert accidental change
vaibhav-yb Jul 25, 2024
1de2360
revert accidental change
vaibhav-yb Jul 25, 2024
3467ba1
fix link block for getting started page
vaibhav-yb Jul 25, 2024
880af88
format
ddhodge Jul 25, 2024
5378c3f
minor edit
ddhodge Jul 25, 2024
c15f849
links, format
ddhodge Jul 25, 2024
9dbd8bf
format
ddhodge Jul 25, 2024
bdd07ce
links
ddhodge Jul 25, 2024
9c9cada
format
ddhodge Jul 25, 2024
3dfcd35
remove reminder references
vaibhav-yb Jul 26, 2024
3d8051f
Modified output plugin docs (#12)
Sumukh-Phalgaonkar Jul 26, 2024
6a83cca
Merge branch 'master' into pg-logical-repl-docs
vaibhav-yb Jul 29, 2024
501409c
Naming edits
ddhodge Jul 29, 2024
cf641ba
format
ddhodge Jul 29, 2024
cd480b7
review comments
ddhodge Jul 29, 2024
4b6b3e7
diagram
ddhodge Jul 29, 2024
c91db49
review comment
ddhodge Jul 29, 2024
47be6a8
fix links
ddhodge Jul 29, 2024
fec412f
format
ddhodge Jul 30, 2024
6ace736
format
ddhodge Jul 30, 2024
fee9442
link
ddhodge Jul 30, 2024
cd46c7c
review comments
ddhodge Jul 30, 2024
6d22cf4
copy to stable
ddhodge Jul 30, 2024
564b94f
link
ddhodge Jul 30, 2024
0ab5c65
Merge branch 'master' into pg-logical-repl-docs
ddhodge Jul 30, 2024
1 change: 1 addition & 0 deletions docs/content/preview/explore/_index.md
Expand Up @@ -32,6 +32,7 @@ The following table describes the YugabyteDB features you can explore, along wit
| [Multi-region deployments](multi-region-deployments/) | Learn about the different multi-region topologies that you can deploy using YugabyteDB. | Multi-node<br/>local |
| [Query tuning](query-1-performance/) | Learn about the tools available to identify and optimize queries in YSQL. | Single-node<br/>local/cloud |
| [Cluster management](cluster-management/) | Learn how to roll back database changes to a specific point in time using point in time recovery. | Single-node<br/>local |
| [CDC logical replication](cdc-logical-replication/) | Learn about YugabyteDB support for CDC using logical replication. | N/A |
| [Change data capture](change-data-capture/) | Learn about YugabyteDB support for streaming data to Kafka. | N/A |
| [Security](security/security/) | Learn how to secure data in YugabyteDB, using authentication, authorization (RBAC), encryption, and more. | Single-node<br/>local/cloud |
| [Observability](observability/) | Export metrics into Prometheus and create dashboards using Grafana. | Multi-node<br/>local |
89 changes: 89 additions & 0 deletions docs/content/preview/explore/cdc-logical-replication/_index.md
@@ -0,0 +1,89 @@
---
title: CDC Logical Replication (CDC)
headerTitle: CDC Logical Replication (CDC)
linkTitle: CDC Logical Replication
description: CDC or Change data capture is a process to capture changes made to data in the database.
headcontent: Capture changes made to data in the database
image: /images/section_icons/index/develop.png
# cascade:
# earlyAccess: /preview/releases/versioning/#feature-maturity
menu:
preview:
identifier: explore-cdc-logical-replication
parent: explore
weight: 280
type: indexpage
---

{{< note title="Note for internal contribution" >}}

This page is currently under development, and the content most likely does not yet match what the headings intend.

{{< /note >}}

In databases, change data capture (CDC) is a set of software design patterns used to determine and track the data that has changed so that action can be taken using the changed data. CDC is beneficial in a number of scenarios, including the following:

- **Microservice-oriented architectures** : Some microservices require a stream of changes to the data, and using CDC in YugabyteDB can provide consumable data changes to CDC subscribers.

- **Asynchronous replication to remote systems** : Remote systems may subscribe to a stream of data changes and then transform and consume the changes. Maintaining separate database instances for transactional and reporting purposes can be used to manage workload performance.

- **Multiple data center strategies** : Maintaining multiple data centers enables enterprises to provide high availability (HA).

- **Compliance and auditing** : Auditing and compliance requirements can require you to use CDC to maintain records of data changes.

{{<index/block>}}

{{<index/item
title="Get started"
body="Get set up for using CDC in YugabyteDB."
href="cdc-log-rep-get-started/"
icon="/images/section_icons/index/quick_start.png">}}

{{<index/item
title="Section undecided"
body="How to stream data with different Kafka environments."
href="../../tutorials/cdc-tutorials/"
icon="/images/section_icons/develop/ecosystem/apache-kafka-icon.png">}}

{{</index/block>}}

## How does CDC work

YugabyteDB CDC captures changes made to data in the database and streams those changes to external processes, applications, or other databases. CDC allows you to track and propagate changes in a YugabyteDB database to downstream consumers based on its Write-Ahead Log (WAL). YugabyteDB CDC uses Debezium to capture row-level changes resulting from INSERT, UPDATE, and DELETE operations in the upstream database, and publishes them as events to Kafka using Kafka Connect-compatible connectors.

![What is CDC](/images/explore/cdc-overview-what.png)

{{<lead link="./cdc-overview">}}
To learn more about the internals of CDC, see [Overview](./cdc-overview).
{{</lead>}}

## Debezium connector

To capture and stream your changes in YugabyteDB to an external system, you need a connector that can read the changes in YugabyteDB and stream them out. For this, you can use the Debezium connector. Debezium is deployed as a set of Kafka Connect-compatible connectors, so you first need to define a YugabyteDB connector configuration and then start the connector by adding it to Kafka Connect.

{{<lead link="./debezium-connector-postgresql">}}
To understand the various features and configuration of the connector, see [Debezium connector](./debezium-connector-postgresql).
{{</lead>}}
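
As a rough illustration of the two steps described in this section (define a connector configuration, then add it to Kafka Connect), the following Python sketch builds a registration request and posts it to the Kafka Connect REST API. The function names are hypothetical, and the connector class name, credentials, and port are placeholder assumptions rather than values taken from this page; check the connector documentation for the properties your deployment needs.

```python
import json
import urllib.request


def build_connector_request(name, db_host, db_name, topic_prefix):
    """Return the JSON body Kafka Connect expects when creating a connector."""
    return {
        "name": name,
        "config": {
            # Assumed class name for the YugabyteDB connector; verify before use.
            "connector.class": "io.debezium.connector.postgresql.YugabyteDBConnector",
            "database.hostname": db_host,
            "database.port": "5433",          # assumed YSQL port
            "database.user": "yugabyte",      # placeholder credentials
            "database.password": "yugabyte",  # placeholder credentials
            "database.dbname": db_name,
            "topic.prefix": topic_prefix,
        },
    }


def register_connector(connect_url, request_body):
    """POST the connector definition to Kafka Connect and return the HTTP status."""
    req = urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=json.dumps(request_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

With a Kafka Connect worker listening on `localhost:8083`, a call such as `register_connector("http://localhost:8083", build_connector_request("ybconnector", "127.0.0.1", "yugabyte", "dbserver1"))` would submit the definition. Building the body separately from sending it keeps the configuration easy to inspect before anything is deployed.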

## Monitoring

You can monitor the activities and status of the deployed connectors using the HTTP endpoints provided by YugabyteDB.

{{<lead link="./cdc-monitor">}}
To learn more about how to monitor your CDC setup, see [Monitor](./cdc-monitor).
{{</lead>}}

For tutorials on streaming data to Kafka environments, including Amazon MSK, Azure Event Hubs, and Confluent Cloud, see [Kafka environments](/preview/tutorials/cdc-tutorials/).

## Learn more
Review comment (Contributor): The Learn more section is not related to logical replication; it covers the older gRPC approach.


- [Examples of CDC usage and patterns](https://github.com/yugabyte/cdc-examples/tree/main) {{<icon/github>}}
- [Tutorials to deploy in different Kafka environments](../../tutorials/cdc-tutorials/) {{<icon/tutorial>}}
- [Data Streaming Using YugabyteDB CDC, Kafka, and SnowflakeSinkConnector](https://www.yugabyte.com/blog/data-streaming-using-yugabytedb-cdc-kafka-and-snowflakesinkconnector/) {{<icon/blog>}}
- [Unlock Azure Storage Options With YugabyteDB CDC](https://www.yugabyte.com/blog/unlocking-azure-storage-options-with-yugabytedb-cdc/) {{<icon/blog>}}
- [Change Data Capture From YugabyteDB to Elasticsearch](https://www.yugabyte.com/blog/change-data-capture-cdc-yugabytedb-elasticsearch/) {{<icon/blog>}}
- [Snowflake CDC: Publishing Data Using Amazon S3 and YugabyteDB](https://www.yugabyte.com/blog/snowflake-cdc-publish-data-using-amazon-s3-yugabytedb/) {{<icon/blog>}}
- [Streaming Changes From YugabyteDB to Downstream Databases](https://www.yugabyte.com/blog/streaming-changes-yugabytedb-cdc-downstream-databases/) {{<icon/blog>}}
- [Change Data Capture from YugabyteDB CDC to ClickHouse](https://www.yugabyte.com/blog/change-data-capture-cdc-yugabytedb-clickhouse/) {{<icon/blog>}}
- [How to Run Debezium Server with Kafka as a Sink](https://www.yugabyte.com/blog/change-data-capture-cdc-run-debezium-server-kafka-sink/) {{<icon/blog>}}
- [Change Data Capture Using a Spring Data Processing Pipeline](https://www.yugabyte.com/blog/change-data-capture-cdc-spring-data-processing-pipeline/) {{<icon/blog>}}
@@ -0,0 +1,14 @@
---
title: Get started with CDC logical replication in YugabyteDB
headerTitle: Get started
linkTitle: Get started
description: Get started with Change Data Capture in YugabyteDB.
headcontent: Get set up for using CDC using logical replication in YugabyteDB
menu:
preview:
parent: explore-cdc-logical-replication
identifier: cdc-log-rep-get-started
weight: 30
type: docs
---

@@ -0,0 +1,127 @@
---
title: CDC monitoring in YugabyteDB
headerTitle: Monitor
linkTitle: Monitor
description: Monitor Change Data Capture in YugabyteDB.
headcontent: Monitor deployed CDC connectors
menu:
preview:
parent: explore-cdc-logical-replication
identifier: cdc-log-rep-monitor
weight: 60
type: docs
---

## Status of the deployed connector

You can use the REST APIs to monitor your deployed connectors. The following operations are available:

* List all connectors

```sh
curl -X GET localhost:8083/connectors/
```

* Get a connector's configuration

```sh
curl -X GET localhost:8083/connectors/<connector-name>
```

* List all tasks for a connector, with their configuration

```sh
curl -X GET localhost:8083/connectors/<connector-name>/tasks
```

* Get the status of the specified task

```sh
   curl -X GET localhost:8083/connectors/<connector-name>/tasks/<task-id>/status
```

* Get the connector's status, and the status of its tasks

```sh
curl -X GET localhost:8083/connectors/<connector-name>/status
```
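
The status call above can also be checked programmatically. The following Python sketch is one possible approach, with hypothetical function names: it assumes the response has the usual Kafka Connect shape, with a `connector.state` field and a `tasks` list whose entries carry `id` and `state`, and it treats anything other than `RUNNING` as unhealthy.

```python
import json
import urllib.request


def fetch_status(connect_url, connector_name):
    """GET /connectors/<connector-name>/status and return the parsed JSON."""
    url = f"{connect_url}/connectors/{connector_name}/status"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def unhealthy_tasks(status):
    """Return the IDs of tasks whose state is not RUNNING."""
    return [t["id"] for t in status.get("tasks", []) if t.get("state") != "RUNNING"]


def is_healthy(status):
    """A connector is healthy when it and all of its tasks are RUNNING."""
    connector_state = status.get("connector", {}).get("state")
    return connector_state == "RUNNING" and not unhealthy_tasks(status)
```

For example, `is_healthy(fetch_status("http://localhost:8083", "<connector-name>"))` could drive a simple alert. Keeping the parsing functions separate from the HTTP call makes them easy to exercise against saved responses.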

## Metrics

### CDC Service metrics

These metrics provide information about the CDC service in YugabyteDB.

| Metric name | Type | Description |
| :---- | :---- | :---- |
| cdcsdk_change_event_count | `long` | The Change Event Count metric shows the number of records sent by the CDC Service. |
| cdcsdk_traffic_sent | `long` | The total traffic sent by the CDC Service, in bytes. |
| cdcsdk_event_lag_micros | `long` | The LAG metric is calculated by subtracting the timestamp of the latest record in the WAL of a tablet from the last record sent to the CDC connector. |
| cdcsdk_expiry_time_ms | `long` | The time left to read records from WAL is tracked by the Stream Expiry Time (ms). |

In addition to the built-in support for JMX metrics that Zookeeper, Kafka, and Kafka Connect provide, the Debezium YugabyteDB connector provides the following types of metrics.

### Snapshot metrics

The **MBean** is `debezium.postgres:type=connector-metrics,context=snapshot,server=<topic.prefix>`.

Snapshot metrics are not exposed unless a snapshot operation is active, or if a snapshot has occurred since the last connector start.

The following table lists the snapshot metrics that are available.

| Attributes | Type | Description |
| :--------- | :--- | :---------- |
| `LastEvent` | string | The last snapshot event that the connector has read. |
| `MilliSecondsSinceLastEvent` | long | The number of milliseconds since the connector has read and processed the most recent event. |
| `TotalNumberOfEventsSeen` | long | The total number of events that this connector has seen since last started or reset. |
| `NumberOfEventsFiltered` | long | The number of events that have been filtered by include/exclude list filtering rules configured on the connector. |
| `CapturedTables` | string[] | The list of tables that are captured by the connector. |
| `QueueTotalCapacity` | int | The length of the queue used to pass events between the snapshotter and the main Kafka Connect loop. |
| `QueueRemainingCapacity` | int | The free capacity of the queue used to pass events between the snapshotter and the main Kafka Connect loop. |
| `TotalTableCount` | int | The total number of tables that are being included in the snapshot. |
| `RemainingTableCount` | int | The number of tables that the snapshot has yet to copy. |
| `SnapshotRunning` | boolean | Whether the snapshot was started. |
| `SnapshotPaused` | boolean | Whether the snapshot was paused. |
| `SnapshotAborted` | boolean | Whether the snapshot was aborted. |
| `SnapshotCompleted` | boolean | Whether the snapshot completed. |
| `SnapshotDurationInSeconds` | long | The total number of seconds that the snapshot has taken so far, even if not complete. Also includes the time during which the snapshot was paused. |
| `SnapshotPausedDurationInSeconds` | long | The total number of seconds that the snapshot was paused. If the snapshot was paused several times, the paused time adds up. |
| `RowsScanned` | Map<String, Long> | Map containing the number of rows scanned for each table in the snapshot. Tables are incrementally added to the Map during processing. Updates every 10,000 rows scanned and upon completing a table. |
| `MaxQueueSizeInBytes` | long | The maximum buffer of the queue in bytes. This metric is available if `max.queue.size.in.bytes` is set to a positive long value. |
| `CurrentQueueSizeInBytes` | long | The current volume, in bytes, of records in the queue. |

The connector also provides the following additional snapshot metrics when an incremental snapshot is executed:

| Attributes | Type | Description |
| :--------- | :--- | :---------- |
| `ChunkId` | string | The identifier of the current snapshot chunk. |
| `ChunkFrom` | string | The lower bound of the primary key set defining the current chunk. |
| `ChunkTo` | string | The upper bound of the primary key set defining the current chunk. |
| `TableFrom` | string | The lower bound of the primary key set of the currently snapshotted table. |
| `TableTo` | string | The upper bound of the primary key set of the currently snapshotted table. |
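
The snapshot attributes can be combined into simple progress figures. The following Python sketch is illustrative only: the function names are hypothetical, and the arguments are assumed to be values already read from the snapshot MBean (`TotalTableCount`, `RemainingTableCount`, `SnapshotDurationInSeconds`, and `SnapshotPausedDurationInSeconds`).

```python
def snapshot_progress(total_table_count, remaining_table_count):
    """Return the fraction of tables already copied, between 0.0 and 1.0."""
    if total_table_count <= 0:
        # No tables reported yet, so no progress can be computed.
        return 0.0
    copied = total_table_count - remaining_table_count
    return copied / total_table_count


def active_snapshot_seconds(duration_seconds, paused_seconds):
    """Return the time spent snapshotting, excluding paused time.

    SnapshotDurationInSeconds includes paused time, so the paused
    duration is subtracted to get the active time.
    """
    return max(duration_seconds - paused_seconds, 0)
```

For instance, a snapshot of 8 tables with 2 still to copy is three quarters done by table count, although table sizes can differ widely, so this is only a coarse indicator.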

### Streaming metrics

The **MBean** is `debezium.postgres:type=connector-metrics,context=streaming,server=<topic.prefix>`.

The following table lists the streaming metrics that are available.

| Attributes | Type | Description |
| :--------- | :--- | :---------- |
| `LastEvent` | string | The last streaming event that the connector has read. |
| `MilliSecondsSinceLastEvent` | long | The number of milliseconds since the connector has read and processed the most recent event. |
| `TotalNumberOfEventsSeen` | long | The total number of events that this connector has seen since the last start or metrics reset. |
| `TotalNumberOfCreateEventsSeen` | long | The total number of create events that this connector has seen since the last start or metrics reset. |
| `TotalNumberOfUpdateEventsSeen` | long | The total number of update events that this connector has seen since the last start or metrics reset. |
| `TotalNumberOfDeleteEventsSeen` | long | The total number of delete events that this connector has seen since the last start or metrics reset. |
| `NumberOfEventsFiltered` | long | The number of events that have been filtered by include/exclude list filtering rules configured on the connector. |
| `CapturedTables` | string[] | The list of tables that are captured by the connector. |
| `QueueTotalCapacity` | int | The length of the queue used to pass events between the streamer and the main Kafka Connect loop. |
| `QueueRemainingCapacity` | int | The free capacity of the queue used to pass events between the streamer and the main Kafka Connect loop. |
| `Connected` | boolean | Flag that denotes whether the connector is currently connected to the database server. |
| `MilliSecondsBehindSource` | long | The number of milliseconds between the last change event’s timestamp and the connector processing it. The values will incorporate any differences between the clocks on the machines where the database server and the connector are running. |
| `NumberOfCommittedTransactions` | long | The number of processed transactions that were committed. |
| `SourceEventPosition` | Map<String, String> | The coordinates of the last received event. |
| `LastTransactionId` | string | Transaction identifier of the last processed transaction. |
| `MaxQueueSizeInBytes` | long | The maximum buffer of the queue in bytes. This metric is available if `max.queue.size.in.bytes` is set to a positive long value. |
| `CurrentQueueSizeInBytes` | long | The current volume, in bytes, of records in the queue. |
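
To show how the MBean names and queue attributes fit together, the following Python sketch builds the MBean name for a given context and derives a queue utilization figure from `QueueTotalCapacity` and `QueueRemainingCapacity`. The function names are hypothetical, and reading the attribute values from JMX is left to whatever JMX client you use.

```python
def mbean_name(context, topic_prefix):
    """Build the connector MBean name for the 'snapshot' or 'streaming' context."""
    return (
        "debezium.postgres:type=connector-metrics,"
        f"context={context},server={topic_prefix}"
    )


def queue_utilization(queue_total_capacity, queue_remaining_capacity):
    """Return the fraction of the event queue currently in use (0.0 to 1.0)."""
    if queue_total_capacity <= 0:
        # Capacity not reported; treat the queue as empty.
        return 0.0
    used = queue_total_capacity - queue_remaining_capacity
    return used / queue_total_capacity
```

A utilization that stays close to 1.0 suggests that the Kafka Connect loop is draining events more slowly than the connector produces them.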
@@ -0,0 +1,12 @@
---
title: Overview of CDC logical replication internals
linkTitle: Overview
description: Change Data Capture in YugabyteDB using logical replication.
headcontent: Change Data Capture in YugabyteDB using logical replication
menu:
preview:
parent: explore-cdc-logical-replication
identifier: cdc-log-rep-overview
weight: 10
type: docs
---