pageserver: lift decoding and interpreting of wal into wal_decoder #9524
Conversation
5354 tests run: 5132 passed, 0 failed, 222 skipped (full report).
Code coverage: collected from Rust tests only (full report).
The comment gets automatically updated with the latest test results: c5b2c78 at 2024-10-31T10:44:37.184Z.
A new type, `InterpretedWalRecord`, is added to `wal_decoder::models`. It contains everything the pageserver requires in order to ingest a WAL record. The highlights are `metadata_record`, an optional special record type that needs dedicated handling, and `blocks`, which stores the key/value pairs to be persisted to storage. This type is produced from a raw PG WAL record by `wal_decoder::models::InterpretedWalRecord::from_bytes`. The rest of this commit separates decoding and interpretation of the PG WAL record from its application in `WalIngest::ingest_record`.
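For orientation, a heavily simplified sketch of the shape described above. Field names (`metadata_record`, `blocks`) come from the description; all types and the placeholder `from_bytes` body are illustrative, not the actual `wal_decoder` code:

```rust
/// Special record types that need dedicated handling during ingest.
/// Variants elided in this sketch.
#[derive(Debug)]
enum MetadataRecord {}

/// One key/value pair destined for storage. Illustrative types only.
#[derive(Debug)]
struct BlockWrite {
    key: u64,
    lsn: u64,
    value: Vec<u8>,
}

/// Everything the pageserver needs in order to ingest one WAL record.
#[derive(Debug, Default)]
struct InterpretedWalRecord {
    /// Optional special record requiring dedicated handling.
    metadata_record: Option<MetadataRecord>,
    /// Key/value pairs to be persisted to storage.
    blocks: Vec<BlockWrite>,
}

impl InterpretedWalRecord {
    /// Stand-in for the decode-and-interpret entry point: the real
    /// `from_bytes` parses a raw PG WAL record; this placeholder only
    /// illustrates the shape of the output.
    fn from_bytes(raw: &[u8]) -> InterpretedWalRecord {
        let _ = raw;
        InterpretedWalRecord::default()
    }
}
```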
Flushing an initial high-level review. Regardless of the ingest sharding, this is a nice cleanup.
I don't see any added test coverage in `wal_decoder`. I assume we have some coverage in `walingest.rs`, but is it worth moving some of it over to `wal_decoder` and improving it? Can be deferred until a later PR.
I assume most of the `MetadataRecord::from_decoded()` code was just a simple move? Anything in particular that needs a closer review?
I agree. Unit testing for decoding is lacking, but that was the case before as well. The best coverage we have right now is from the integration pg regress tests. Decoding is a good candidate for fuzzing, but we don't have the right utils in place and I don't wish to sink too much time into it right now. Or has that changed with your WAL generation work?
Yup, just a move. Nothing comes to mind; this PR is largely just moving things around and adding types.
LGTM modulo comments.
> The best coverage we have right now is from the integration pg regress tests. Decoding is a good candidate for fuzzing, but we don't have the right utils in place and I don't wish to sink too much time into it right now. Or has that changed with your WAL generation work?
Nah, nothing there that would really help right now.
Since the input comes from Postgres, which we can presumably trust to generally produce well-formed WAL, I guess the main concern is to handle the happy path -- and we get lots of incidental coverage from anything that writes WAL. Fine to punt this for later.
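For context, a minimal sketch of the property such a fuzz target would eventually assert: arbitrary input bytes may be rejected, but decoding must never panic. The `toy_decode` stand-in and its 4-byte length header are invented for illustration; this is not the project's decoder or fuzzing setup:

```rust
use std::panic;

/// Toy decoder: treats the first 4 bytes as a little-endian payload
/// length. Malformed input yields `Err`, never a panic.
fn toy_decode(raw: &[u8]) -> Result<u32, &'static str> {
    let hdr: [u8; 4] = raw
        .get(..4)
        .ok_or("truncated header")?
        .try_into()
        .map_err(|_| "truncated header")?;
    let len = u32::from_le_bytes(hdr) as usize;
    if raw.len() < 4 + len {
        return Err("truncated payload");
    }
    Ok(len as u32)
}

/// One fuzz iteration: feed arbitrary bytes in, require that the decoder
/// returns (Ok or Err) rather than panicking. A cargo-fuzz target would
/// wrap essentially this check.
fn fuzz_one(input: &[u8]) -> bool {
    panic::catch_unwind(|| {
        let _ = toy_decode(input);
    })
    .is_ok()
}
```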
This reverts commit 138f3bc.
Overall LGTM :)
## Problem

#9524 split the decoding and interpretation step from ingestion. The output of the first phase is a `wal_decoder::models::InterpretedWalRecord`. Before this patch set, that struct contained a list of `Value` instances. We wish to lift the decoding and interpretation step to the safekeeper, but it would be nice if the safekeeper gave us a batch containing the raw data instead of actual values.

## Summary of changes

The main goal here is to make `InterpretedWalRecord` hold a raw buffer which contains pre-serialized Values. For this we:

1. Add a `SerializedValueBatch` type. This is `inmemory_layer::SerializedBatch` with some extra functionality for extension, observing values for shard 0, and tests.
2. Replace `inmemory_layer::SerializedBatch` with `SerializedValueBatch`.
3. Make `DatadirModification` maintain a `SerializedValueBatch`.

### `DatadirModification` changes

`DatadirModification` now maintains a `SerializedValueBatch` and extends it as new WAL records come in (to avoid flushing to disk on every record). In turn, this cascaded into a number of modifications to `DatadirModification`:

1. Replace `pending_data_pages` and `pending_zero_data_pages` with `pending_data_batch`.
2. Remove `pending_zero_data_pages` and its cousin `on_wal_record_end`.
3. Rename `pending_bytes` to `pending_metadata_bytes`, since this is what it tracks now.
4. Adapt various utility methods like `len`, `approx_pending_bytes`, and `has_dirty_data_pages`.

The removal of `pending_zero_data_pages` and the optimisation associated with it ((1) and (2)) deserves more detail. Previously, all zero data pages went through `pending_zero_data_pages`. We wrote zero data pages when filling gaps caused by relation extension (case A) and when handling special WAL records (case B). If the same WAL record happened to contain a non-zero write for an entry in `pending_zero_data_pages`, we skipped the zero write.

Case A: We handle this differently now. When ingesting the `SerializedValueBatch` associated with one PG WAL record, we identify the gaps and fill them in one go. Essentially, we move from a per-key process (gaps were filled after each new key) to a per-record process. Hence, the optimisation is not required anymore.

Case B: When the handling of a special record needs to zero out a key, it just adds that to the current batch. I inspected the code, and I don't think the optimisation kicked in here.
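To make the batch idea concrete, here is a toy sketch of a batch holding pre-serialized values in one raw buffer, with a metadata vector recording where each value lives. All names, fields, and methods are invented for illustration; the real `SerializedValueBatch` is richer:

```rust
/// Toy batch of pre-serialized values. `raw` holds the serialized bytes
/// back to back; `metadata` records (key, lsn, offset, len) per value.
#[derive(Default)]
struct SerializedValueBatch {
    raw: Vec<u8>,
    metadata: Vec<(u64, u64, usize, usize)>,
}

impl SerializedValueBatch {
    /// Serialize one value into the shared buffer and record its location.
    fn put(&mut self, key: u64, lsn: u64, value: &[u8]) {
        let offset = self.raw.len();
        self.raw.extend_from_slice(value);
        self.metadata.push((key, lsn, offset, value.len()));
    }

    /// Extend this batch with another one, fixing up buffer offsets —
    /// the kind of accumulation `DatadirModification` does between flushes.
    fn extend(&mut self, other: SerializedValueBatch) {
        let base = self.raw.len();
        self.raw.extend_from_slice(&other.raw);
        self.metadata.extend(
            other
                .metadata
                .into_iter()
                .map(|(k, lsn, off, len)| (k, lsn, base + off, len)),
        );
    }

    /// Number of values in the batch.
    fn len(&self) -> usize {
        self.metadata.len()
    }
}
```

The point of the shape: the safekeeper (or decoder) can hand over one contiguous buffer plus offsets, and the pageserver never has to materialize individual `Value` instances on the ingest path.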
## Problem

Decoding and ingestion are still coupled in `pageserver::WalIngest`.

## Summary of changes

A new type, `InterpretedWalRecord`, is added to `wal_decoder::models`. This type contains everything that the pageserver requires in order to ingest a WAL record. The highlights are `metadata_record`, which is an optional special record type to be handled, and `blocks`, which stores key/value pairs to be persisted to storage. This type is produced by `wal_decoder::models::InterpretedWalRecord::from_bytes` from a raw PG WAL record.

The rest of this commit separates decoding and interpretation of the PG WAL record from its application in `WalIngest::ingest_record`.

Related: #9335
Epic: #9329

Checklist before requesting a review
Checklist before merging