Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: lift decoding and interpreting of wal into wal_decoder #9524

Merged
merged 13 commits into from
Oct 31, 2024

Conversation

VladLazar
Copy link
Contributor

@VladLazar VladLazar commented Oct 25, 2024

Problem

Decoding and ingestion are still coupled in pageserver::WalIngest.

Summary of changes

A new type is added to wal_decoder::models, InterpretedWalRecord. This type contains everything that the pageserver requires in order to ingest a WAL record. The highlights are the metadata_record which is an optional special record type to be handled and blocks which stores key, value pairs to be persisted to storage.

This type is produced by
wal_decoder::models::InterpretedWalRecord::from_bytes from a raw PG wal record.

The rest of this commit separates decoding and interpretation of the PG WAL record from its application in WalIngest::ingest_record.

Related: #9335
Epic: #9329

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@VladLazar VladLazar force-pushed the vlad/move-decoding-logic-to-decoder-module branch 2 times, most recently from 19c9dd1 to 7c4807e Compare October 25, 2024 16:57
Copy link

github-actions bot commented Oct 25, 2024

5354 tests run: 5132 passed, 0 failed, 222 skipped (full report)


Flaky tests (1)

Postgres 14

Code coverage* (full report)

  • functions: 31.5% (7773 of 24686 functions)
  • lines: 49.0% (61041 of 124649 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
c5b2c78 at 2024-10-31T10:44:37.184Z :recycle:

@VladLazar VladLazar force-pushed the vlad/move-decoding-logic-to-decoder-module branch 2 times, most recently from 285e8d2 to 0d7d792 Compare October 28, 2024 16:31
Base automatically changed from vlad/wal-decoder-module to main October 29, 2024 10:00
A new type is added to `wal_decoder::models`, InterpretedWalRecord.
This type contains everything that the pageserver requires in order
to ingest a WAL record. The highlights are the `metadata_record`
which is an optional special record type to be handled and `blocks`
which stores key, value pairs to be persisted to storage.

This type is produced by
`wal_decoder::models::InterpretedWalRecord::from_bytes` from a
raw PG wal record.

The rest of this commit separates decoding and interpretation of the
PG WAL record from its application in [`WalIngest::ingest_record`].
@VladLazar VladLazar force-pushed the vlad/move-decoding-logic-to-decoder-module branch from 0d7d792 to 47604a7 Compare October 29, 2024 10:03
@VladLazar VladLazar marked this pull request as ready for review October 29, 2024 10:05
@VladLazar VladLazar requested a review from a team as a code owner October 29, 2024 10:05
@problame problame removed their request for review October 29, 2024 12:51
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Flushing an initial high-level review. Regardless of the ingest sharding, this is a nice cleanup.

I don't see any added test coverage in wal_decoder. I assume we have some coverage in walingest.rs, but is it worth moving some of it over to wal_decoder and improving it? Can be deferred until a later PR.

I assume most of the MetadataRecord::from_decoded() code was just a simple move? Anything in particular that needs a closer review?

libs/wal_decoder/src/models.rs Show resolved Hide resolved
libs/wal_decoder/src/models.rs Outdated Show resolved Hide resolved
libs/wal_decoder/src/decoder.rs Outdated Show resolved Hide resolved
pageserver/src/walingest.rs Show resolved Hide resolved
libs/wal_decoder/src/decoder.rs Outdated Show resolved Hide resolved
pageserver/src/walingest.rs Outdated Show resolved Hide resolved
pageserver/src/walingest.rs Show resolved Hide resolved
@VladLazar
Copy link
Contributor Author

I don't see any added test coverage in wal_decoder. I assume we have some coverage in walingest.rs, but is it worth moving some of it over to wal_decoder and improving it? Can be deferred until a later PR.

I agree. Unit testing for decoding is lacking, but that was the case before as well. The best coverage we have right now is from the integration pg regress tests. Decoding is a good candidate for fuzzing, but we don't have the right utils in place and I don't wish to sink too much time into it right now. Or has that changed with your WAL generation work?

I assume most of the MetadataRecord::from_decoded() code was just a simple move? Anything in particular that needs a closer review?

Yup, just a move. Nothing comes to mind, this PR is largely just moving things around and adding types. The InterpretedWalRecord::blocks handling in WalIngest::ingest_record deserves most scrutiny IMO.

Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo comments.

The best coverage we have right now is from the integration pg regress tests. Decoding is a good candidate for fuzzing, but we don't have the right utils in place and I don't wish to sink too much time into it right now. Or has that changed with your WAL generation work?

Nah, nothing there that would really help right now.

Since the input comes from Postgres, which we can presumably trust to generally produce well-formed WAL, I guess the main concern is to handle the happy path -- and we get lots of incidental coverage from anything that writes WAL. Fine to punt this for later.

pageserver/src/walingest.rs Show resolved Hide resolved
libs/wal_decoder/src/models.rs Show resolved Hide resolved
libs/wal_decoder/src/decoder.rs Outdated Show resolved Hide resolved
Copy link
Member

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM :)

libs/wal_decoder/src/models.rs Outdated Show resolved Hide resolved
libs/wal_decoder/src/decoder.rs Show resolved Hide resolved
@VladLazar VladLazar merged commit 411c3aa into main Oct 31, 2024
81 checks passed
@VladLazar VladLazar deleted the vlad/move-decoding-logic-to-decoder-module branch October 31, 2024 10:47
VladLazar added a commit that referenced this pull request Nov 6, 2024
## Problem

#9524 split the decoding and
interpretation step from ingestion.
The output of the first phase is a `wal_decoder::models::InterpretedWalRecord`. 
Before this patch set that struct contained a list of `Value` instances.

We wish to lift the decoding and interpretation step to the safekeeper,
but it would be nice if the safekeeper gave us a batch containing the raw data instead of actual values.

## Summary of changes

Main goal here is to make `InterpretedWalRecord` hold a raw buffer which
contains pre-serialized Values.
For this we do:
1. Add a `SerializedValueBatch` type. This is `inmemory_layer::SerializedBatch` with some 
extra functionality for extension, observing values for shard 0 and tests.
2. Replace `inmemory_layer::SerializedBatch` with `SerializedValueBatch`
3. Make `DatadirModification` maintain a `SerializedValueBatch`.


### `DatadirModification` changes

`DatadirModification` now maintains a `SerializedValueBatch` and extends
it as new WAL records come in (to avoid flushing to disk on every
record).
In turn, this cascaded into a number of modifications to
`DatadirModification`:
1. Replace `pending_data_pages` and `pending_zero_data_pages` with `pending_data_batch`.
2. Removal of `pending_zero_data_pages` and its cousin `on_wal_record_end`
3. Rename `pending_bytes` to `pending_metadata_bytes` since this is what it tracks now.
4. Adapting of various utility methods like `len`, `approx_pending_bytes` and `has_dirty_data_pages`.

Removal of `pending_zero_data_pages` and the optimisation associated
with it ((1) and (2)) deserves more detail.

Previously all zero data pages went through `pending_zero_data_pages`.
We wrote zero data pages when filling gaps caused by relation extension
(case A) and when handling special wal records (case B). If it happened
that the same WAL record contained a non zero write for an entry in
`pending_zero_data_pages` we skipped the zero write.

Case A: We handle this differently now. When ingesting the
`SerialiezdValueBatch` associated with one PG WAL record, we identify the gaps and fill the
them in one go. Essentially, we move from a per key process (gaps were filled after each
new key), and replace it with a per record process. Hence, the optimisation is not
required anymore.

Case B: When the handling of a special record needs to zero out a key,
it just adds that to the current batch. I inspected the code, and I
don't think the optimisation kicked in here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants