pageserver: ingest pre-serialized batches of values #9579
Conversation
5354 tests run: 5132 passed, 0 failed, 222 skipped (full report). Flaky tests (2): Postgres 17.
Code coverage* (full report)
* collected from Rust tests only. The comment gets automatically updated with the latest test results.
95df95a at 2024-11-01T19:12:09.642Z :recycle:
Force-pushed from 241f79d to 4f13b27
Force-pushed from 4f13b27 to 940aa58
Force-pushed from 940aa58 to 3659ea3
This will replace inmemory_layer::SerializedBatch and also be present in the wire format between safekeepers and pageservers.
Little refactor to clean up for the next commit.
This is a bigger bite than I would have liked, but this stuff is tightly coupled. The main goal of this commit is to use the `SerializedValueBatch` in `InterpretedWalRecord`. This requires that `DatadirModification` maintains a `SerializedValueBatch` and extends it as new WAL records come in (to avoid flushing to disk on every record). In turn, this cascaded into a number of modifications to `DatadirModification`:

1. Replace `pending_data_pages` and `pending_zero_data_pages` with `pending_data_batch`.
2. Remove `pending_zero_data_pages` and its cousin `on_wal_record_end`.
3. Rename `pending_bytes` to `pending_metadata_bytes`, since this is what it tracks now.
4. Adapt various utility methods like `len`, `approx_pending_bytes` and `has_dirty_data_pages`.

The removal of `pending_zero_data_pages` and the optimisation associated with it ((1) and (2)) deserves more detail. Previously, all zero data pages went through `pending_zero_data_pages`. We wrote zero data pages when filling gaps caused by relation extension (case A) and when handling special WAL records (case B). If it happened that the same WAL record contained a non-zero write for an entry in `pending_zero_data_pages`, we skipped the zero write.

Case A: We handle relation extensions differently now. Once the batch grows large enough, we identify the gaps and fill them when handling the entire WAL record. Hence, the optimisation is not required anymore.

Case B: When the handling of a special record needs to zero out a key, it just adds that to the current batch. I inspected the code, and I don't think the optimisation kicked in here.
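For orientation, here is a hedged sketch of the shape of that change. The names below (`DatadirModificationSketch`, `ingest_record_batch`, the batch fields) are simplified illustrations, not the actual pageserver API:

```rust
// Illustrative stand-ins for the real pageserver types; names are simplified
// and not the actual crate API.
type Key = u64;
type Lsn = u64;

/// Simplified sketch of a pre-serialized batch: raw value bytes plus enough
/// metadata to locate each value inside the buffer.
#[derive(Default)]
struct SerializedValueBatch {
    raw: Vec<u8>,                     // pre-serialized Value bytes
    metadata: Vec<(Key, Lsn, usize)>, // (key, lsn, offset into `raw`)
}

impl SerializedValueBatch {
    /// Append another batch, so several WAL records can accumulate before
    /// anything is flushed to the in-memory layer.
    fn extend(&mut self, mut other: SerializedValueBatch) {
        let base = self.raw.len();
        self.raw.append(&mut other.raw);
        self.metadata
            .extend(other.metadata.into_iter().map(|(k, l, off)| (k, l, base + off)));
    }
}

/// Rough shape of the modified `DatadirModification`: one growing data batch,
/// with metadata writes tracked separately.
#[derive(Default)]
struct DatadirModificationSketch {
    pending_data_batch: SerializedValueBatch,
    pending_metadata_bytes: usize,
}

impl DatadirModificationSketch {
    /// Each interpreted WAL record contributes its batch instead of forcing
    /// a per-record flush to disk.
    fn ingest_record_batch(&mut self, batch: SerializedValueBatch) {
        self.pending_data_batch.extend(batch);
    }

    fn approx_pending_bytes(&self) -> usize {
        self.pending_data_batch.raw.len() + self.pending_metadata_bytes
    }
}

fn main() {
    let mut modification = DatadirModificationSketch::default();
    let mut record_batch = SerializedValueBatch::default();
    record_batch.raw.extend_from_slice(b"serialized-value");
    record_batch.metadata.push((1, 100, 0));
    modification.ingest_record_batch(record_batch);
    assert_eq!(modification.approx_pending_bytes(), 16);
}
```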
Force-pushed from 3659ea3 to 2e3c703
Later edit: re-benchmarked everything after I pushed some optimisations.
This appears fine to me, but the changes to pgdatadir_mapping.rs and walingest.rs are worth another look by someone more intimately familiar with this.
Feel free to address the comments, or maybe it doesn't matter, as `SerializedValueBatch::default` looks like a light operation (a zero-sized `Vec` doesn't allocate extra memory?).
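For what it's worth, an empty `Vec` in Rust is guaranteed not to allocate until the first element is pushed, so a default-constructed batch should indeed be cheap; a quick illustration:

```rust
fn main() {
    // Vec::new()/Vec::default() perform no heap allocation; capacity stays 0
    // until the first element is pushed.
    let empty: Vec<u8> = Vec::default();
    assert_eq!(empty.capacity(), 0);

    let mut batch = empty;
    batch.push(1u8);
    assert!(batch.capacity() > 0); // allocation happens lazily, on first push
}
```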
Benchmarked things again after I added some optimisations (results here). I think this is good to merge. Pending an okay from John for the
Problem
#9524 split the decoding and interpretation step from ingestion. The output of the first phase is a `wal_decoder::models::InterpretedWalRecord`. Before this patch set, that struct contained a list of `Value` instances. We wish to lift the decoding and interpretation step to the safekeeper, but it would be nice if the safekeeper gave us a batch containing the raw data instead of actual values.
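To make the intent concrete, here is a rough before/after of the record's shape; the field names and types are simplified assumptions for illustration, not the real definitions in `wal_decoder::models`:

```rust
// Illustrative stand-ins; the real types live in the wal_decoder/pageserver crates.
type Key = u64;
type Lsn = u64;
type Value = Vec<u8>;

// Simplified batch: raw pre-serialized bytes plus per-value metadata.
struct SerializedValueBatch {
    raw: Vec<u8>,
    metadata: Vec<(Key, Lsn, usize)>,
}

// Before (simplified): the interpreted record carried materialized values,
// each one deserialized in pageserver memory.
struct InterpretedWalRecordBefore {
    batch: Vec<(Key, Lsn, Value)>,
    // other fields elided
}

// After (simplified): the record carries one pre-serialized buffer, which a
// safekeeper could produce and ship over the wire as raw bytes.
struct InterpretedWalRecordAfter {
    batch: SerializedValueBatch,
    // other fields elided
}
```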
Summary of changes
Main goal here is to make `InterpretedWalRecord` hold a raw buffer which contains pre-serialized Values. For this we do:
* Add a `SerializedValueBatch` type. This is `inmemory_layer::SerializedBatch` with some extra functionality for extension, observing values for shard 0, and tests.
* Replace `inmemory_layer::SerializedBatch` with `SerializedValueBatch`.
* Make `DatadirModification` maintain a `SerializedValueBatch`.

`DatadirModification` changes

`DatadirModification` now maintains a `SerializedValueBatch` and extends it as new WAL records come in (to avoid flushing to disk on every record).
In turn, this cascaded into a number of modifications to `DatadirModification`:
1. Replace `pending_data_pages` and `pending_zero_data_pages` with `pending_data_batch`.
2. Remove `pending_zero_data_pages` and its cousin `on_wal_record_end`.
3. Rename `pending_bytes` to `pending_metadata_bytes`, since this is what it tracks now.
4. Adapt various utility methods like `len`, `approx_pending_bytes` and `has_dirty_data_pages`.

The removal of `pending_zero_data_pages` and the optimisation associated with it ((1) and (2)) deserves more detail.
Previously, all zero data pages went through `pending_zero_data_pages`. We wrote zero data pages when filling gaps caused by relation extension (case A) and when handling special WAL records (case B). If it happened that the same WAL record contained a non-zero write for an entry in `pending_zero_data_pages`, we skipped the zero write.

Case A: We handle this differently now. When ingesting the `SerializedValueBatch` associated with one PG WAL record, we identify the gaps and fill them in one go. Essentially, we move from a per-key process (gaps were filled after each new key) to a per-record process. Hence, the optimisation is not required anymore.
Case B: When the handling of a special record needs to zero out a key,
it just adds that to the current batch. I inspected the code, and I
don't think the optimisation kicked in here.
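To illustrate case A, here is a minimal sketch of per-record gap identification. The function name, key layout, and block-number handling are assumptions for illustration, not the actual ingest code:

```rust
use std::collections::BTreeSet;

type BlockNumber = u32;

/// Given the relation size observed before this record and the set of blocks
/// the record actually writes, return the blocks that must be zero-filled so
/// the relation has no holes. This runs once per WAL record, not once per key.
fn gaps_to_zero_fill(
    old_rel_size: BlockNumber,
    written: &BTreeSet<BlockNumber>,
) -> Vec<BlockNumber> {
    let Some(&highest) = written.iter().next_back() else {
        return Vec::new();
    };
    // Every block from the old end of the relation up to the highest written
    // block must exist; fill the ones this record does not write itself.
    (old_rel_size..=highest)
        .filter(|blk| !written.contains(blk))
        .collect()
}

fn main() {
    // Relation had 3 blocks (0..3); the record writes blocks 5 and 7,
    // so blocks 3, 4 and 6 need zero pages.
    let written: BTreeSet<BlockNumber> = [5, 7].into_iter().collect();
    assert_eq!(gaps_to_zero_fill(3, &written), vec![3, 4, 6]);
}
```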