-
Notifications
You must be signed in to change notification settings - Fork 476
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: stop storing TimelineMetadata in index_part.json as bytes #7699
Conversation
3198 tests run: 3056 passed, 0 failed, 142 skipped (full report)Code coverage* (full report)
* collected from Rust tests only The comment gets automatically updated with the latest test results
1891124 at 2024-06-11T12:35:02.476Z :recycle: |
This comment was marked as outdated.
This comment was marked as outdated.
6970bde
to
906ee84
Compare
…lease (#7693) ## Problem Currently we serialize the `TimelineMetadata` into bytes to put it into `index_part.json`. This `Vec<u8>` (hopefully `[u8; 512]`) representation was chosen because of problems serializing TimelineId and Lsn between different serializers (bincode, json). After #5335, the serialization of those types became serialization format aware or format agnostic. We've removed the pageserver local `metadata` file writing in #6769. ## Summary of changes Allow switching from the current serialization format to plain JSON for the legacy TimelineMetadata format in the future by adding a competitive serialization method to the current one (`crate::tenant::metadata::modern_serde`), which accepts both old bytes and new plain JSON. The benefits of this are that dumping the index_part.json with pretty printing no longer produces more than 500 lines of output, but after enabling it produces lines only proportional to the layer count, like: ```json { "version": ???, "layer_metadata": { ... }, "disk_consistent_lsn": "0/15FD5D8", "legacy_metadata": { "disk_consistent_lsn": "0/15FD5D8", "prev_record_lsn": "0/15FD5A0", "ancestor_timeline": null, "ancestor_lsn": "0/0", "latest_gc_cutoff_lsn": "0/149FD18", "initdb_lsn": "0/149FD18", "pg_version": 15 } } ``` In the future, I propose we completely stop using this legacy metadata type and wasting time trying to come up with another version numbering scheme in addition to the informative-only one already found in `index_part.json`, and go ahead with storing metadata or feature flags on the `index_part.json` itself. #7699 is the "one release after" changes which starts to produce metadata in the index_part.json as json.
48174cd
to
6cf038f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The metadata.rs
is hard to read for me, maybe because I didn't review the first PR, but, really, I think it's because you tried hard to not duplicate the field definitions.
I wonder if we could rename-on-read the field to just metadata
and allow soft-deserialize-style format evolution for the JSON value.
The price would be to duplicate the fields initially. New fields would only go into the serde_current_json's TimelineMetadata.
Like so:
mod metadata {
enum TimelineMetadata {
disk_consistent_lsn: Lsn,
prev_record_lsn: Option<Lsn>,
ancestor_timeline: Option<TimelineId>,
ancestor_lsn: Lsn,
latest_gc_cutoff_lsn: Lsn,
initdb_lsn: Lsn,
pg_version: u32,
}
impl Deserialize for TimelineMetadata {
// if it's a serde seq then
// - deser to Vec<u8>
// - decode using serde_legacy_bincode::TimelineMetadata::from_bytes
// if it's a serde map then
// - deser a serde_current_json::TimelineMetadata
// - move fields into Self
}
impl Serialize for TimelineMetadata {
// always ser using serde_current_json::TimelineMetadata::serialize
}
mod serde_legacy_bincode {
//! DO NOT CHANGE, HERE ONLY FOR BACKWARDS-COMPATIBILITY.
//! This format was abandoned because it does not support format evolution and is painful to read by humans.
#[derive(Deserialize, Serialize)]
struct TimelineMetadata {
// DO NOT CHANGE, SEE MODULE COMMENT
disk_consistent_lsn: Lsn,
prev_record_lsn: Option<Lsn>,
ancestor_timeline: Option<TimelineId>,
ancestor_lsn: Lsn,
latest_gc_cutoff_lsn: Lsn,
initdb_lsn: Lsn,
pg_version: u32,
// DO NOT CHANGE, SEE MODULE COMMENT
}
// the bincode impl
}
mod serde_current_json {
//! The format that we're currently writing. Supports format evolution through soft deserialization.
// See note at bottom of struct for how to add new fields.
#[derive(Deserialize, Serialize)]
struct TimelineMetadata {
disk_consistent_lsn: Lsn,
prev_record_lsn: Option<Lsn>,
ancestor_timeline: Option<TimelineId>,
ancestor_lsn: Lsn,
latest_gc_cutoff_lsn: Lsn,
initdb_lsn: Lsn,
pg_version: u32,
// When adding a field here, ensure backwards-compatibility and add my_new_field_update_on_read code.
// And remember to bump `LATEST_VERSION` in IndexPart.
}
}
}
no one should be using them anyways. remove the extra test and duplicate wrapping from the test I had missed before.
exposed via TimelineMetadata::with_recalculated_checksum
6cf038f
to
0b750d8
Compare
so we are stuck with metadata_bytes for backwards compat, until next week.
We had a call about this. We can. I didn't think of this while being too focused on declaring I also linked your proposal comment to the PR description to be included with the commit message. |
We've stored metadata as bytes within the
index_part.json
for since long fixed reasons. #7693 added support for reading out normal json serialization of theTimelineMetadata
.Change the serialization to only write
TimelineMetadata
as json for going forward, keeping the backwards compatibility to reading the metadata as bytes. Because of failure to includealias = "metadata"
in #7693, one more follow-up is required to make the switch from the old name to"metadata": <json>
, but that affects only the field name in serialized format.In documentation and naming, an effort is made to add enough warning signs around TimelineMetadata so that it will receive no changes in the future. We can just add those fields to
IndexPart
directly instead. Changing TimelineMetadata now carries all of the backward compatibility risks without offering any benefits over changingIndexPart
, where we do have a clear strategy (soft json (de)serialization).Additionally the path to cleaning up
metadata.rs
is documented in themetadata.rs
module comment. If we must extendTimelineMetadata
before that, the duplication suggested in review comment is the way to go.