Skip to content

Commit

Permalink
Remove at most one retain_lsn entry from (possibly offloaded) timelne…
Browse files Browse the repository at this point in the history
…'s parent (#9791)

There is a potential data corruption issue, not one I've encountered,
but it's still not hard to hit with some correct looking code given our
current architecture. It has to do with the timeline's memory object storage
via reference counted `Arc`s, and the removal of `retain_lsn` entries at
the drop of the last `Arc` reference.

The corruption steps are as follows:

1. timeline gets offloaded. timeline object A doesn't get dropped
though, because some long-running task accesses it
2. the same timeline gets unoffloaded again. timeline object B gets
created for it, timeline object A still referenced. both point to the
same timeline.
3. the task keeping the reference to timeline object A exits. destructor
for object A runs, removing `retain_lsn` in the timeline's parent.
4. the timeline's parent runs gc without the `retain_lsn` of the still
exant timleine's child, leading to data corruption.

In general we are susceptible each time when we recreate a `Timeline`
object in the same process, which happens both during a timeline
offload/unoffload cycle, as well as during an ancestor detach operation.

The solution this PR implements is to make the destructor for a timeline
as well as an offloaded timeline remove at most one `retain_lsn`.

PR #9760 has added a log line to print the refcounts at timeline
offload, but this only detects one of the places where we do such a
recycle operation. Plus it doesn't prevent the actual issue.

I doubt that this occurs in practice. It is more a defense in depth measure.
Usually I'd assume that the timeline gets dropped immediately in step 1,
as there is no background tasks referencing it after its shutdown.
But one never knows, and reducing the stakes of step 1 actually occurring
is a really good idea, from potential data corruption to waste of CPU time.

Part of #8088
  • Loading branch information
arpad-m authored Nov 18, 2024
1 parent d7662fd commit 4fc3af1
Show file tree
Hide file tree
Showing 2 changed files with 26 additions and 9 deletions.
6 changes: 5 additions & 1 deletion pageserver/src/tenant.rs
Original file line number Diff line number Diff line change
Expand Up @@ -608,11 +608,15 @@ impl OffloadedTimeline {
.iter()
.find(|(tid, _tl)| **tid == ancestor_timeline_id)
{
ancestor_timeline
let removal_happened = ancestor_timeline
.gc_info
.write()
.unwrap()
.remove_child_offloaded(self.timeline_id);
if !removal_happened {
tracing::error!(tenant_id = %self.tenant_shard_id.tenant_id, shard_id = %self.tenant_shard_id.shard_slug(), timeline_id = %self.timeline_id,
"Couldn't remove retain_lsn entry from offloaded timeline's parent: already removed");
}
}
}
self.deleted_from_ancestor.store(true, Ordering::Release);
Expand Down
29 changes: 21 additions & 8 deletions pageserver/src/tenant/timeline.rs
Original file line number Diff line number Diff line change
Expand Up @@ -481,17 +481,27 @@ impl GcInfo {
&mut self,
child_id: TimelineId,
maybe_offloaded: MaybeOffloaded,
) {
self.retain_lsns
.retain(|i| !(i.1 == child_id && i.2 == maybe_offloaded));
) -> bool {
// Remove at most one element. Needed for correctness if there is two live `Timeline` objects referencing
// the same timeline. Shouldn't but maybe can occur when Arc's live longer than intended.
let mut removed = false;
self.retain_lsns.retain(|i| {
if removed {
return true;
}
let remove = i.1 == child_id && i.2 == maybe_offloaded;
removed |= remove;
!remove
});
removed
}

pub(super) fn remove_child_not_offloaded(&mut self, child_id: TimelineId) {
self.remove_child_maybe_offloaded(child_id, MaybeOffloaded::No);
pub(super) fn remove_child_not_offloaded(&mut self, child_id: TimelineId) -> bool {
self.remove_child_maybe_offloaded(child_id, MaybeOffloaded::No)
}

pub(super) fn remove_child_offloaded(&mut self, child_id: TimelineId) {
self.remove_child_maybe_offloaded(child_id, MaybeOffloaded::Yes);
pub(super) fn remove_child_offloaded(&mut self, child_id: TimelineId) -> bool {
self.remove_child_maybe_offloaded(child_id, MaybeOffloaded::Yes)
}
}

Expand Down Expand Up @@ -4514,7 +4524,10 @@ impl Drop for Timeline {
// This lock should never be poisoned, but in case it is we do a .map() instead of
// an unwrap(), to avoid panicking in a destructor and thereby aborting the process.
if let Ok(mut gc_info) = ancestor.gc_info.write() {
gc_info.remove_child_not_offloaded(self.timeline_id)
if !gc_info.remove_child_not_offloaded(self.timeline_id) {
tracing::error!(tenant_id = %self.tenant_shard_id.tenant_id, shard_id = %self.tenant_shard_id.shard_slug(), timeline_id = %self.timeline_id,
"Couldn't remove retain_lsn entry from offloaded timeline's parent: already removed");
}
}
}
}
Expand Down

1 comment on commit 4fc3af1

@github-actions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

5499 tests run: 5272 passed, 1 failed, 226 skipped (full report)


Failures on Postgres 17

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_crafted_wal_end[debug-pg17-simple]"

Test coverage report is not available

The comment gets automatically updated with the latest test results
4fc3af1 at 2024-11-18T21:35:15.946Z :recycle:

Please sign in to comment.