pageserver: enable compaction to proceed while live-migrating #5397
Conversation
5535 tests run: 5309 passed, 0 failed, 226 skipped. Code coverage collected from Rust tests only; this comment is automatically updated with the latest test results. Latest: 9f1baf6 at 2024-11-20T17:41:18.371Z.
…ocations (#5299)

## Problem

These changes are part of building seamless tenant migration, as described in the RFC: #5029

## Summary of changes

- A new configuration type `LocationConf` supersedes `TenantConfOpt` for storing a tenant's configuration in the pageserver repo dir. It contains `TenantConfOpt`, as well as a new `mode` attribute that describes what kind of location this is (secondary, attached, attachment mode etc.). It is written to a file called `config-v1` instead of `config` -- this prepares us for neatly making any other profound changes to the format of the file in future. Forward compatibility for existing pageserver code is achieved by writing out both old-style and new-style files; backward compatibility is achieved by checking for the old-style file if the new one isn't found.
- The `TenantMap` type changes to hold `TenantSlot` instead of just `Tenant`. The `Tenant` type continues to be used for attached tenants only; tenants in other states (such as secondaries) are represented by a different variant of `TenantSlot`.
- Where `Tenant` & `Timeline` used to hold an `Arc<Mutex<TenantConfOpt>>`, they now hold a reference to an `AttachedTenantConf`, which includes the extra information from `LocationConf`. This enables them to know the current attachment mode.
- The attachment mode is used as an advisory input to decide whether to do compaction and GC (AttachedStale is meant to avoid doing uploads; AttachedMulti is meant to avoid doing deletions).
- A new HTTP API is added at `PUT /tenants/<tenant_id>/location_config` to drive new location configuration. This provides a superset of the functionality of attach/detach/load/ignore:
  - Attaching a tenant is just configuring it in an attached state.
  - Detaching a tenant is configuring it to a detached state.
  - Loading a tenant is just the same as attaching it.
  - Ignoring a tenant is the same as configuring it into Secondary with `warm=false` (i.e. retain the files on disk but do nothing else).

Caveats:

- AttachedMulti tenants don't do compaction in this PR, but they do in the follow-on #5397.
- Concurrent updates to the `location_config` API are not handled elegantly in this PR; a better mechanism is added in the follow-on #5367.
- Secondary mode is just a placeholder in this PR: the code to upload heatmaps and do downloads on secondary locations will be added in a later PR (but that shouldn't change any external interfaces).

Closes: #5379

Co-authored-by: Christian Schwarz <christian@neon.tech>
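The `TenantSlot` and attachment-mode design described above can be sketched as follows. This is a hypothetical simplification, not the real pageserver definitions: the variant and method names (`may_delete_layers`, `may_upload`) are illustrative, showing only how the mode acts as an advisory input to background tasks.

```rust
// Hypothetical sketch of the location/attachment-mode types; real
// pageserver types carry much more state (configs, Tenant handles, etc.).
#[derive(Debug, Clone, Copy)]
enum AttachmentMode {
    Single, // normal attached tenant: uploads and deletions allowed
    Multi,  // another pageserver may also be attached: avoid deletions
    Stale,  // being migrated away: avoid uploads
}

#[derive(Debug)]
enum TenantSlot {
    Attached { mode: AttachmentMode },
    Secondary { warm: bool },
}

impl TenantSlot {
    /// Advisory: may background tasks delete remote layer files?
    fn may_delete_layers(&self) -> bool {
        matches!(
            self,
            TenantSlot::Attached { mode: AttachmentMode::Single }
        )
    }

    /// Advisory: may background tasks upload to remote storage?
    fn may_upload(&self) -> bool {
        matches!(
            self,
            TenantSlot::Attached {
                mode: AttachmentMode::Single | AttachmentMode::Multi
            }
        )
    }
}

fn main() {
    // A tenant being migrated in (AttachedMulti) may upload but not delete.
    let migrating_in = TenantSlot::Attached { mode: AttachmentMode::Multi };
    assert!(migrating_in.may_upload());
    assert!(!migrating_in.may_delete_layers());

    // An "ignored" tenant maps to Secondary with warm=false: fully passive.
    let ignored = TenantSlot::Secondary { warm: false };
    assert!(!ignored.may_upload());
    println!("ok");
}
```

Note how "ignore" needs no dedicated state under this scheme: it is just the most passive point in the same configuration space.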
## Problem

If a caller detaches a tenant and then attaches it again, pending deletions from the old attachment might not have happened yet. This is not a correctness problem, but it causes:

- Risk of leaking some objects in S3
- Some warnings from the deletion queue when pending LSN updates and pending deletions don't pass validation

## Summary of changes

- The deletion queue now uses an `UnboundedChannel` so that the push interfaces don't have to be async.
  - This was pulled out of #5397, where it is also useful to be able to drive the queue from non-async contexts.
  - Why is it okay for this to be unbounded? The only way the unbounded-ness of the channel can become a problem is if writing out deletion lists can't keep up, but if the system were that overloaded then the code generating deletions (GC, compaction) would also be impacted.
- `DeletionQueueClient` gets a new `flush_advisory` function, which is like `flush_execute` but doesn't wait for completion: this is appropriate in contexts where we would like to encourage the deletion queue to flush, but don't need to block on it.
  - This function is also expected to be useful in next steps for seamless migration, where the option to flush to S3 while transitioning into AttachedStale will also include flushing the deletion queue, but we wouldn't want to block on that flush.
- The tenant_detach code in mgr.rs invokes `flush_advisory` after stopping the `Tenant` object.

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
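The unbounded-channel-plus-advisory-flush pattern above can be illustrated with a small sketch. This is not the real `DeletionQueueClient` API; it uses `std::sync::mpsc` (which is unbounded, so `send` never blocks) purely to show why pushes and `flush_advisory` can be called from non-async, non-blocking contexts.

```rust
use std::sync::mpsc;
use std::thread;

// Messages flowing into the (sketched) deletion queue worker.
enum QueueMsg {
    Delete(String), // remote object key to delete
    FlushAdvisory,  // encourage a flush; the caller does not wait for it
}

#[derive(Clone)]
struct DeletionQueueClient {
    // std::sync::mpsc channels are unbounded: send() never blocks,
    // so these methods are safe to call from synchronous code paths.
    tx: mpsc::Sender<QueueMsg>,
}

impl DeletionQueueClient {
    fn push(&self, key: &str) {
        self.tx.send(QueueMsg::Delete(key.to_string())).unwrap();
    }

    fn flush_advisory(&self) {
        // Fire-and-forget: unlike a blocking flush, we don't await an ack.
        let _ = self.tx.send(QueueMsg::FlushAdvisory);
    }
}

fn main() {
    let (tx, rx) = mpsc::channel();
    let client = DeletionQueueClient { tx };

    let worker = thread::spawn(move || {
        let mut pending: Vec<String> = Vec::new();
        let mut flushed = 0usize;
        for msg in rx {
            match msg {
                QueueMsg::Delete(key) => pending.push(key),
                QueueMsg::FlushAdvisory => {
                    // Pretend we wrote a deletion list and executed it.
                    flushed += pending.len();
                    pending.clear();
                }
            }
        }
        flushed + pending.len()
    });

    client.push("tenant/timeline/layer_1");
    client.push("tenant/timeline/layer_2");
    client.flush_advisory(); // returns immediately, no waiting
    drop(client); // close the channel so the worker loop exits

    assert_eq!(worker.join().unwrap(), 2);
    println!("ok");
}
```

The design trade-off stated in the PR holds here too: the queue can only grow without bound if the worker falls behind, and in that regime the producers (GC, compaction) would themselves be throttled by the same overload.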
Generally LGTM, but I can see there is potentially a race (in reality, very rare to hit):

1. Thread 1 calls `update_config` (remote_timeline_client.rs L495): it moves all blocked deletions to the queue, and is about to update `self.config` to the new value.
2. Thread 2 obtains the queue lock and looks at the deletion operations in the queue just before thread 1 updates `self.config` (remote_timeline_client.rs L1963). At this point `self.config` still has `block_deletions=true`, and everything gets moved back to the blocked deletion buffer.

For this to happen, though, (2) would have to fit entirely inside the short window in which (1) is about to update the config, which is unlikely if (2) takes any appreciable time.

Anyway, I don't see how such a race could lead to anything worse than files not being deleted immediately, so it's fine to leave it as-is for now.
This is a good point - I have tweaked the code to update the remote timeline client's config before, rather than after, draining the queue.
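The ordering fix described in that reply can be sketched as follows. The names (`RemoteClientState`, `unblock_deletions`) are hypothetical stand-ins for the real `remote_timeline_client.rs` code; the point is only the ordering: flip `block_deletions` *before* draining, so any concurrent reader that inspects the queue after the drain also observes the flag as `false` and does not move entries back into the blocked buffer.

```rust
use std::sync::Mutex;

// Sketch (hypothetical names): config flag and blocked-deletion buffer
// guarded by one lock, as in the race described above.
struct RemoteClientState {
    block_deletions: bool,
    blocked: Vec<String>, // deletions held back while block_deletions is set
}

fn unblock_deletions(state: &Mutex<RemoteClientState>, executed: &mut Vec<String>) {
    let mut s = state.lock().unwrap();
    // 1. Flip the flag first...
    s.block_deletions = false;
    // 2. ...then drain. Anyone who re-checks the flag from here on sees
    //    `false`, so nothing can be re-blocked after the drain.
    executed.append(&mut s.blocked);
}

fn main() {
    let state = Mutex::new(RemoteClientState {
        block_deletions: true,
        blocked: vec!["layer_a".into(), "layer_b".into()],
    });
    let mut executed = Vec::new();
    unblock_deletions(&state, &mut executed);
    assert_eq!(executed.len(), 2);
    assert!(!state.lock().unwrap().block_deletions);
    println!("ok");
}
```

With the original ordering (drain, then flip), a reader between the two steps could still see `block_deletions=true` and re-block work that had just been released.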
Problem
Long ago, in #5299, the tenant states for migration were added, but respected only in a coarse-grained way: when hinted not to do deletions, tenants would simply skip all GC and compaction.

Skipping compaction is not necessary for AttachedMulti: since we will soon become the primary attached location, proceeding with compaction is not a waste of resources. Instead, per the RFC (https://github.com/neondatabase/neon/pull/5029/files), deletions should be queued up in this state and executed later, when we switch to AttachedSingle.

Avoiding compaction in AttachedMulti can have an operational impact if a tenant is under significant write load: a long-running migration can result in a large accumulation of delta layers, with a commensurate impact on read latency.
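The intended behavior can be sketched like this (illustrative names only, not the real pageserver code): compaction runs regardless of attachment mode, but in AttachedMulti the resulting layer deletions are deferred, and they execute on the transition to AttachedSingle.

```rust
// Hedged sketch of the behavior this PR enables; `Tenant` here is a toy
// stand-in, and `promote_to_single` is a hypothetical transition hook.
enum Mode {
    AttachedSingle,
    AttachedMulti,
}

struct Tenant {
    mode: Mode,
    deferred_deletions: Vec<String>, // queued while AttachedMulti
    deleted: Vec<String>,            // actually executed deletions
}

impl Tenant {
    fn compact(&mut self, obsolete_layers: Vec<String>) {
        // Compaction itself proceeds in both modes: new layers get written,
        // and obsolete layers are identified...
        for layer in obsolete_layers {
            match self.mode {
                // ...but only AttachedSingle may delete them immediately.
                Mode::AttachedSingle => self.deleted.push(layer),
                // AttachedMulti defers, so a possibly-still-attached peer
                // can keep reading the old layers during the migration.
                Mode::AttachedMulti => self.deferred_deletions.push(layer),
            }
        }
    }

    fn promote_to_single(&mut self) {
        self.mode = Mode::AttachedSingle;
        // Now sole owner: execute everything queued during the migration.
        self.deleted.append(&mut self.deferred_deletions);
    }
}

fn main() {
    let mut t = Tenant {
        mode: Mode::AttachedMulti,
        deferred_deletions: vec![],
        deleted: vec![],
    };
    t.compact(vec!["old_delta_1".into(), "old_delta_2".into()]);
    assert!(t.deleted.is_empty()); // nothing deleted while AttachedMulti

    t.promote_to_single();
    assert_eq!(t.deleted.len(), 2); // deferred deletions executed
    println!("ok");
}
```

This keeps read latency bounded during a long migration (delta layers no longer pile up) while preserving the safety property that only the sole attached location performs deletions.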
Closes: #5396
Summary of changes