Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: enable compaction to proceed while live-migrating #5397

Merged
merged 5 commits into from
Nov 20, 2024

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Sep 27, 2023

Problem

Long ago, in #5299 the tenant states for migration are added, but respected only in a coarse-grained way: when hinted not to do deletions, tenants will just avoid doing all GC or compaction.

Skipping compaction is not necessary for AttachedMulti, as we will soon become the primary attached location, and it is not a waste of resources to proceed with compaction. Instead, per the RFC
https://github.com/neondatabase/neon/pull/5029/files), deletions should be queued up in this state, and executed later when we switch to AttachedSingle.

Avoiding compaction in AttachedMulti can have an operational impact if a tenant is under significant write load, as a long-running migration can result in a large accumulation of delta layers with commensurate impact on read latency.

Closes: #5396

Summary of changes

  • Add a 'config' part to RemoteTimelineClient so that it can be aware of the mode of the tenant it belongs to, and wire this through for construction + updates
  • Add a special buffer for delayed deletions, and when in AttachedMulti route deletions here instead of into the main remote client queue. This is drained when transitioning to AttachedSingle. If the tenant is detached or our process dies before then, then these objects are leaked.
  • As a quality of life improvement, also use the remote timeline client's knowledge of the tenant state to avoid submitting remote consistent LSN updates for validation when in AttachedStale (as we know these will fail)

Checklist before requesting a review

  • I have performed a self-review of my code.
  • If it is a core feature, I have added thorough tests.
  • Do we need to implement analytics? if so did you add the relevant metrics to the dashboard?
  • If this PR requires public announcement, mark it with /release-notes label and add several sentences in this section.

Checklist before merging

  • Do not forget to reformat commit message to not include the above checklist

@jcsp jcsp added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver labels Sep 27, 2023
@github-actions
Copy link

github-actions bot commented Sep 27, 2023

5535 tests run: 5309 passed, 0 failed, 226 skipped (full report)


Flaky tests (1)

Postgres 14

Code coverage* (full report)

  • functions: 31.4% (7940 of 25306 functions)
  • lines: 49.3% (63030 of 127869 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
9f1baf6 at 2024-11-20T17:41:18.371Z :recycle:

jcsp added a commit that referenced this pull request Oct 5, 2023
…ocations (#5299)

## Problem

These changes are part of building seamless tenant migration, as
described in the RFC:
- #5029

## Summary of changes

- A new configuration type `LocationConf` supersedes `TenantConfOpt` for
storing a tenant's configuration in the pageserver repo dir. It contains
`TenantConfOpt`, as well as a new `mode` attribute that describes what
kind of location this is (secondary, attached, attachment mode etc). It
is written to a file called `config-v1` instead of `config` -- this
prepares us for neatly making any other profound changes to the format
of the file in future. Forward compat for existing pageserver code is
achieved by writing out both old and new style files. Backward compat is
achieved by checking for the old-style file if the new one isn't found.
- The `TenantMap` type changes, to hold `TenantSlot` instead of just
`Tenant`. The `Tenant` type continues to be used for attached tenants
only. Tenants in other states (such as secondaries) are represented by a
different variant of `TenantSlot`.
- Where `Tenant` & `Timeline` used to hold an Arc<Mutex<TenantConfOpt>>,
they now hold a reference to a AttachedTenantConf, which includes the
extra information from LocationConf. This enables them to know the
current attachment mode.
- The attachment mode is used as an advisory input to decide whether to
do compaction and GC (AttachedStale is meant to avoid doing uploads,
AttachedMulti is meant to avoid doing deletions).
- A new HTTP API is added at `PUT /tenants/<tenant_id>/location_config`
to drive new location configuration. This provides a superset of the
functionality of attach/detach/load/ignore:
  - Attaching a tenant is just configuring it in an attached state
  - Detaching a tenant is configuring it to a detached state
  - Loading a tenant is just the same as attaching it
- Ignoring a tenant is the same as configuring it into Secondary with
warm=false (i.e. retain the files on disk but do nothing else).

Caveats:
- AttachedMulti tenants don't do compaction in this PR, but they do in
the follow on #5397
- Concurrent updates to the `location_config` API are not handled
elegantly in this PR, a better mechanism is added in the follow on
#5367
- Secondary mode is just a placeholder in this PR: the code to upload
heatmaps and do downloads on secondary locations will be added in a
later PR (but that shouldn't change any external interfaces)

Closes: #5379

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
jcsp added a commit that referenced this pull request Oct 10, 2023
## Problem

If a caller detaches a tenant and then attaches it again, pending
deletions from the old attachment might not have happened yet. This is
not a correctness problem, but it causes:
- Risk of leaking some objects in S3
- Some warnings from the deletion queue when pending LSN updates and
pending deletions don't pass validation.

## Summary of changes

- Deletion queue now uses UnboundedChannel so that the push interfaces
don't have to be async.
- This was pulled out of #5397,
where it is also useful to be able to drive the queue from non-async
contexts.
- Why is it okay for this to be unbounded? The only way the
unbounded-ness of the channel can become a problem is if writing out
deletion lists can't keep up, but if the system were that overloaded
then the code generating deletions (GC, compaction) would also be
impacted.
- DeletionQueueClient gets a new `flush_advisory` function, which is
like flush_execute, but doesn't wait for completion: this is appropriate
for use in contexts where we would like to encourage the deletion queue
to flush, but don't need to block on it.
- This function is also expected to be useful in next steps for seamless
migration, where the option to flush to S3 while transitioning into
AttachedStale will also include flushing deletion queue, but we wouldn't
want to block on that flush.
- The tenant_detach code in mgr.rs invokes flush_advisory after stopping
the `Tenant` object.

---------

Co-authored-by: Arpad Müller <arpad-m@users.noreply.github.com>
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1.5 branch from 87d7bdc to 0b1a2a7 Compare November 2, 2023 14:01
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1.5 branch from 0b1a2a7 to ecc0368 Compare December 11, 2023 10:10
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1.5 branch from ecc0368 to 801d15f Compare June 3, 2024 16:51
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt1.5 branch from 801d15f to c8d1313 Compare October 31, 2024 20:49
@jcsp jcsp requested a review from skyzh October 31, 2024 20:55
@jcsp jcsp self-assigned this Nov 4, 2024
@jcsp jcsp marked this pull request as ready for review November 13, 2024 16:32
@jcsp jcsp requested a review from a team as a code owner November 13, 2024 16:32
@jcsp jcsp changed the title pageserver: finer-grained handling of migration states pageserver: enable compaction to proceed while live-migrating Nov 13, 2024
Copy link
Member

@skyzh skyzh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally LGTM, but I can see there are potentially some races (in reality, very rare to happen):

(1) Thread 1 calling update_config (remote_timeline_client.rs L495), it move all blocked deletions to the queue, and is about to update self.config to the new value.

(2) Thread 2 obtained the queue lock and looked at the deletion operation in the queue right before thread 1 update self.config (remote_timeline_client.rs L1963). At this point, self.config still has block_deletions=true, and everything gets moved back to the blocked deletion buffer.

But (2) might take a relatively long time and would not finish within the short amount of the time when (1) is right about to update the config.

Anyways, I don't see such races would lead to very bad effects except files don't get deleted immediately, so it's fine to leave it there for now.

@jcsp
Copy link
Collaborator Author

jcsp commented Nov 20, 2024

Generally LGTM, but I can see there are potentially some races (in reality, very rare to happen):

This is a good point - I have tweaked the code to update remotetimeline conf before rather than after draining the queue

@jcsp jcsp added this pull request to the merge queue Nov 20, 2024
@jcsp jcsp removed this pull request from the merge queue due to a manual request Nov 20, 2024
@jcsp jcsp enabled auto-merge November 20, 2024 15:55
@jcsp jcsp added this pull request to the merge queue Nov 20, 2024
Merged via the queue into main with commit 5ff2f1e Nov 20, 2024
74 checks passed
@jcsp jcsp deleted the jcsp/secondary-locations-pt1.5 branch November 20, 2024 17:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pageserver: additional live migration regression tests
2 participants