Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pageserver: add InProgress tenant map state, use a sync lock for the map #5367

Merged
merged 23 commits into from
Nov 6, 2023

Conversation

jcsp
Copy link
Collaborator

@jcsp jcsp commented Sep 25, 2023

Problem

Follows on from #5299

  • We didn't have a generic way to protect a tenant undergoing changes: Tenant had states, but for our arbitrary transitions between secondary/attached, we need a general way to say "reserve this tenant ID, and don't allow any other ops on it, but don't try and report it as being in any particular state".
  • The TenantsMap structure was behind an async RwLock, but it was never correct to hold it across await points: that would block any other changes for all tenants.

Summary of changes

  • Add the TenantSlot::InProgress value. This means:
    • Incoming administrative operations on the tenant should retry later
    • Anything trying to read the live state of the tenant (e.g. a page service reader) should retry later or block.
  • Store TenantsMap in std::sync::RwLock
  • Provide an extended get_active_tenant_with_timeout for page_service to use, which will wait on InProgress slots as well as non-active tenants.

Closes: #5378

@jcsp jcsp force-pushed the jcsp/secondary-locations-pt2 branch from 4c6bd2e to 8307a31 Compare September 26, 2023 17:17
@github-actions
Copy link

github-actions bot commented Sep 26, 2023

2358 tests run: 2241 passed, 0 failed, 117 skipped (full report)


Flaky tests (2)

Postgres 16

  • test_crafted_wal_end[last_wal_record_crossing_segment]: debug

Postgres 15

  • test_statvfs_pressure_min_avail_bytes: release

Code coverage (full report)

  • functions: 54.6% (8878 of 16272 functions)
  • lines: 81.6% (51148 of 62650 lines)

The comment gets automatically updated with the latest test results
b09891e at 2023-11-06T14:16:06.844Z :recycle:

@jcsp jcsp force-pushed the jcsp/secondary-locations-pt2 branch from 8307a31 to 99fd0d5 Compare September 29, 2023 14:47
jcsp added a commit that referenced this pull request Oct 5, 2023
…ocations (#5299)

## Problem

These changes are part of building seamless tenant migration, as
described in the RFC:
- #5029

## Summary of changes

- A new configuration type `LocationConf` supersedes `TenantConfOpt` for
storing a tenant's configuration in the pageserver repo dir. It contains
`TenantConfOpt`, as well as a new `mode` attribute that describes what
kind of location this is (secondary, attached, attachment mode etc). It
is written to a file called `config-v1` instead of `config` -- this
prepares us for neatly making any other profound changes to the format
of the file in future. Forward compat for existing pageserver code is
achieved by writing out both old and new style files. Backward compat is
achieved by checking for the old-style file if the new one isn't found.
- The `TenantMap` type changes, to hold `TenantSlot` instead of just
`Tenant`. The `Tenant` type continues to be used for attached tenants
only. Tenants in other states (such as secondaries) are represented by a
different variant of `TenantSlot`.
- Where `Tenant` & `Timeline` used to hold an Arc<Mutex<TenantConfOpt>>,
they now hold a reference to a AttachedTenantConf, which includes the
extra information from LocationConf. This enables them to know the
current attachment mode.
- The attachment mode is used as an advisory input to decide whether to
do compaction and GC (AttachedStale is meant to avoid doing uploads,
AttachedMulti is meant to avoid doing deletions).
- A new HTTP API is added at `PUT /tenants/<tenant_id>/location_config`
to drive new location configuration. This provides a superset of the
functionality of attach/detach/load/ignore:
  - Attaching a tenant is just configuring it in an attached state
  - Detaching a tenant is configuring it to a detached state
  - Loading a tenant is just the same as attaching it
- Ignoring a tenant is the same as configuring it into Secondary with
warm=false (i.e. retain the files on disk but do nothing else).

Caveats:
- AttachedMulti tenants don't do compaction in this PR, but they do in
the follow on #5397
- Concurrent updates to the `location_config` API are not handled
elegantly in this PR, a better mechanism is added in the follow on
#5367
- Secondary mode is just a placeholder in this PR: the code to upload
heatmaps and do downloads on secondary locations will be added in a
later PR (but that shouldn't change any external interfaces)

Closes: #5379

---------

Co-authored-by: Christian Schwarz <christian@neon.tech>
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt2 branch 3 times, most recently from f995935 to c0f5248 Compare October 11, 2023 11:57
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt2 branch from c0f5248 to 071e06b Compare October 19, 2023 15:49
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt2 branch 3 times, most recently from a2dbce5 to d908b55 Compare October 31, 2023 09:46
@jcsp jcsp self-assigned this Oct 31, 2023
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt2 branch from d908b55 to 4f29dfb Compare October 31, 2023 14:49
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt2 branch from 4f29dfb to a2e7a79 Compare November 1, 2023 14:23
@jcsp jcsp force-pushed the jcsp/secondary-locations-pt2 branch from a2e7a79 to 19467ad Compare November 2, 2023 11:31
@jcsp jcsp added t/feature Issue type: feature, for new features or requests c/storage/pageserver Component: storage: pageserver labels Nov 2, 2023
@jcsp jcsp marked this pull request as ready for review November 2, 2023 12:07
@jcsp jcsp requested a review from a team as a code owner November 2, 2023 12:07
@jcsp jcsp requested review from koivunej and removed request for a team November 2, 2023 12:07
@jcsp jcsp removed the request for review from koivunej November 3, 2023 15:38
@jcsp jcsp dismissed problame’s stale review November 6, 2023 10:05

Christian is out of office today

@jcsp jcsp enabled auto-merge (squash) November 6, 2023 12:41
@jcsp jcsp disabled auto-merge November 6, 2023 12:43
pageserver/src/http/routes.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/mgr.rs Outdated Show resolved Hide resolved
@jcsp jcsp requested a review from arpad-m November 6, 2023 12:56
@jcsp
Copy link
Collaborator Author

jcsp commented Nov 6, 2023

Test failure looks like a test helper issue:

@jcsp jcsp merged commit 85cd97a into main Nov 6, 2023
35 checks passed
@jcsp jcsp deleted the jcsp/secondary-locations-pt2 branch November 6, 2023 14:03
Comment on lines +1433 to +1435
unsafe impl Send for SlotGuard {}
unsafe impl Sync for SlotGuard {}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No one questioned this?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for pointing this out, cross linking #5802 which addresses this.

jcsp added a commit that referenced this pull request Nov 8, 2023
## Problem

#5711 and #5367 raced -- the `SlotGuard` type needs `Gate` to properly
enforce its invariant that we may not drop an `Arc<Tenant>` from a slot.

## Summary of changes

Replace the TODO with the intended check of Gate.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/feature Issue type: feature, for new features or requests
Projects
None yet
Development

Successfully merging this pull request may close these issues.

pageserver: make TenantsMap lock synchronous, add explicit "busy" state to protect operations
5 participants