Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Timeline offloading persistence #9444

Merged
merged 33 commits into from
Oct 22, 2024
Merged

Timeline offloading persistence #9444

merged 33 commits into from
Oct 22, 2024

Conversation

arpad-m
Copy link
Member

@arpad-m arpad-m commented Oct 17, 2024

Persist timeline offloaded state to S3.

Right now, as of #8907, at each restart of the pageserver, all offloaded state is lost, so we load the full timeline again. As it starts with an empty local directory, we might potentially download some files again, leading to downloads that are ultimately wasteful.

This patch adds support for persisting the offloaded state, allowing us to never load offloaded timelines in the first place. The persistence feature is facilitated via a new file in S3 that is tenant-global, which contains a list of all offloaded timelines. It is updated each time we offload or unoffload a timeline, and otherwise never touched.

This choice means that tenants where no offloading is happening will not immediately get a manifest, keeping the change very minimal at the start.

We leave generation support for future work. It is important to support generations, as in the worst case, the manifest might be overwritten by an older generation after a timeline has been unoffloaded (and unarchived), so the next pageserver process instantiation might wrongly believe that some timeline is still offloaded even though it should be active.

Part of #9386, #8088

Copy link

github-actions bot commented Oct 17, 2024

5316 tests run: 5102 passed, 0 failed, 214 skipped (full report)


Flaky tests (4)

Postgres 17

Code coverage* (full report)

  • functions: 31.4% (7676 of 24440 functions)
  • lines: 48.9% (60320 of 123406 lines)

* collected from Rust tests only


The comment gets automatically updated with the latest test results
5ad1198 at 2024-10-22T21:02:42.725Z :recycle:

@arpad-m arpad-m force-pushed the arpad/offload_persistence branch from b630c9c to b9d33fb Compare October 17, 2024 22:50
The test has some smaller modifications from test_timeline_offloading:

* the method by which we determine the offloaded state
* us doing manual offloading all the time
* only one offloaded branch
* a test in the end that deletion of a tenant works (so timeline deletion
  deletes the manifest as well)
@arpad-m arpad-m marked this pull request as ready for review October 19, 2024 00:38
@arpad-m arpad-m requested a review from a team as a code owner October 19, 2024 00:38
@arpad-m arpad-m requested a review from erikgrinaker October 19, 2024 00:38
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unclear to me how we deal with interrupted offload operations. Or more broadly, tenant manifest conflicts, given that we duplicate the tenant manifest across shards.

The RFC says that shard 0 should be authoritative -- is it worth giving it special treatment, such that we ensure shard 0's manifest is uploaded before the other shards? Do we rely on the offloaded state being eventually consistent, and a failed actor being required to retry until that's the case? I'm not familiar enough with the design to enumerate all of the possible hazards here, but you get my drift.

It would be worth spelling all of this out in the manifest's doc comment, along with safety arguments and appropriate test cases.

pageserver/src/tenant/timeline/offload.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/remote_timeline_client.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/remote_timeline_client/manifest.rs Outdated Show resolved Hide resolved
pageserver/src/tenant/remote_timeline_client/manifest.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Show resolved Hide resolved
pageserver/src/tenant/timeline/offload.rs Show resolved Hide resolved
pageserver/src/tenant/remote_timeline_client.rs Outdated Show resolved Hide resolved
pageserver/src/tenant.rs Outdated Show resolved Hide resolved
@arpad-m
Copy link
Member Author

arpad-m commented Oct 21, 2024

It's unclear to me how we deal with interrupted offload operations. Or more broadly, tenant manifest conflicts, given that we duplicate the tenant manifest across shards.

There is no cross-shard offload operation: each shard can decide on the offload on its own, which it does during compaction. So that's why one shouldn't have a unified tenant manifest either. I don't think it makes much sense to have only some shards offloaded as an intentional state, there is nothing gained by that. But it's also not an erroneous condition, and most features of the pageserver are per-shard, and not per-tenant.

We'd otherwise need to make offloading be coordinated by the storage controller, something we gain little from.

However, this made me think about shard splits. We'll probably need to copy over the manifests, or write them from scratch (and what do do with the pre-split manifests? probably we can delete them).

@erikgrinaker
Copy link
Contributor

erikgrinaker commented Oct 21, 2024

There is no cross-shard offload operation: each shard can decide on the offload on its own, which it does during compaction. So that's why one shouldn't have a unified tenant manifest either. I don't think it makes much sense to have only some shards offloaded as an intentional state, there is nothing gained by that. But it's also not an erroneous condition, and most features of the pageserver are per-shard, and not per-tenant.

I see. Making offloading a shard-local operation seems reasonable. I was confused by the below from the PR description, as well as the RFC's reference to an authoritative manifest:

The persistence feature is facilitated via a new file in S3 that is tenant-global, which contains a list of all offloaded timelines.

It seems worthwhile to me to explicitly scope the manifest to a shard rather than a tenant, e.g. by referring to it as a tenant shard manifest (or just shard manifest), and point out that it's perfectly valid (if unexpected) for them to diverge across shards. This would prevent confusion and footguns if we later try to use it as tenant-global. But maybe that's just me -- feel free to get others' opinion on this as well.

However, this made me think about shard splits. We'll probably need to copy over the manifests

Yes, good catch, we'll need to propagate these.

@arpad-m arpad-m requested a review from erikgrinaker October 22, 2024 10:57
Copy link
Contributor

@erikgrinaker erikgrinaker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this made me think about shard splits. We'll probably need to copy over the manifests, or write them from scratch

For posterity: this has been added as a follow-up task on #8088.

pageserver/src/tenant/timeline/delete.rs Show resolved Hide resolved
@arpad-m arpad-m enabled auto-merge (squash) October 22, 2024 15:39
@arpad-m arpad-m merged commit 6f8fcdf into main Oct 22, 2024
80 checks passed
@arpad-m arpad-m deleted the arpad/offload_persistence branch October 22, 2024 20:52
arpad-m added a commit that referenced this pull request Oct 22, 2024
We support multiple storage backends now, so remove the `_s3_` from the
name.

Analogous to the names adopted for tenant manifests added in #9444.
arpad-m added a commit that referenced this pull request Oct 25, 2024
Before, we didn't copy over the `index-part.json` of offloaded timelines
to the new shard's location, resulting in the new shard not knowing the
timeline even exists.

In #9444, we copy over the manifest, but we also need to do this for
`index-part.json`.

As the operations to do are mostly the same between offloaded and
non-offloaded timelines, we can iterate over all of them in the same
loop, after the introduction of a `TimelineOrOffloadedArcRef` type to
generalize over the two cases. This is analogous to the deletion code
added in #8907.

The added test also ensures that the sharded archival config endpoint
works, something that has not yet been ensured by tests.

Part of #8088
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants