Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Branch creation is possibly not persisted #809

Closed
yeputons opened this issue Oct 28, 2021 · 13 comments
Closed

Branch creation is possibly not persisted #809

yeputons opened this issue Oct 28, 2021 · 13 comments
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug
Milestone

Comments

@yeputons
Copy link
Contributor

The initial discussion is here: #746 (comment)

I was able to reproduce the issue as well. Now I'm wondering what happens if the Compute Node goes down immediately after reporing successful branch creation back to the user: will Safekeepers preserve all necessary information about the branch or it will be lost?

Creating a branch does not create a new transaction, so it probably does not affect WAL, hence there is no immediate need to wait for WAL confirmation. We do not lose WAL, but losing branch information may be not great as well.

@yeputons yeputons mentioned this issue Oct 28, 2021
4 tasks
@kelvich
Copy link
Contributor

kelvich commented Nov 1, 2021

cc @arssher @petuhovskiy

Branch creation happens in a following manner:

  1. branch_create api call on the pageserver initializes new timeline
  2. new compute node is started with that timeline_id
  3. compute node generates some wal records which are sent to safekeepers

so from the safekeeper's point of view new branch == new node, so it is okay to don't know about a new timeline until first wal record arrives. Is anything wrong with this approach?

@arssher
Copy link
Contributor

arssher commented Nov 1, 2021

Yes, it ought to be okay as long as no one requests parent branch WAL from safekeepers, which I assume to be the case.

"branch information" appears on safekeeper only after first START_WAL_PUSH, and LSNs (flush, commit) are 0 until the first record (or real commit_lsn) arrives. If anything is confused by these 0, we can transfer them to timeline creation, but is it?

@kelvich kelvich added the c/storage/safekeeper Component: storage: safekeeper label Nov 1, 2021
@yeputons yeputons self-assigned this Nov 2, 2021
@yeputons yeputons changed the title Branch creation is possibly not persisted to safekeepers Branch creation is possibly not persisted Nov 2, 2021
@yeputons
Copy link
Contributor Author

yeputons commented Nov 2, 2021

It seems to me that information about branches is stored on pageserver only, isn't it?

If so, the following failure scenario looks possible:

  1. A user creates a branch newbr. Pageserver creates all necessary files locally, but does not fsync them (it currently uses plain std::fs::write at least somewhere).
  2. (optional) The user makes some small transactions.
  3. The compute node persists data to safekeepers and reports the transaction as successful.
  4. Page server goes down.
  5. Information about newbr is completely lost, despite there are some WALs in Safekeepers.
  6. Page server goes back up, but now the user cannot query against newbr as it's lost.

Severity depends on whether we recommend accessing branches by their names or timeline ids:

  • In the former case it will be a simple failure like "branch not found" (and maybe some garbage stored on Safekeepers, possibly either eating up into user's quota or not eating up; not sure what's worse).
  • In the latter case there may be some more sophisticated failure as Safekeepers try to push (or callmemaybe) WAL to a Pageserver and it does not know that such a branch even exists.

@yeputons yeputons added c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug labels Nov 2, 2021
@arssher
Copy link
Contributor

arssher commented Nov 2, 2021

Indeed, pageserver should persist branch creation, likely even to s3.

@stepashka stepashka added s/wontfix and removed c/storage/safekeeper Component: storage: safekeeper s/wontfix labels Dec 23, 2021
@neondatabase-bot neondatabase-bot bot added this to the Technical preview milestone Feb 10, 2022
@stepashka
Copy link
Member

cc @kelvich

@stepashka
Copy link
Member

@arssher, taking into account the new API for tenant creation, is this fixed by any chance?

@arssher
Copy link
Contributor

arssher commented Mar 10, 2022

Timeline creation API on safekeepers is not used yet.

Seems like the only thing to "fix" here is persisting branch creation on pageserver in fault tolerant way. It is minor and not urgent.

@LizardWizzard
Copy link
Contributor

I think the primary source of truth regarding branches is console. Console creates branches on the pageserver, retries the corresponding api call in case of possible failures and so on.

Also after #1286 branch name mapping is removed from pageserver.
Then there are two cases for branch creation:

  • Creation of the root branch. It involves calling to initdb and it generates a bunch of files. This operation is not transactional, so if it is interrupted in the middle the state is unpredictable. In that case we can only delete this branch and create a new one.
  • Branching off the existing branch. Requires only metadata file creation, uses fsync + checksum. If metadata write is interrupted we can detect it and recreate timeline if needed.

@LizardWizzard
Copy link
Contributor

Indeed, pageserver should persist branch creation, likely even to s3.

When new root branch is created pageserver performs a checkpoint which is uploaded to s3. Child branch will appear on s3 on first checkpoint.

@yeputons
Copy link
Contributor Author

TODO for me: is this still actual? I suspect it still is: there is still an explicit timeline creation API in Pageserver.

@LizardWizzard
Copy link
Contributor

FYI pageserver now uses more complicated init flow with uninit mark file for root timelines and for branches it is just a metadata file which is 512 bytes (one sector) and we run fsync on it and its parent directory. So should be fine as long as we dont return response earlier than the data is written

@shanyp
Copy link
Contributor

shanyp commented Dec 18, 2022

@arssher is this still relevant? if not I could close https://github.com/neondatabase/cloud/issues/854

@neondatabase-bot neondatabase-bot bot added this to the 2022/12 milestone Dec 19, 2022
@arssher
Copy link
Contributor

arssher commented Dec 19, 2022

Most likely not (see last comment), let's close it.

@arssher arssher closed this as completed Dec 19, 2022
@yeputons yeputons removed their assignment Dec 20, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
c/storage/pageserver Component: storage: pageserver t/bug Issue Type: Bug
Projects
None yet
Development

No branches or pull requests

6 participants