Branch creation is possibly not persisted #809
Comments
Branch creation happens in a following manner:
so from the safekeeper's point of view a new branch == a new node, and it is okay for the safekeeper not to know about a new timeline until the first WAL record arrives. Is anything wrong with this approach? |
Yes, it ought to be okay as long as no one requests the parent branch's WAL from safekeepers, which I assume to be the case. "Branch information" appears on a safekeeper only after the first START_WAL_PUSH, and the LSNs (flush, commit) are 0 until the first record (or a real commit_lsn) arrives. If anything is confused by these zeros, we can move setting them to timeline creation, but is anything actually confused? |
It seems to me that information about branches is stored on pageserver only, isn't it? If so, the following failure scenario looks possible:
Severity depends on whether we recommend accessing branches by their names or timeline ids:
|
Indeed, pageserver should persist branch creation, likely even to s3. |
cc @kelvich |
@arssher, taking into account the new API for tenant creation, is this fixed by any chance? |
The timeline creation API on safekeepers is not used yet. It seems the only thing to "fix" here is persisting branch creation on the pageserver in a fault-tolerant way. It is minor and not urgent. |
I think the primary source of truth regarding branches is the console. The console creates branches on the pageserver, retries the corresponding API call in case of failures, and so on. Also, after #1286 the branch name mapping is removed from the pageserver.
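To make the retry behaviour described above concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `create_timeline` is a hypothetical callable standing in for the console's call to the pageserver, `TransientError` is a made-up exception type, and the backoff parameters are arbitrary. It is only safe if the underlying call is idempotent, i.e. repeating it for the same timeline id does not create a second branch.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def create_branch_with_retry(create_timeline, timeline_id,
                             attempts=5, base_delay=0.1, sleep=time.sleep):
    """Call create_timeline(timeline_id) until it succeeds or attempts run out.

    create_timeline is an assumed, idempotent callable; it is not a real
    pageserver or console API.
    """
    for attempt in range(attempts):
        try:
            return create_timeline(timeline_id)
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            sleep(base_delay * 2 ** attempt)  # exponential backoff
```

With this shape, the console stays the source of truth: if the pageserver loses a branch before persisting it, a repeated call simply recreates it.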
|
When a new root branch is created, the pageserver performs a checkpoint, which is uploaded to S3. A child branch will appear on S3 on its first checkpoint. |
TODO for me: is this still relevant? I suspect it is: there is still an explicit timeline creation API in the pageserver. |
FYI, the pageserver now uses a more complicated init flow with an uninit mark file for root timelines. For branches it is just a metadata file, which is 512 bytes (one sector), and we run fsync on it and on its parent directory. So it should be fine as long as we do not return the response before the data is written. |
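The write-then-fsync pattern described in the comment above can be sketched roughly as follows. This is not the pageserver's code (which is written in Rust); the function name, the `metadata` file name, and the zero-padding to 512 bytes are assumptions made for the example, and it assumes a POSIX system where a directory can be opened and fsynced.

```python
import os

METADATA_SIZE = 512  # one sector, as mentioned in the comment above


def persist_branch_metadata(timeline_dir: str, payload: bytes) -> str:
    """Durably write a fixed-size metadata file into an existing directory.

    Hypothetical helper: returns only after both the file contents and the
    directory entry have been flushed, so a caller can safely report success.
    """
    if len(payload) > METADATA_SIZE:
        raise ValueError("payload does not fit into one sector")
    data = payload.ljust(METADATA_SIZE, b"\0")  # pad to exactly one sector
    path = os.path.join(timeline_dir, "metadata")

    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while len(view) > 0:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file contents to stable storage
    finally:
        os.close(fd)

    dir_fd = os.open(timeline_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # flush the directory entry for the new file
    finally:
        os.close(dir_fd)
    return path
```

The ordering is the point: the response to the branch creation request should be sent only after this function returns.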
@arssher is this still relevant? if not I could close https://github.com/neondatabase/cloud/issues/854 |
Most likely not (see last comment), let's close it. |
The initial discussion is here: #746 (comment)
I was able to reproduce the issue as well. Now I'm wondering what happens if the Compute Node goes down immediately after reporting successful branch creation back to the user: will the Safekeepers preserve all necessary information about the branch, or will it be lost?
Creating a branch does not create a new transaction, so it probably does not affect WAL, and hence there is no immediate need to wait for WAL confirmation. We do not lose WAL, but losing branch information is not great either.