doc/BACKUP.md: Document backup strategies for lightningd #4207

Conversation
Clarified location of …
Corrected misspelling of …
Wonderful documentation (except the BTRFS ad :-P)
Thank you!
doc/BACKUP.md (outdated):

* Attempt to recover using the other backup options below first.
  Any one of them will be better than this backup option.
* Recovering by this method ***MUST*** always be the ***last*** resort.
Change to something like "Use this method ONLY as a last resort" (goes well with "as long as you:" above)
Went with "Recover by this method ONLY as a last resort"
doc/BACKUP.md (outdated):

BTRFS would probably work better if you were purchasing an entire set
of new storage devices to set up a new node.

On BSD you can use a ZFS RAID-Z setup, which is probably better than BTRFS
The general advice (like the first paragraph in this section) could be grouped together, and BTRFS could be turned into its own sub-section so that it is easy to skip visually.
BTW, this review comes from the point of view of a non-native speaker, so take it with a grain of salt.
Okay. I guess we can list software methods of RAID-1, such as:
- mdadm
- BTRFS
- ZFS
Updated as per @jsarenik's feedback.
I think you're referring to @gabridome's excellent tutorial here: https://github.com/gabridome/docs/blob/master/c-lightning_with_postgresql_reliability.md
Add …
Would be worth linking it from https://lightning.readthedocs.io/FAQ.html#how-to-backup-my-wallet I think.
Mention the SQLITE3 backup API, add link from FAQ.md to BACKUP.md.
Mention using the …
doc/BACKUP.md (outdated):

This creates a consistent snapshot of the database, sampled in a
transaction, that is assured to be openable later by `sqlite3`.
The operation of the `lightningd` process will be paused while `sqlite3`
THIS IS FACTUALLY INCORRECT.

I did a stress test where a program continuously makes many small transactions to update a database. When you do something like `.backup 'backup.sqlite3'` or `VACUUM INTO 'backup.sqlite3';` in a separate `sqlite3`, then either the main program gets an `SQLITE_BUSY`, or the separate `sqlite3` returns a "Database is locked" error.

I suspect that an `SQLITE_BUSY` will cause `lightningd` to crash, so using a separate process to back up is probably strongly not recommended.

Doing this with my running `lightningd` does not cause this problem, but probably only because my stress-test program keeps doing queries continuously, whereas a good amount of the time `lightningd` just sits there waiting for something interesting to happen. But race conditions can exist, so we should not recommend this in our backup strategy document!

We can do:

- Expose an `sqlite3_backup` command which performs the `VACUUM INTO` query inside `lightningd`.
- Call into `sqlite3_busy_timeout` with a "reasonable" large value, say 5000 (5 seconds). (Equivalently, call the `PRAGMA busy_timeout = 5000;` query at the same time we do `PRAGMA foreign_keys=on;`.) Then, even if the `.backup` or `VACUUM INTO` takes up to 5 seconds to back up, `lightningd` will just wait.

The latter solution is easier, but there is always the possibility that the backup process will take more than whatever timeout we select. The former solution is more reliable and there is no timeout, but note that if, say, the target location is slow (e.g. an NFS mount), then `lightningd` will suspend indefinitely.

Thoughts?
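(For what it's worth, here is a minimal sketch of the second option as seen from the external backup process, using Python's `sqlite3` module. The database and destination paths are assumptions, this is not lightningd code, and it is only safe if `lightningd` itself also tolerates `SQLITE_BUSY` as discussed above.)

```python
#!/usr/bin/env python3
# Minimal sketch: take a snapshot of lightningd's sqlite3 database from a
# separate process. Paths are hypothetical; requires SQLite >= 3.27 for
# VACUUM INTO, and the destination file must not already exist.
import sqlite3

DB_PATH = "/home/user/.lightning/bitcoin/lightningd.sqlite3"  # assumed path
SNAPSHOT = "/mnt/backup/lightningd-snapshot.sqlite3"          # assumed path

conn = sqlite3.connect(DB_PATH)
try:
    # Wait up to 5 seconds on a locked database instead of failing
    # immediately with "database is locked" (SQLITE_BUSY).
    conn.execute("PRAGMA busy_timeout = 5000;")
    # Write a consistent, compacted copy of the database in one transaction.
    conn.execute("VACUUM INTO '{}';".format(SNAPSHOT))
finally:
    conn.close()
```

The same trade-off applies: the longer the snapshot takes, the longer `lightningd` may be kept waiting.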
Thank you @ZmnSCPxj for the ideas and for investigating possible issues with running the backup in a separate `sqlite3` process!

Exposing the `sqlite3_backup` command would be great! It sounds like a very nice and clear way to do backups.
Remove the factually incorrect text mentioned in #4207 (comment)
Teach how to quickly check if a backup database is corrupted.
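(As a rough illustration of such a check, SQLite's `integrity_check` pragma can be run against the backup copy; the path below is an assumption, not a lightningd default.)

```python
#!/usr/bin/env python3
# Minimal sketch: quick corruption check of a backup database file.
import sqlite3

SNAPSHOT = "/mnt/backup/lightningd-snapshot.sqlite3"  # assumed path

conn = sqlite3.connect(SNAPSHOT)
try:
    # PRAGMA integrity_check returns a single row containing 'ok'
    # when the database passes SQLite's consistency checks.
    (result,) = conn.execute("PRAGMA integrity_check;").fetchone()
    print("backup OK" if result == "ok" else "backup CORRUPT: " + result)
finally:
    conn.close()
```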
doc/BACKUP.md (outdated):

lose all funds in open channels.

However, again, note that a "no backups #reckless" strategy leads to
*definite* loss of funds, so you should still prefer *this* strategy rather
Again, what if the other user closes the channels? Wouldn't you get the money back to your wallet?
As I understand from the text, you can recover your funds only if your peer:

a) Uses `option_dataloss_protect` correctly (not pretending to do it just to grab your funds). Otherwise, you simply don't own the private keys of the address where the funds are sent anymore. Also, "If the peer does not support this option, then the entire channel funds will be revoked by the peer."
b) Decides to force close the channel with you.

That seems clear to me, so I wouldn't change it.
Just to add a bit more complexity: if we use `option_upfront_shutdown_script` we may actually get the funds back to our wallet if the peer closes, because that removes the tweak to the `their_unilateral/to_us` transaction.
Good info
Okay, we can actually implement an sqlite3-backup-snapshot in a plugin, I think. Basically,

If so, a plugin that hooks into … So it looks to me that we can have a C plugin which hooks into … In WAL mode we need to copy the …

Not actually, because the …

This actually made sense to me. There could be a configurable timeout period in the conf file to denote the period of snapshots. At each interval, writing would be paused for the db so that corrupted snapshots would be avoided. There would be a standardized snapshot folder in the user's .lightning folder, which would be the most all-encompassing solution of all the implementations.

All that … In particular, such a regular snapshotting would be vastly inferior to …

Regular snapshotting of the db, say once or twice a day, would be a good "backup of a backup" --- your primary backup should still be a …
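(For concreteness, a rough sketch of a streaming-backup plugin built on `lightningd`'s `db_write` hook, the same mechanism the `backup.py` plugin mentioned later in this thread builds on. The log path is hypothetical, and a real plugin must also handle the initial snapshot, re-synchronisation and integrity checking.)

```python
#!/usr/bin/env python3
# Rough sketch: append every database statement lightningd is about to
# commit to a log file, via the db_write hook (pyln-client).
import json

from pyln.client import Plugin

plugin = Plugin()
LOG_PATH = "/mnt/backup/db_write.log"  # assumed destination


@plugin.hook("db_write")
def on_db_write(writes, data_version, plugin, **kwargs):
    # Persist the statements (and their version counter) before
    # acknowledging, so lightningd never commits a change the backup
    # has not recorded.
    with open(LOG_PATH, "a") as f:
        f.write(json.dumps({"data_version": data_version,
                            "writes": writes}) + "\n")
        f.flush()
    return {"result": "continue"}


plugin.run()
```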
Available for any question, comment or help.
@gabridome does the PostgreSQL section I made make sense?
doc/BACKUP.md (outdated):
following command gives you a path:

    pg_config --includedir
I didn't know it. When I built it, everything worked out of the box. Pretty useful to know.
I probably already had the library installed.
I would also add this part to a specific document about lightning with Postgres (which I haven't found).
I will add this to my guide as well.
Okay, maybe the Lightning+PostgreSQL document can be a separate PR?
The document you mean? Something like `doc/postgresQL.md`?

Anyway, this part in the backup strategies seems pretty right to me. At least it could mention a more extended document in the doc directory. Has @fiatjaf thought about that?
Yes, the document I mean.
You should use the same PostgreSQL version of `libpq-dev` as what you run
on your cluster, which probably means running the same distribution on
your cluster.
Personally, I have added the Postgres .deb repository on the cluster machines; this way I can keep the versions aligned.
I didn't encounter any problems, but maybe it is not wise to move away from the Postgres version installed with the Debian distribution. What do you guys think?
I would have used the same distro throughout just to reduce inter-version problems, which is why I suggested the above. Mixing deb repositories has gotten me into trouble before when I wanted to upgrade my OS (but upgrading an OS is always fraught, sometimes it is best to just have two small OS partitions and alternate installing OS's between them rather than upgrading an existing install...)
doc/BACKUP.md (outdated):

(though you should probably do some more double-checking and tire-kicking
in the "Connect to the database" stage you resume at, such as checking if
`listpeers` still lists the same channels as you had, and so on).
Wise suggestion IMO.
Debian Testing ("bullseye") uses PostgreSQL 13.0 as of this writing.
PostgreSQL 12 had a non-trivial change in the way the restore operation is
done for replication.
You should use the same PostgreSQL version of `libpq-dev` as what you run
I encountered the problem myself. I would add: always read the official PostgreSQL synchronous replication guide for the specific version you are using.
Okay, I will add that.
Agreed, there is little point in replicating the official doc here, it may just end up with stale information, requiring updates from us. If there is an authoritative source, link to it.
[guide by @gabridome][gabridomeguide].

[gabridomeguide]: https://github.com/gabridome/docs/blob/master/c-lightning_with_postgresql_reliability.md
The guide is not maintained as the versions of Postgres evolve. Please also always check the Postgres guide about synchronous replication for your specific version.
Absolutely. Thank you for mentioning my guide.
Rebased, emphasized checking your PostgreSQL version, explicated more about tire-kicking after the SQLITE3->PG conversion.
No ACKs? People seem to want this. Is there any issue or quibble that prevents this from being merged?
FWIW concept ACK. I loudly applaud this incredible work on the most important topic of the LN, one that has been delayed for too long. I frankly hope that these strategies will become more and more user-friendly as time passes. Until then, the paradox is that the ones least aware of the problem are the ones most in need of a good solution...
Hi,
@gabridome how do we recover the `hsm_secret` from the xprv? The program seems to only convert from `hsm_secret` to xprv.
Mmh... You're right. Gotta check.
Sorry for the delay reviewing this. It looks quite good, but reordering might be good, to emphasize which mechanisms are preferred, and for whom it might be worth it:

- Backup `hsm_secret`: static, for all users
- Backup plugins for end-users
- PostgreSQL replication for enterprises
- File-backup: if no other option, and with appropriate warnings
- Backup while offline
- Hot backups while the node is running
doc/BACKUP.md (outdated):

But in Lightning, since *you* are the only one storing all your
financial information, you ***cannot*** recover this financial
information anywhere else.
Suggested change:
- information anywhere else.
+ information from anywhere else.
doc/BACKUP.md (outdated):

This creates an initial copy of the database at the NFS mount.
* Add these settings to your `lightningd` configuration:
  * `important-plugin=/path/to/backup.py`
  * `backup-destination=file:///path/to/nfs/mount`
This is no longer needed, because we now write a `lock` file into the `$LIGHTNINGDIR` in order to be functional right away, and avoid accidentally changing the backup location.
Seeing the discussion on how to back up the sqlite file, would it not be best to not filesystem-copy it, but `sqlite3 .dump` it? https://www.sqlitetutorial.net/sqlite-dump/ Or would this not guarantee consistency?
Really nice! I now have a reference to point people to instead of handwaving the different possibilities each time I'm asked :)
Just a small comment re the db migration tool, which I'm not sure is safe.
FWIW, it was already part of …
It locks the file while doing the dump. The …
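(For reference, the same SQL-text dump can be produced programmatically with Python's `sqlite3` module via `Connection.iterdump()`, which mirrors the shell's `.dump`; the paths below are assumptions.)

```python
#!/usr/bin/env python3
# Minimal sketch: SQL-text dump of a database copy, the programmatic
# counterpart of the sqlite3 shell's `.dump` command.
import sqlite3

DB_PATH = "/home/user/.lightning/bitcoin/lightningd.sqlite3"  # assumed path
DUMP_PATH = "/mnt/backup/lightningd-dump.sql"                 # assumed path

conn = sqlite3.connect(DB_PATH)
try:
    with open(DUMP_PATH, "w") as f:
        # iterdump() yields the schema and contents as SQL statements.
        for statement in conn.iterdump():
            f.write(statement + "\n")
finally:
    conn.close()
```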
ChangeLog-Added: Document: `doc/BACKUP.md` describes how to back up your C-lightning node.
ACK 948489a
Thanks for taking the time to write this up.
Even if you have one of the better options above, you might still want to do
this as a worst-case fallback, as long as you:

* Attempt to recover using the other backup options below first.
s/below/above/
LOL I did not fix it up after rearranging the text!
Hi @darosior, I think the last release allows dumping the public version of the descriptors. But I needed the xpriv because I had a corrupt db and wanted to scan for coins and recover them.
@domegabri oh right, we removed it at a later stage as it was a nice footgun! #4171 (comment)
Requested by @d4amenace here: #4200 (comment)
@cdecker to finish up the document re: PostgreSQL. I remember somebody made a medium article on C-Lightning PostgreSQL but I did not save the link...
Changelog-None