Implement backup/restore for TSM. #5224
}

// backupDatabase will request the database information from the server and then backup the metasore and
// every shard in every retention policy in the database. Each shard will be written to a separate tar.
Freudian slip: s/metasore/metastore/
lol, got that right!
🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 +1, I looked over it and it seems reasonable. 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 I think some additional testing for error conditions would be nice but overall lgtm. 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 |
@benbjohnson updated to address your comments. I agree it would be good to add some more test cases to the |
ZOMG 👏
// that new TSM files will not be able to be created in this shard while the
// backup is running. For shards that are still actively getting writes, this
// could cause the WAL to back up, increasing memory usage and eventually rejecting writes.
func (e *DevEngine) Backup(w io.Writer, basePath string, since time.Time) error {
Is the idea for the Backup function to be a config option, or something that an admin kicks off as necessary? Or will this be an automatic function?
If the latter, perhaps this can be restricted only to shards that are NOT the current shard (ie. duration is passed, kick off backup, continue writing to new shard)? If a single-shard situation, then this should be run only on the very oldest set of tsm files that should not be getting compacted regularly.
The backup command is executed through the command line via `influxd backup <args>`. It's meant to be called by administrators and operators that are backing up their data. They can back up an entire database, only a specific retention policy, or only a specific shard.
Under the covers when backing up either a DB or RP, each shard is backed up serially. This reduces the window of time in which any given shard is locked from creating new TSM files (and thus unable to flush the WAL or perform compactions).
For operators that only want to back up shards that are cold for writes, it would be trivial to write a script that only attempts to back up shards with a modified date older than some threshold.
Ack, makes sense to run this on demand as opposed to running during primetime hours.
Is there a man page with the arguments yet?
@kfitzpatrick not yet, but if you build this and run |
Metafile = "meta"

// BackupFilePattern is the beginning of the pattern for a backup
// file. They follow the scheme <database>.<retention>.<shardID>.<inrement>
inrement -> increment
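To make the naming scheme in that comment concrete, here is a small sketch that builds and parses names of the form `<database>.<retention>.<shardID>.<increment>`. The `backupName` type and `parseBackupName` function are hypothetical and not taken from the PR. The sketch also assumes plain decimal numbers without padding and database and retention policy names that contain no dots; the actual pattern in the code may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// backupName holds the four fields of a backup file name (hypothetical type).
type backupName struct {
	Database  string
	Retention string
	ShardID   uint64
	Increment int
}

// String renders the name as <database>.<retention>.<shardID>.<increment>.
// Unpadded decimal formatting is an assumption made for this sketch.
func (b backupName) String() string {
	return fmt.Sprintf("%s.%s.%d.%d", b.Database, b.Retention, b.ShardID, b.Increment)
}

// parseBackupName splits a name back into its fields. It assumes that the
// database and retention policy names themselves contain no dots.
func parseBackupName(s string) (backupName, error) {
	parts := strings.Split(s, ".")
	if len(parts) != 4 {
		return backupName{}, fmt.Errorf("backup name %q: want 4 dot-separated fields, got %d", s, len(parts))
	}
	id, err := strconv.ParseUint(parts[2], 10, 64)
	if err != nil {
		return backupName{}, fmt.Errorf("backup name %q: bad shard ID: %v", s, err)
	}
	inc, err := strconv.Atoi(parts[3])
	if err != nil {
		return backupName{}, fmt.Errorf("backup name %q: bad increment: %v", s, err)
	}
	return backupName{Database: parts[0], Retention: parts[1], ShardID: id, Increment: inc}, nil
}

func main() {
	n := backupName{Database: "mydb", Retention: "default", ShardID: 2, Increment: 0}
	fmt.Println(n.String())

	parsed, err := parseBackupName(n.String())
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Printf("%+v\n", parsed)
}
```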
Seems pretty straightforward in principle, hopefully the WAL will work sufficiently to buffer the data. +1 on green.
Implement backup/restore for TSM.
This present is just in time to get under the tree for Christmas!
This changes backup and restore to work for TSM. It breaks it for b1 and bz1, but since those are getting removed it's ok.
The backup runs against any host that is specified and can back up either the metastore, a database, a specific retention policy, or a specific shard. It can also take incremental backups with the `since` flag, which will only back up TSM files that have been created since that timestamp.
The backup is safe to run online. However, for shards that are still hot for writes, they won't be able to create new TSM files while the backup for that single shard runs. If the backup isn't too large and the write throughput isn't too high this shouldn't be a problem since the writes will just go into the WAL cache.
For details on usage use `influxd backup`