Implement backup/restore for TSM. #5224

Merged
merged 5 commits into from
Dec 31, 2015

pauldix
Member

@pauldix pauldix commented Dec 25, 2015

This present is just in time to get under the tree for Christmas!

This changes backup and restore to work for TSM. It breaks them for b1 and bz1, but since those engines are being removed, that's OK.

The backup runs against any specified host and can back up the metastore, a database, a specific retention policy, or a specific shard. It can also take incremental backups with the `since` flag, which backs up only TSM files that have been created since that timestamp.

The backup is safe to run online. However, shards that are still hot for writes won't be able to create new TSM files while the backup for that single shard runs. If the backup isn't too large and the write throughput isn't too high, this shouldn't be a problem, since the writes will just go into the WAL cache.

For usage details, run `influxd backup`.

@pauldix
Member Author

pauldix commented Dec 25, 2015

}

// backupDatabase will request the database information from the server and then backup the metasore and
// every shard in every retention policy in the database. Each shard will be written to a separate tar.
Contributor

Freudian slip: s/metasore/metastore/

Member Author

lol, got that right!

@benbjohnson
Contributor

🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄

+1, I looked over it and it seems reasonable.

🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄

I think some additional testing for error conditions would be nice but overall lgtm.

🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄 🎄

@pauldix
Member Author

pauldix commented Dec 25, 2015

@benbjohnson updated to address your comments. I agree it would be good to add some more test cases to `backup_restore_test.go`, but I wanted to get it up quickly :)

@toddboom
Contributor

ZOMG 👏

// that new TSM files will not be able to be created in this shard while the
// backup is running. For shards that are still actively getting writes, this
// could cause the WAL to back up, increasing memory usage and eventually rejecting writes.
func (e *DevEngine) Backup(w io.Writer, basePath string, since time.Time) error {
Contributor

Is the idea for the Backup function to be a config option, or something that an admin kicks off as necessary? Or will this be an automatic function?

If the latter, perhaps this can be restricted to shards that are NOT the current shard (i.e., once the shard duration has passed, kick off the backup and continue writing to the new shard)? In a single-shard situation, this should be run only on the very oldest set of TSM files, which should not be getting compacted regularly.

Member Author

The backup command is executed from the command line via `influxd backup <args>`. It's meant to be called by administrators and operators who are backing up their data. They can back up an entire database, only a specific retention policy, or only a specific shard.

Under the covers, when backing up either a DB or an RP, each shard is backed up serially. This reduces the window of time in which any given shard is locked from creating new TSM files (and thus unable to flush the WAL or perform compactions).

For operators who want to back up only shards that are cold for writes, it would be trivial to write a script that only attempts to back up shards with a modified date older than some threshold.
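A rough sketch of the filtering step such a script might use follows. Everything here is hypothetical: the `shard` type, the `coldShards` and `demoColdShards` helpers, and the sample timestamps are invented for illustration. In practice the modified time could come from something like `os.Stat` on each shard's directory, and the selected IDs would then be passed to the backup command one at a time.

```go
package main

import (
	"fmt"
	"time"
)

// shard is a hypothetical record pairing a shard ID with the last
// time its data was modified.
type shard struct {
	ID       uint64
	Modified time.Time
}

// coldShards returns the IDs of shards whose last modification is
// older than minAge relative to now.
func coldShards(shards []shard, now time.Time, minAge time.Duration) []uint64 {
	cutoff := now.Add(-minAge)
	var ids []uint64
	for _, s := range shards {
		if s.Modified.Before(cutoff) {
			ids = append(ids, s.ID)
		}
	}
	return ids
}

// demoColdShards applies a 24-hour threshold to fixed sample data.
func demoColdShards() []uint64 {
	now := time.Date(2015, 12, 28, 12, 0, 0, 0, time.UTC)
	shards := []shard{
		{ID: 1, Modified: now.Add(-72 * time.Hour)},
		{ID: 2, Modified: now.Add(-5 * time.Minute)}, // still hot for writes
		{ID: 3, Modified: now.Add(-25 * time.Hour)},
	}
	return coldShards(shards, now, 24*time.Hour)
}

func main() {
	fmt.Println(demoColdShards())
}
```

Shard 2 was written five minutes before the reference time, so it is left out and would keep flushing its WAL undisturbed.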

Contributor

Ack, makes sense to run this on demand as opposed to running during primetime hours.

@kfitzpatrick
Contributor

Is there a man page with the arguments yet?

@pauldix
Member Author

pauldix commented Dec 28, 2015

@kfitzpatrick not yet, but if you build this and run `influxd backup`, you'll see the usage instructions.

Metafile = "meta"

// BackupFilePattern is the beginning of the pattern for a backup
// file. They follow the scheme <database>.<retention>.<shardID>.<inrement>
Contributor

inrement -> increment
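The naming scheme in the comment under review, `<database>.<retention>.<shardID>.<increment>`, could be built and parsed roughly as follows. This is a sketch, not the PR's implementation: the `backupFileName` and `parseBackupFileName` helpers are hypothetical, the numeric fields are written without padding (the real pattern may differ), and the parser assumes database and retention policy names contain no dots.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// backupFileName joins the four fields with dots.
// Unpadded numbers are an assumption made for this sketch.
func backupFileName(db, rp string, shardID uint64, increment int) string {
	return fmt.Sprintf("%s.%s.%d.%d", db, rp, shardID, increment)
}

// parseBackupFileName splits a name produced by backupFileName.
// It assumes db and rp contain no dots; otherwise the split is ambiguous.
func parseBackupFileName(name string) (db, rp string, shardID uint64, increment int, err error) {
	parts := strings.Split(name, ".")
	if len(parts) != 4 {
		return "", "", 0, 0, fmt.Errorf("expected 4 dot-separated fields, got %d", len(parts))
	}
	shardID, err = strconv.ParseUint(parts[2], 10, 64)
	if err != nil {
		return "", "", 0, 0, fmt.Errorf("bad shard ID %q: %v", parts[2], err)
	}
	increment, err = strconv.Atoi(parts[3])
	if err != nil {
		return "", "", 0, 0, fmt.Errorf("bad increment %q: %v", parts[3], err)
	}
	return parts[0], parts[1], shardID, increment, nil
}

func main() {
	name := backupFileName("mydb", "default", 7, 2)
	fmt.Println(name)
	fmt.Println(parseBackupFileName(name))
}
```

Keeping the increment as the last field means successive incremental backups of the same shard share a common prefix and can be grouped by it.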

@otoolep
Contributor

otoolep commented Dec 29, 2015

Seems pretty straightforward in principle. Hopefully the WAL will work sufficiently to buffer the data.

+1 on green.

pauldix added a commit that referenced this pull request Dec 31, 2015
Implement backup/restore for TSM.
@pauldix pauldix merged commit ee233c8 into master Dec 31, 2015
@pauldix pauldix deleted the pd-backup-restore branch December 31, 2015 13:56
@jwilder jwilder added this to the 0.10.0 milestone Feb 1, 2016