Easy file backups; each job based on a simple config file.

natecollins/cfgbackup

cfgbackup - "cfgbackup's a fairly good backup"

A straightforward file backup script where each job is based around a simple config file. Written in Bash and using rsync for file transfers.

cfgbackup offers the following:

  • Rotational backups, one-directional syncing, or bi-directional mirroring of directories
  • Detailed logging of each job, including built-in log archiving
  • Email notifications on failure
  • Resulting backups are simply files in directories; no special tools are needed for recovery or inspection
  • Can hard link unchanged files between rotational backups
  • Can hard link identical files within a single backup
  • Easy to customize on a per-job basis

Quick Example

First, set up SSH public key authentication between the backup machine and the client.

Create a config file on the target machine (the machine where files will be backed up to):

NOTIFY_EMAIL=admin@example.com
SOURCE_DIR=server1.example.com:/home/ :/etc :/var/www
TARGET_DIR=/backups/server1/
BACKUP_TYPE=rotation
MAX_ROTATIONS=30
ROTATIONALS_HARD_LINK=1
ALLOW_DELETIONS=1
ALLOW_OVERWRITES=1
LOG_DIR=/var/log/cfgbackup/
COMPRESS_LOGS=1

Start the backup on target machine:

# cfgbackup /etc/cfgbackup/myjob.conf run

Check the status of current/last run:

# cfgbackup /etc/cfgbackup/myjob.conf status

======= cfgbackup job status =======
Config:               myjob.conf
Type:                 rotation
Status:               idle
Started:              -
Process ID:           -
Last complete time:   2021-04-03 01:29:01

Log file:             /var/log/cfgbackup/myjob_20210403.log
Latest log messages:

  sent 555,214 bytes  received 5,072 bytes  1,120,572.00 bytes/sec
  total size is 531,247  speedup is 0.95
  | JOB ENDED: 2021-04-03 01:29:01

For rotation type backups, you can list what backups are available:

# cfgbackup /etc/cfgbackup/myjob.conf list
Backups:  5 / 30
------------------------------------------------------------
backup-20210403                             2021-04-03 01:29
backup-20210402                             2021-04-02 01:27
backup-20210401                             2021-04-01 01:27
backup-20210331                             2021-03-31 01:28
backup-20210330                             2021-03-30 01:27

That's all there is to getting started! Yet there is a whole lot more functionality and customization available.

Dependencies

  • bash 4.2+
  • rsync (recommended 3.1.0+)
  • awk
  • sed
  • grep
  • coreutils
  • findutils
  • hardlink (optional)
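
Before a first run, it can help to confirm that the tools listed above are present. The following is a minimal sketch and is not part of cfgbackup; the function names are hypothetical, and sort and find are used as stand-ins for the coreutils and findutils packages.

```shell
#!/usr/bin/env bash
# Print each named command that cannot be found on PATH.
missing_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Return 0 when the running bash is version 4.2 or newer.
bash_is_new_enough() {
    (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 2) ))
}

# sort and find stand in for the coreutils and findutils packages (assumption).
missing_commands rsync awk sed grep sort find
bash_is_new_enough || echo "bash 4.2 or newer is required" >&2
```

An empty result means every listed command was found. The optional hardlink program can be checked the same way with missing_commands hardlink.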

Installation

Installation can be as simple as downloading the cfgbackup script, setting the execute flag, and placing it in a logical place, like /usr/local/bin/. You'll likely want to grab a copy of the example.conf file as well, to use as a template for creating backup jobs.

For Debian-based Linux distributions, you can use the build/build-deb script to create a .deb package, then install that with dpkg like any other package. Pre-built packages are available at https://github.com/natecollins/cfgbackup/releases

Basic Usage

There are three basic types of backup jobs cfgbackup can do: rotation, sync, and mirror:

  • rotation jobs create a series of backup rotations in subdirs of the target dir
  • sync jobs synchronize a directory from one location to another
  • mirror updates files in both directions, including deletions (EXPERIMENTAL)

For backups in the truest sense of the word, it is highly recommended to use the rotation job type.

Each backup job has its own config file. A sample config called example.conf is provided; it is recommended that you copy this file to use as a template, modifying it as you like. The file can reside anywhere, but the suggested location for your config files in production is /etc/cfgbackup/.

In this file, you will need to specify a few required config options:

  • BACKUP_TYPE which can be either rotation, sync, or mirror
  • SOURCE_DIR which is the directory to be backed up (can be remote over SSH)
  • TARGET_DIR which is where the backup(s) should go; must be a local directory

For rotation jobs, this must also be set:

  • MAX_ROTATIONS which is how many rotation subdirs to create

An example config file, let's call it alpha.conf, might be something like this:

BACKUP_TYPE=rotation
SOURCE_DIR=/var/data/
TARGET_DIR=/mnt/backups/alpha/
MAX_ROTATIONS=20
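
As a rough illustration of the required options, the sketch below lists which of them are absent from a config file. It is a hypothetical pre-check, not cfgbackup's own check command, and it assumes each option is written as KEY=value at the start of a line, as in the examples here.

```shell
#!/usr/bin/env bash
# Print the required option names that have no KEY=value line in the config file.
missing_required_options() {
    local conf=$1 key type
    local required=(BACKUP_TYPE SOURCE_DIR TARGET_DIR)
    # MAX_ROTATIONS is only required when BACKUP_TYPE=rotation.
    type=$(sed -n 's/^BACKUP_TYPE=//p' "$conf" | tail -n 1)
    if [ "$type" = "rotation" ]; then
        required+=(MAX_ROTATIONS)
    fi
    for key in "${required[@]}"; do
        grep -q "^${key}=." "$conf" || printf '%s\n' "$key"
    done
}
```

Calling missing_required_options alpha.conf on the config above prints nothing, since all four options are set. The check command remains the authoritative validation.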

By default, cfgbackup will try to save log files into the /var/log/cfgbackup/ directory. Make sure that directory is either writable or creatable by whichever user will run the script. Alternatively, you can change the log directory with the LOG_DIR config option.

The cfgbackup script is run in the format of:

./cfgbackup [config] [command]

The [config] is just the path to the config file you want to use. The [command] can be one of a number of commands, enumerated below.

Commands

Check Command
The check command will parse the config, checking for errors.

./cfgbackup alpha.conf check

If something wrong is detected in the config file, a message will display describing the error.

If no problems are detected, it will respond with Config is OK.

Run Command
The run command will attempt to start a job for the given config file. It will validate the config first, same as the check command, and then verify that there isn't already a job running. If a job is running, or a previously started job did not complete, then an error message will display and the script will exit.

./cfgbackup alpha.conf run

A successfully started job will run in the foreground and will not output anything to the terminal. Ideally, most jobs will be started using the cron daemon, so terminal output is not desired. To see what is happening with a running job, you can inspect the log file, which you can customize with config options. The default log file name is based on the config filename and the date. With default options, a config file named alpha.conf, and a date of 2016-12-31, the log file would be:

/var/log/cfgbackup/alpha_20161231.log

If you run a job multiple times with the same log file, the log entries will be appended to it.

Status Command
The status command will report the current status of the job in question. It will report the type of job, whether it is running or failed, what the process ID is, and more. It also reports the last few lines of the most recent log file.

./cfgbackup alpha.conf status

List Command
The list command is for rotation jobs only. It will list all backup rotation subdirectories and their dates.

./cfgbackup alpha.conf list

Reset Command
The reset command will attempt to reset things to a state where you can run a new job. If a job is running, it can attempt to kill the current job. If the previous run had failed, then it will attempt to put things back into place in order to let you start a new job.

./cfgbackup alpha.conf reset

Note: When running reset on a job with DATE based subdirectories, the command will reset the folder to a date just older than the oldest backup directory, not necessarily the date it was before running the job.

Accept Command
The accept command is a counterpart to the reset command. Rather than putting the incomplete backup as the oldest backup directory (and thus set to be overwritten during the next run), the accept command marks the incomplete backup as successful and sets it as the newest backup. This is useful if the backup was mostly okay, and you don't want to discard it entirely. Other than this difference, the accept command operates the same as the reset command.

./cfgbackup alpha.conf accept

Pause Command
The pause command simply attempts to pause a currently running backup job process. This is done via sending a SIGSTOP signal to the process.

./cfgbackup alpha.conf pause

Resume Command
The resume command will resume a paused backup job process. This is done via sending a SIGCONT signal to the paused process.

./cfgbackup alpha.conf resume
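
The signals behind pause and resume can be tried on any process. This is a small sketch of SIGSTOP and SIGCONT on a throwaway sleep process; the helper names are hypothetical, and how cfgbackup chooses which process to signal is not shown.

```shell
#!/usr/bin/env bash
# Suspend a process by PID with SIGSTOP.
pause_pid()  { kill -STOP "$1"; }
# Let a suspended process continue with SIGCONT.
resume_pid() { kill -CONT "$1"; }

# Demonstration on a throwaway background process.
sleep 30 &
pid=$!
pause_pid "$pid"                  # stopped, but the process still exists
resume_pid "$pid"                 # continues where it left off
kill "$pid"                       # clean up the demo process
wait "$pid" 2>/dev/null || true   # reap it quietly
```

A stopped process keeps its memory and open files, which is why a paused job can pick up again later.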

Config Options

SOURCE_DIR [Required]
The directory to create backups from. Can be local or remote via SSH. Can specify multiple directories, whitespace delimited. If specifying multiple remote directories, all directories must be on the same host (limitation of rsync).

Source directories WITH a trailing slash will sync the contents of the source directory into the target backup. Source directories WITHOUT a trailing slash will sync the directory itself (along with its contents) into the target backup.

SOURCE_DIR=/home/
SOURCE_DIR=/var/data
SOURCE_DIR=backups@server.example.com:/path/to/files/
SOURCE_DIR=/etc /var/www /home
SOURCE_DIR=server.example.com:/etc :/var/www :/home
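
The trailing-slash rule is easy to get wrong, so here is a small sketch that computes where a file would land. The landing_path helper is hypothetical and only models the rule in pure Bash without running rsync; the target argument stands for the directory rsync writes into.

```shell
#!/usr/bin/env bash
# Print where <source_dir>/<relative_path> ends up under <target>,
# following rsync's trailing-slash rule.
landing_path() {
    local source=$1 rel=$2 target=${3%/}
    local dir=${source#*:}              # drop an optional "host:" prefix
    if [[ $dir == */ ]]; then
        # Trailing slash: only the contents are copied.
        printf '%s/%s\n' "$target" "$rel"
    else
        # No trailing slash: the directory itself is copied.
        printf '%s/%s/%s\n' "$target" "$(basename "$dir")" "$rel"
    fi
}

landing_path /home/    alice/notes.txt /backups/server1/   # /backups/server1/alice/notes.txt
landing_path /var/data report.csv      /backups/server1/   # /backups/server1/data/report.csv
```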

TARGET_DIR [Required]
The directory to sync to (sync type), or where to create subdirectory rotations (rotation type). Must be a local directory.

TARGET_DIR=/var/sync/
TARGET_DIR=/home/backups

BACKUP_TYPE [Required]
The type of backup to make. Value must be rotation, sync, or mirror (mirror is EXPERIMENTAL).
Sync jobs will make the TARGET_DIR exactly match the SOURCE_DIR, unless other options prevent it (see ALLOW_DELETIONS and ALLOW_OVERWRITES).
Rotation jobs will create a new subdirectory to contain each backup; once MAX_ROTATIONS backups exist, the oldest backup is rotated out and reused for the next run.

BACKUP_TYPE=sync
BACKUP_TYPE=rotation

NOTIFY_EMAIL [Default value: (blank)]
The email address to send failures and notifications to. You must have a Mail Transfer Agent installed to handle the actual sending of email. If this setting is left blank, then no emails will be sent. Setting this is highly recommended!

NOTIFY_EMAIL=admin@example.com
NOTIFY_EMAIL=root@localhost

LOG_DIR [Default value: /var/log/cfgbackup/]
The directory where log files will be saved.

LOG_DIR=/var/log

LOG_FILENAME [Default value: CONFNAME_DATE.log]
The name of the log file to use for a job. There are three variables you can use in the value:

  • CONFNAME The name of the config file minus the extension. e.g. 'active' for active.conf
  • DATE The date when the job was started. e.g. 20161231
  • TIME The time when the job was started. e.g. 235959

LOG_FILENAME=backup_DATETIME.log
LOG_FILENAME=cfgb_CONFNAME.log
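
To see what a LOG_FILENAME pattern turns into, the sketch below substitutes the three variables in plain Bash. It is an illustration under the assumption of simple text replacement; expand_log_filename is a hypothetical helper, and cfgbackup's own expansion may differ in detail.

```shell
#!/usr/bin/env bash
# Expand CONFNAME, DATE and TIME in a LOG_FILENAME pattern.
expand_log_filename() {
    local pattern=$1 conf=$2 date=$3 time=$4
    local name
    name=$(basename "$conf")
    name=${name%.*}                       # config file name minus its extension
    pattern=${pattern//CONFNAME/$name}
    pattern=${pattern//DATE/$date}
    pattern=${pattern//TIME/$time}
    printf '%s\n' "$pattern"
}

expand_log_filename CONFNAME_DATE.log /etc/cfgbackup/alpha.conf 20161231 235959
# prints: alpha_20161231.log
```

With the pattern backup_DATETIME.log and the same arguments, the sketch prints backup_20161231235959.log, because DATE and TIME are replaced one after the other.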

COMPRESS_LOGS [Default value: 1]
The logs generated by cfgbackup are very verbose and can grow quite large. Thankfully, they are also very compressible. By enabling this option, cfgbackup will check for old logs (over 2 days old) that match the LOG_FILENAME pattern for this job and compress them. By default, it will use gzip, but this can be changed with the COMPRESS_PATH option. To enable compression, set this option to 1; all other values will disable log compression.

COMPRESS_LOGS=1

MAX_ROTATIONS [Required for 'rotation' jobs, ignored otherwise]
Only applies to the rotation value of BACKUP_TYPE; this is the maximum number of rotational backups for cfgbackup to make. Note that you will probably want to set this number to 1 higher than the maximum number of usable backups you want, as 1 backup might be in transition while the job is running. So to guarantee that 14 backups are always available, set this to at least 15.

MAX_ROTATIONS=15

ROTATIONALS_HARD_LINK [Default value: 0]
Only applies to rotation type jobs. If this value is set to 1, then any unchanged files between rotation backups will be hard linked together. Files that are hard linked together point to the same location on disk, so they don't take up extra space. This can significantly reduce the amount of disk space a set of rotational backups occupies if not many files actually change between jobs. It is recommended that you have rsync version 3.1.0 or greater when this is enabled for best performance. If you have an older version of rsync, however, then cfgbackup will perform the hard linking instead. Enabled if set to 1, disabled otherwise.

ROTATIONALS_HARD_LINK=1

IDENTICALS_HARD_LINK [Default value: 0]
When enabled, searches for files with identical content within a single run of a backup job and hard links them together. Files that are hard linked together point to the same location on disk, so they don't take up extra space. This option requires that the hardlink program be available; if hardlink is not found, setting this option will prevent the job from running. Note that hardlink runs as a separate process after the rsync process has completed, thus adding extra time to how long a job takes to run. With large backups, this can potentially take a very long time. If set to 1, only links files that match in content, owner, permissions, and timestamp. If set to 2, links files when only the file content matches. Disabled for all other values.

IDENTICALS_HARD_LINK=1

ROTATE_SUBDIR [Default value: backup-NUM1]
Only applies to rotation type jobs. This option sets the name of the subdirectories where the backed up files will be stored. The value must contain one rotation key. Valid rotation keys are:

  • DATE will result in an 8-digit date, such as 20001231; multiple jobs per day will append a .1, .2, etc
  • NUM0 will result in a numeric increment starting from 0
  • NUM1 will result in a numeric increment starting from 1
  • Left padded versions of the above NUM keys, such as NUM01, NUM000, NUM0001

ROTATE_SUBDIR=backup-DATE
ROTATE_SUBDIR=bak_NUM01
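
The rotation keys can be illustrated with a short sketch that expands a pattern for the n-th backup. This is one reading of the key descriptions above, not cfgbackup's own code: it assumes the final digit of a NUM key is the starting number and each extra zero adds one digit of left padding. The rotate_subdir_name helper is hypothetical, and the .1, .2 suffix for several DATE backups on one day is not modeled.

```shell
#!/usr/bin/env bash
# Expand the rotation key in a ROTATE_SUBDIR pattern for the n-th backup (n counts from 0).
rotate_subdir_name() {
    local pattern=$1 n=$2 day=${3:-$(date +%Y%m%d)}
    if [[ $pattern == *DATE* ]]; then
        printf '%s\n' "${pattern/DATE/$day}"
    elif [[ $pattern =~ NUM(0*)([01]) ]]; then
        local key=${BASH_REMATCH[0]} start=${BASH_REMATCH[2]}
        local width=$(( ${#BASH_REMATCH[1]} + 1 ))   # extra zeros mean left padding
        local number
        printf -v number '%0*d' "$width" $(( start + n ))
        printf '%s\n' "${pattern/$key/$number}"
    else
        echo "no rotation key in: $pattern" >&2
        return 1
    fi
}

rotate_subdir_name bak_NUM01 0              # bak_01
rotate_subdir_name backup-DATE 0 20001231   # backup-20001231
```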

ALLOW_DELETIONS [Default value: 1]
With this option set to 1, files may be deleted from the target backup directory if they are missing from the source directory. If set to 0, then no file deletions will happen in the target directory; additionally, the list of files that do not exist in the source directory will be logged and emailed to the value of NOTIFY_EMAIL.
With a value of 1, this adds the --del flag to the rsync command.
With an empty value or any value other than 0 or 1, files missing from the source directory are neither deleted nor reported, and no flags are added to the rsync command.
Has no effect if the BACKUP_TYPE is set to mirror.

ALLOW_DELETIONS=0

ALLOW_OVERWRITES [Default value: 1]
With this option set to 1 (or empty string), files may be updated/overwritten in the target backup directory if they differ in the source directory. If set to 0, then no file modifications will happen in the target directory; additionally, the list of files that are different in the source directory will be logged and emailed to the value of NOTIFY_EMAIL.
With a value of 0, this adds the --ignore-existing flag to the rsync command.
Has no effect if the BACKUP_TYPE is set to mirror.

ALLOW_OVERWRITES=0

MIRROR_CONFLICT_ACTION [Default value: update]
This option chooses what happens when a file is updated on one end of a mirror job and deleted on the other end. If the value is update (or empty string), then the deleted file is replaced with the updated file. If the value is delete, then the updated file is removed, losing any changes it may have had.
Only valid if the BACKUP_TYPE is set to mirror.

MIRROR_CONFLICT_ACTION=delete

RSYNC_FLAGS
For any additional custom flags you would like passed directly to rsync. Note that this only adds flags; however, you can use the --no-OPTION flags to negate implied flags. For example, you can pass the --no-p flag to not have rsync synchronize permissions. See the rsync manual for details: man rsync
Flags always included, even when none are specified: -av --stats

RSYNC_FLAGS=--exclude=.DS_Store --exclude=._*
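
The flag rules described for ALLOW_DELETIONS, ALLOW_OVERWRITES, and RSYNC_FLAGS can be summarized in a few lines. This is a sketch of those stated rules only; build_rsync_flags is a hypothetical helper, and the real script may add other flags or order them differently.

```shell
#!/usr/bin/env bash
# Assemble an rsync flag list from option values, following the rules stated above.
build_rsync_flags() {
    local allow_deletions=$1 allow_overwrites=$2 extra_flags=$3
    local flags="-av --stats"                       # always included
    if [ "$allow_deletions" = "1" ]; then
        flags+=" --del"                             # ALLOW_DELETIONS=1
    fi
    if [ "$allow_overwrites" = "0" ]; then
        flags+=" --ignore-existing"                 # ALLOW_OVERWRITES=0
    fi
    if [ -n "$extra_flags" ]; then
        flags+=" $extra_flags"                      # RSYNC_FLAGS
    fi
    printf '%s\n' "$flags"
}

build_rsync_flags 1 1 ""                # -av --stats --del
build_rsync_flags 0 0 "--bwlimit=10M"   # -av --stats --ignore-existing --bwlimit=10M
```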

NOTIFY_RSYNC_FLAGS [Default value: -O]
If either ALLOW_DELETIONS or ALLOW_OVERWRITES is set to 0, a second rsync job is run to find file changes that were skipped by these settings, with the results being logged and sent to the NOTIFY_EMAIL address. You can add additional flags to that second rsync job here. The default value, the -O flag, is recommended; it stops reporting of directories with timestamp differences.

NOTIFY_RSYNC_FLAGS=-O

RSYNC_EXIT_CODE_SUCCESS [Default value: 24]
Exit codes from rsync to treat as a successful backup run. These are in addition to 0, which is always a success. This is useful as a means of preventing minor issues from stopping a rotational backup from finishing. If this variable is not set in the config, it defaults to 24. Setting multiple exit codes is allowed as a comma delimited list.

RSYNC_EXIT_CODE_SUCCESS=23,24
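
The success rule amounts to "exit code 0, or any code in the comma delimited list". A minimal sketch of that check follows; is_success_code is a hypothetical name, not a function taken from cfgbackup.

```shell
#!/usr/bin/env bash
# Return 0 if an rsync exit code counts as success: either 0 or listed in the
# comma-delimited second argument (for example "23,24").
is_success_code() {
    local code=$1 list=$2 ok
    [ "$code" = "0" ] && return 0
    local IFS=,
    for ok in $list; do
        [ "$code" = "$ok" ] && return 0
    done
    return 1
}

is_success_code 24 "23,24" && echo "treated as success"
is_success_code 12 "23,24" || echo "treated as failure"
```

In the rsync manual, exit code 24 means a partial transfer because source files vanished, and 23 means a partial transfer due to an error, so adding 23 is a more permissive choice than the default.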

PRE_SCRIPT,SUCCESS_SCRIPT,FAILED_SCRIPT,FINAL_SCRIPT
All these script options allow for the setting of a script to run at a specific time during a backup job run. What you enter will be evaluated as a shell command, so placing multiple commands together and using pipes will also work.

  • PRE_SCRIPT This script will be run immediately when the backup job starts (but after the config is checked/parsed), before any other run actions.
  • SUCCESS_SCRIPT Runs this script after the backup job has completed if rsync returns an exit code of 0; also waits until after hardlinks are created if IDENTICALS_HARD_LINK is set to 1.
  • FAILED_SCRIPT Runs this script immediately after rsync if rsync returns an exit code other than 0.
  • FINAL_SCRIPT This script runs as the last thing before the cfgbackup run job ends, regardless of success or failure of rsync.

Any scripts specified will cause the backup job to send a failure email if they return a non-zero exit code.

PRE_SCRIPT=/usr/local/bin/app-cache --clear
SUCCESS_SCRIPT=service myapp restart; service apache2 restart
FAILED_SCRIPT=/usr/local/bin/dump-app-state
FINAL_SCRIPT=~adminguy/gen-server-report

PRE_SCRIPT_ERROR_EXIT [Default value: 0]
If set to 1, this will require the PRE_SCRIPT to have an exit code of 0, otherwise the backup job will send a failure notification and then immediately exit.

PRE_SCRIPT_ERROR_EXIT=1
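
Because the script options are evaluated as shell commands, a hook can be pictured roughly as below. This is a sketch assuming a plain eval of the option string; run_hook is a hypothetical helper, and the messages stand in for cfgbackup's real logging and email notification.

```shell
#!/usr/bin/env bash
# Evaluate a hook string as a shell command and report a non-zero exit code.
run_hook() {
    local name=$1 cmd=$2 status=0
    [ -z "$cmd" ] && return 0              # hook not configured
    eval "$cmd" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "$name exited with code $status" >&2
    fi
    return "$status"
}

# Pipes work because the whole string is evaluated by the shell.
PRE_SCRIPT='echo "clearing cache" | tr a-z A-Z'
PRE_SCRIPT_ERROR_EXIT=1

if ! run_hook PRE_SCRIPT "$PRE_SCRIPT" && [ "$PRE_SCRIPT_ERROR_EXIT" = "1" ]; then
    echo "PRE_SCRIPT failed: the job would stop here" >&2
fi
```

Since the string is evaluated as-is, quoting inside the option value matters in the same way it would on a command line.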

RUNNING_DIRNAME [Default value: backup-running]
Only applies to rotation type jobs. This option sets the name of the subdirectory used when running an active backup job. This should be unique and never conflict with the directory names generated by ROTATE_SUBDIR.

RUNNING_DIRNAME=backup_in_progress

PID_FILE [Default value: .cfgbackup.pid]
This is a file that is created in the TARGET_DIR whenever a backup job is run. Once the job completes, the file is deleted. The file contains the process id of the main cfgbackup process. For jobs of type sync this file will be ignored as cfgbackup will automatically add the rsync flag --exclude=/PID_FILE to the job.

PID_FILE=.cfgbackup.pid

RSYNC_PATH,COMPRESS_PATH,HARDLINK_PATH,MAIL_PATH,SORT_PATH
The path options allow you to override the binaries for various programs used by cfgbackup. For COMPRESS_PATH you can change the type of compression used by switching the binary.

RSYNC_PATH=/usr/local/bin/rsync
COMPRESS_PATH=bzip2
HARDLINK_PATH=/usr/local/bin/hardlink
MAIL_PATH=/usr/local/bin/mailx
SORT_PATH=/usr/local/bin/gsort

Automating Backups via Cron

The recommended way of automating backup jobs is via cron. For those unfamiliar with cron, it provides a way to schedule commands to run at regular intervals. Cron tasks can be entered in a system-wide file, /etc/crontab, or on a per-user basis with the crontab command. The word "crontab" is short for "cron table", a text file with tabular contents listing commands to run at certain times. Using the system-wide crontab is a good idea for backup jobs on multi-user systems, like servers, so that other users are aware of the backup job without having to check other people's personal crontabs.

System Crontab (Recommended)
Example crontab entries for /etc/crontab:

# m  h  dom mon dow  user       command
 30  9  *   *   *    gerald     /usr/local/scripts/report_for_gerald.sh

# webspace hourly backups; runs every hour at 5 minutes into the hour
 5   *  *   *   *    root       cfgbackup /etc/cfgbackup/webspace_hourly.conf run

# webspace daily backups; runs every day at 11:30pm
 30  23 *   *   *    root       cfgbackup /etc/cfgbackup/webspace_daily.conf run

As you'll notice, each line is broken up into columns, with each column delimited by spaces or tabs. The columns do not have to line up horizontally, but it's a good idea to do so if you want to keep your crontab file readable. To keep track of tasks, it's also a good idea to add a comment describing what each entry does. Comments are created using the # character; a line starting with # is considered a comment and ignored by cron.

The columns in crontab are, in order:

  • m The minute of the hour on which to run, or * to run every minute
  • h The hour of the day (from 0 to 23) on which to run, or * to run every hour
  • dom The day of the month on which to run, or * to run every day
  • mon The month of the year on which to run, or * to run every month
  • dow The day of the week on which to run, or * to run every day
  • user The user account to run the command as
  • command The command to run

Personal Crontab
If you don't have access to the /etc/crontab file, you can use the crontab command to load or edit a crontab file for your user. The format is almost exactly the same as /etc/crontab except there is no user column. All commands started from a user's crontab are always run by that user's account.

Example crontab entries for a user's crontab:

# m  h  dom mon dow  command
 30  8  *   *   *    /home/gerald/scripts/morning_schedule.sh
 0   13 *   *   *    /home/gerald/scripts/afternoon_schedule.sh

# document directory rotation
 45  *  *   *   *    cfgbackup /home/gerald/cfgbackup/documents_rotate.conf run

To edit a user crontab file in-place, run crontab -e; this will allow you to edit the file in a terminal text editor.

If you already have a crontab written in another file, you can load it into your crontab by just passing it to the crontab command:

crontab load_file.txt
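
For scripted setups, an entry can also be added without opening an editor. The sketch below is a hypothetical filter, with_cron_entry, that appends one line to a crontab read from stdin unless that exact line is already present; it assumes your crontab implementation accepts - to read the new table from stdin, which common cron packages do.

```shell
#!/usr/bin/env bash
# Read a crontab on stdin and print it with the given entry appended,
# unless an identical line is already present.
with_cron_entry() {
    local entry=$1 existing
    existing=$(cat)
    [ -n "$existing" ] && printf '%s\n' "$existing"
    if ! grep -Fxq -- "$entry" <<<"$existing"; then
        printf '%s\n' "$entry"
    fi
}

# Example use (not run here, since it would change your crontab):
#   crontab -l 2>/dev/null \
#     | with_cron_entry '45 * * * * cfgbackup /home/gerald/cfgbackup/documents_rotate.conf run' \
#     | crontab -
```

Running the pipeline twice leaves a single copy of the entry, since an identical line is never added again.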

More Information

Backups should be remote and inaccessible
Backups should never reside on the same machine where the original files exist. Any "backups" that exist on the origin machine's disk are completely worthless if the RAID fails, the file system becomes corrupted, the rack catches on fire, etc.

Additionally, backups that are accessible by the origin machine aren't very helpful when put up against malicious actors. Any hacker/disgruntled employee who has access to the original data may attempt to sully the backups as well.

Either set up and use a remote public key SSH connection to grab the source files from the origin machine, or have the source machine push a single copy of its data to a remote location, and then have the backup job pull a rotation from there.

Be Careful with Hard Links
Hard linking files is great for reducing the disk space used; you can have dozens of files hard linked together and the data for all those files will take up the same disk space as the data for a single copy. This is because they all point to the same location on disk.

If you are hard linking files between rotationals (via ROTATIONALS_HARD_LINK), you need to be aware that you should never edit/modify a hard linked file in one backup, as it will result in ALL hard linked copies being edited.
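
The effect is easy to reproduce in a scratch directory. This sketch uses hypothetical directory names to create one file with two hard-linked names and then edits it in place through one of them.

```shell
#!/usr/bin/env bash
# Show that an in-place edit of one hard-linked name changes every linked name.
demo_hard_link_edit() {
    local dir
    dir=$(mktemp -d) || return 1
    mkdir "$dir/backup-1" "$dir/backup-2"
    echo "original" > "$dir/backup-1/file.txt"
    ln "$dir/backup-1/file.txt" "$dir/backup-2/file.txt"   # same data, second name

    echo "edited" >> "$dir/backup-2/file.txt"              # in-place change via one name
    cat "$dir/backup-1/file.txt"                           # the other name shows it too

    rm -rf "$dir"
}

demo_hard_link_edit
# prints:
# original
# edited
```

If a file from a backup needs changing, copying it out of the backup first leaves the linked copies untouched.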

If you are hard linking identical files within a backup (via IDENTICALS_HARD_LINK), all those files will end up with the same timestamps, ownership, and permissions. While this is not a concern when dealing with rotation hard links, when you hard link files within the same backup, you may be discarding some useful metadata. If only the content of the files is relevant to you, then this will be of no concern. However, if timestamps, ownership, or file permissions are important, then you may not want to enable IDENTICALS_HARD_LINK.

Manually Resetting a Job
While the simplest way to fix a failed/dead job is to use the reset command, you can also manually reset a job. To reset a job:

  • Ensure the main cfgbackup process and any child processes are killed
  • Remove the PID_FILE from the TARGET_DIR
  • For rotationals, rename the RUNNING_DIRNAME back to be the oldest backup subdir of your MAX_ROTATIONS
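
The first two steps can be sketched as a guarded cleanup. This assumes the PID file holds just the process ID, as described under PID_FILE; clear_stale_pid_file is a hypothetical helper, and renaming the RUNNING_DIRNAME directory is left as a manual step because it depends on your ROTATE_SUBDIR naming.

```shell
#!/usr/bin/env bash
# Remove the PID file from a target directory, but only if the process it names is gone.
# Run this as the same user as the backup job so that kill -0 can see the process.
clear_stale_pid_file() {
    local target_dir=$1 pid_file=${2:-.cfgbackup.pid}
    local path="${target_dir%/}/$pid_file" pid
    [ -f "$path" ] || return 0                    # nothing to clean up
    pid=$(tr -cd '0-9' < "$path")
    if [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; stop it first" >&2
        return 1
    fi
    rm -f -- "$path"
}
```

For example, clear_stale_pid_file /mnt/backups/alpha/ removes the default .cfgbackup.pid only when no process with the recorded ID exists.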

Special Circumstances

Following are a number of situations and how you could solve them. A large number of problems can be solved via the RSYNC_FLAGS option, as rsync is quite powerful.

SSH on non-standard port
To change which SSH port rsync will use, you must manually set the remote shell using the -e flag. For example, to set SSH to use port 345, you would need the following:

RSYNC_FLAGS=-e "ssh -p 345"

Backing up source directories that already contain hard links
This one is quite simple. The rsync program supports syncing hard links, but disables it by default due to the performance hit it takes to track them.

To back up hard links, all you need to do is add the -H flag to the RSYNC_FLAGS option.

RSYNC_FLAGS=-H

Limit backup bandwidth used
To limit the maximum bandwidth that can be used by the backup job when the source is coming over SSH, you can use the rsync flag --bwlimit. See the rsync manual for more info: man rsync

RSYNC_FLAGS=--bwlimit=10M

Stuck backups - job running but no files transferring
Just because you don't see files transferring doesn't mean there is a problem. The biggest benefit of rsync is that it only transfers files that need to be updated. If no files need to be changed, then you won't see anything written to the logs.

That said, if you truly are experiencing a lost/stuck connection, you can tell rsync to fail if no data is transferred after a given time. By setting the --timeout flag under RSYNC_FLAGS, you can set a maximum time in seconds to wait for I/O to happen.

RSYNC_FLAGS=--timeout=3600

Only allow new files; no changes/deletions
To prevent file changes and deletions, and then have cfgbackup email you a report of prevented changes/deletions to NOTIFY_EMAIL, just set the following two options:

ALLOW_DELETIONS=0
ALLOW_OVERWRITES=0

Running on a Mac
Unfortunately, Apple ships their OS with very old versions of some open source software installed. Fortunately, there are various other providers of open source software packages you can use. Each handles package management slightly differently but all can get the job done.

  • Homebrew: https://brew.sh/ Well polished and very popular amongst developers; targets individual user functionality rather than system wide use.
  • MacPorts: https://www.macports.org/ Similarities to BSD ports package manager.
  • Fink: http://www.finkproject.org/ Similarities to Debian apt package manager.

Once set up, install the required packages using the method appropriate to the package manager you selected. You will likely need to specify the paths to the newer binaries you installed as options in the config file. Examples:

# Homebrew example paths
RSYNC_PATH=/usr/local/bin/rsync
SORT_PATH=/usr/local/bin/gsort
# MacPorts example paths
RSYNC_PATH=/opt/local/bin/rsync
SORT_PATH=/opt/local/bin/gsort
# Fink example paths
RSYNC_PATH=/sw/bin/rsync
SORT_PATH=/sw/bin/sort

Running on Windows
You should be able to run cfgbackup without any problems using either Cygwin or Windows Subsystem for Linux (aka Ubuntu on Windows), as long as you install the required dependencies. However, this functionality has not been tested.

Author and License

Copyright (c) 2017 by Nathan Collins [npcollins@gmail.com]

Released under the MIT License
