backup_local: a role for /apps hardlink differential backups #627

Merged
merged 14 commits into from
Sep 7, 2022
7 changes: 7 additions & 0 deletions group_vars/betabarrel_cluster/vars.yml
Original file line number Diff line number Diff line change
@@ -55,6 +55,13 @@ nameservers: [
'8.8.4.4', # Google DNS.
'8.8.8.8', # Google DNS.
]

local_backups: # list of folders for cron to back up
  - name: apps # do not modify once deployed!
    src_path: '/apps'
    frequency:
      - { name: 'daily', hour: '5', minute: '47', day: '*', weekday: '*', month: '*', keep: '60', disabled: 'false' }

local_admin_groups:
- 'admin'
- 'docker'
94 changes: 94 additions & 0 deletions roles/backup_local/README.md
@@ -0,0 +1,94 @@
## About

This role creates local backups of one or more local folders.

The `local_backups` variable, set in the group (or host) variables, lists the
folders that need to be backed up.

The `frequency` sublist of each `local_backups` entry defines how often that
folder is backed up. Several periodic backups (e.g. `daily` and `weekly`) can be
defined for the same folder on the same machine. Each frequency entry specifies
the date and time at which that type of backup runs.

Cron jobs are created and deployed based on the names of the backups. Do not
deploy a backup and then redeploy it under a changed name, as this results in
duplicate cron jobs.

The backup destination folder can also be a directory mounted from a remote
server.

## Backup size growth

Each backup increment is hardlinked against the previous backup in order to save
space. The hardlinking is done by rsync itself (its `--link-dest` option).

Cron keeps a fixed number of backups of each folder; the oldest ones are
deleted automatically when a new backup runs.

For each `frequency` there is one **full** copy, **plus** the differential
backups, up to the number given in `keep`.
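
The space saving comes from hard links: an unchanged file in two backups is a
single copy on disk with two directory entries. The sketch below only
illustrates that idea with `ln`; it does not call rsync, and the helper name
`same_inode` is made up for this example.

```shell
# Hypothetical helper: succeeds when both paths refer to the same inode,
# i.e. the same data on disk.
same_inode() {
    [ "${1}" -ef "${2}" ]
}

demo_dir="$(mktemp -d)"
mkdir "${demo_dir}/backup_1" "${demo_dir}/backup_2"
echo "unchanged content" > "${demo_dir}/backup_1/file.txt"

# An unchanged file is hard-linked into the next backup instead of copied.
ln "${demo_dir}/backup_1/file.txt" "${demo_dir}/backup_2/file.txt"

if same_inode "${demo_dir}/backup_1/file.txt" "${demo_dir}/backup_2/file.txt"; then
    echo "both backups share one copy of file.txt"
fi

rm -rf "${demo_dir}"
```

Deleting `backup_1` afterwards would not remove the data, because `backup_2`
still holds a link to it.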

## Data structure

* The main backup folder is the location where all backups are stored, e.g.
  `/backups/`.
* A subfolder inside it carries the name of the individual backup: if we define
  a backup of `/apps` and name it `apps`, there will be
  `/backups/apps`.
* An individual backup can have more than one frequency defined, so there is
  a subfolder for each frequency.
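
As a rough illustration of the layout described above, the following sketch
assembles the folder in which a single backup run ends up. The function name
`backup_destination` and its argument order are assumptions made for this
example; they are not part of the role.

```shell
# Hypothetical helper: print the folder where one backup run is stored.
backup_destination() {
    main_dir="${1%/}"    # main backup folder, trailing slash removed
    name="${2}"          # name of the backup
    frequency="${3}"     # name of the frequency
    timestamp="${4}"     # e.g. the output of: date +%Y%m%d_%H%M%S
    source_dir="${5%/}"  # absolute path of the folder being backed up
    printf '%s/%s/%s/%s%s/\n' \
        "${main_dir}" "${name}" "${frequency}" "${timestamp}" "${source_dir}"
}

backup_destination /backups/ etc daily 20220901_054700 /etc
# prints: /backups/etc/daily/20220901_054700/etc/
```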

## Cron execution

This playbook automatically creates a cron entry for each backup and for each
of its frequencies. The entries can be viewed with `crontab -l`. Do not change
them manually!

## Script procedure

1. The script checks that the source folder and the destination backup folder exist.
2. It removes the oldest backups if there are more of them than the configured number.
3. It determines the latest backup folder.
4. It creates a new backup in the `main backup > name > frequency` directory,
   hardlinked against the latest backup of this backup type.
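
The retention step (step 2) can be sketched as follows. This is a simplified
variant, not the script's own code: it assumes the backup folders are named
with `YYYYmmdd_HHMMSS` timestamps and sorts them by name, whereas the script
sorts by modification time. The function name `list_expired_backups` is
hypothetical.

```shell
# Hypothetical helper: print the backup folders that exceed the keep count.
# Assumption: folder names are timestamps, so a reverse sort by name lists
# the newest backup first.
list_expired_backups() {
    frequency_dir="${1}"  # folder holding the backups of one frequency
    keep="${2}"           # number of newest backups to keep
    find "${frequency_dir}" -mindepth 1 -maxdepth 1 -type d \
        | sort -r \
        | tail -n +"$((keep + 1))"
}

# Usage: list (without deleting) everything beyond the 10 newest backups.
# list_expired_backups /backups/etc/daily 10
```

Printing the candidates first, and deleting them in a separate step, makes it
easy to check the selection before anything is removed.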

## Manually executing

You can also call the backup command with:

`/root/backup_cronjob.sh etc daily 10 /etc /backups/`

This backs up the `/etc` folder, puts the files in

`/backups/etc/daily/YYYYmmdd_HHMMSS/etc/`

and keeps the last 10 backups in total.

## Logging

The list of purged folders and the complete rsync command (before execution)
are logged to the file `log` inside the folder of the individual backup
(`main backup > name > log`).
The logged command also shows the folder used for hardlinking, which helps with
debugging.
The log is trimmed on every run, so that only the last 1024 lines are kept.
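
A minimal sketch of this kind of log trimming is shown below. The function
name `trim_log` is an assumption for this example; the approach (read the tail
into a variable, then overwrite the file) follows the description above.

```shell
# Hypothetical helper: keep only the last N lines of a log file.
trim_log() {
    log_file="${1}"
    max_lines="${2}"
    # Read the tail into a variable first, so that the redirection below
    # does not truncate the file before it has been read.
    kept_lines="$(tail -n "${max_lines}" "${log_file}")"
    printf '%s\n' "${kept_lines}" > "${log_file}"
}

# Usage: keep the last 1024 lines of a backup log.
# trim_log /backups/etc/log 1024
```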

## Tested

With `ansible-lint v6.5.0` and `shellcheck v0.8.0`.

## Limitations

The script itself cannot be executed more than once per second, because the
backup folders are named with a timestamp that has a resolution of one second.
It has not yet been tested with backups to external machines.

A successful backup produces no output; output appears only when errors occur.
In the current configuration, the output of a failed backup is reported only in
the mail of the `root` user (at the moment that is in `/var/spool/mail/root` or
`/var/mail/root`).

Do not use this backup method for

- files whose content changes only slightly between backups (e.g. a new line
  appended to the end of a large file, as with large database dumps): only
  unchanged files are hardlinked, so every changed file is stored again in full
- folders in which new files are regularly created and later deleted (e.g. the
  ones in the `/tmp` folder)
16 changes: 16 additions & 0 deletions roles/backup_local/defaults/main.yml
@@ -0,0 +1,16 @@
---
# List of folders for cron to back up.
# Example values:
# local_backups:
#   - name: apps # do not modify once deployed!
#     src_path: '/apps'
#     frequency:
#       - { name: 'daily', hour: '*', minute: '*', day: '*', weekday: '*', month: '*', keep: '60', disabled: 'false' }
#       - { name: 'weekly', hour: '*', minute: '*/3', day: '*', weekday: '*', month: '*', keep: '20', disabled: 'false' }


# By default an empty list
local_backups: []

main_backup_folder: '/local_backups/' # where to save all the backups on the individual machine
...
5 changes: 5 additions & 0 deletions roles/backup_local/meta/main.yml
@@ -0,0 +1,5 @@
---
dependencies:
  - role: rsync
    when: local_backups is defined and local_backups | default([]) | length > 0
...
45 changes: 45 additions & 0 deletions roles/backup_local/tasks/crons.yml
@@ -0,0 +1,45 @@
---

- name: Create main backup directory
  ansible.builtin.file:
    path: '{{ main_backup_folder }}'
    state: directory
    mode: '0755'
  when: local_backups is defined and local_backups | default([]) | length > 0
  become: true

- name: Create backup subdirectories for each item from list
  ansible.builtin.file:
    path: '{{ main_backup_folder }}/{{ item.name }}/'
    state: directory
    mode: '0700'
  when: local_backups is defined and local_backups | default([]) | length > 0
  become: true
  loop: "{{ local_backups }}"

- name: 'Install backup cronjob script'
  ansible.builtin.template:
    src: "templates/backup_cronjob.sh.j2"
    dest: '/root/backup_cronjob.sh'
    owner: 'root'
    group: 'root'
    mode: '0700'
  when: local_backups is defined and local_backups | default([]) | length > 0
  become: true

- name: 'Create cron job for each backup and frequency'
  ansible.builtin.cron:
    name: '{{ item.0.name }}-{{ item.1.name }} - do not change it manually!'
    hour: '{{ item.1.hour }}'
    minute: '{{ item.1.minute }}'
    day: '{{ item.1.day }}'  # day of month; defined in 'frequency' but was not passed to cron
    month: '{{ item.1.month }}'
    weekday: '{{ item.1.weekday }}'
    user: 'root'
    disabled: '{{ item.1.disabled | default("true") }}'
    job: >
      /root/backup_cronjob.sh '{{ item.0.name }}' '{{ item.1.name }}' '{{ item.1.keep }}'
      '{{ item.0.src_path }}' '{{ main_backup_folder | default("/backups/", true) }}'
  loop: '{{ local_backups | subelements("frequency") }}'
  become: true

...
6 changes: 6 additions & 0 deletions roles/backup_local/tasks/main.yml
@@ -0,0 +1,6 @@
---

- name: Install and configure local backup cron
  ansible.builtin.include_tasks: crons.yml

...
91 changes: 91 additions & 0 deletions roles/backup_local/templates/backup_cronjob.sh.j2
@@ -0,0 +1,91 @@
#!/bin/bash

# 2022-08-22 initial script
# 2022-08-23 added options: type, keepversions and destination, shellcheck done
# 2022-08-24 added logging, debugging
# 2022-08-28 improved logging, worked on initial backup
# {% raw %}

set -e
set -u
set -o pipefail

if [ "${#}" -ne 5 ]; then
    echo "${0} \"name\" \"frequencyname\" \"keep\" \"/this/will/be/backed/up\" \"/destination/path\""
    echo " name [string] name of the backup (creates subdirectory main>name)"
    echo " frequencyname [string] frequency name of backup (creates subdirectory main>name>frequency)"
    echo " keep [number] how many backups to keep for this frequency"
    echo " /source/... [/path/...] to the folder that you would like to backup"
    echo " /main/bckp/dst/... [/path/...] to the *main* backup folder to keep all the backups"
    exit 1
fi

# Get script arguments
backup_name="${1}"
backup_frequency="${2}"
keep_versions="${3}"
main_backup_destination="${5}"

current_time="$(date +%Y%m%d_%H%M%S)"

log_file="${main_backup_destination}/${backup_name}/log"
touch "${log_file}"
original_directory="$(pwd)"

# Check if the source directory exists and is readable
if ! test -r "${4}"; then
    echo "Error, (source) directory for backing up cannot be read/does not exist:"
    echo " -> ${4}"
    exit 255
else
    # Normalize the source path to an absolute path
    backup_source_formatted="$(cd "${4}" && pwd)"
fi

# Check if the destination folder of this backup exists
if ! test -d "${main_backup_destination}/${backup_name}"; then
    echo "Error, one or more (destination) directories do not exist:"
    echo " -> ${main_backup_destination}/${backup_name}"
    exit 255
fi

frequency_dir="${main_backup_destination}/${backup_name}/${backup_frequency}"
# Assemble the destination directory path, but do not create the directory yet
destination_dir="${frequency_dir}/${current_time}/${backup_source_formatted}/"

# Create the frequency subdirectory if it is missing
test -d "${frequency_dir}" || mkdir "${frequency_dir}"

# Clean old logs (and keep only last 1024 lines)
log_short="$(tail -n 1024 "${log_file}")"
echo "${log_short}" > "${log_file}"

# Clean old backups
# List the backups of this frequency, newest first, and delete all but the
# newest ${keep_versions} of them.
# ! -path . suppresses '.' from the output
# The logged list uses the same offset (keep_versions + 1) as the deletion below.
purge_list="$(cd "${frequency_dir}" && find . -maxdepth 1 ! -path . -type d -printf '%T@ %f\n' | sort -r | cut -d' ' -f2- | tail -n +"$((keep_versions + 1))")"
if [[ -n "${purge_list}" ]]; then echo -e "${current_time}: purging ${purge_list}" >> "${log_file}"; fi
# Note: this 'cd' runs in the current shell and changes its working directory
cd "${frequency_dir}" && find . -maxdepth 1 ! -path . -type d -printf '%T@ %f\n' | sort -r | cut -d' ' -f2- | tail -n +"$((keep_versions + 1))" | xargs -I {} rm -rf -- ./{}

# Find the latest backup of this frequency
# sort by modification time, newest first
# ! -path . suppresses '.' from the output
latest_backup="$(cd "${frequency_dir}" && find . -maxdepth 1 ! -path . -type d -printf '%T@ %f\n' | sort -r | cut -d' ' -f2- | head -n 1)" # take first

# Assemble the command for the initial vs. incremental backup.
# An array keeps paths that contain spaces intact and avoids 'eval'.
backup_command=(rsync -aqH --protect-args --delete)
if [[ -n "${latest_backup}" ]]; then
    # Incremental backup: hard-link unchanged files against the latest backup.
    # rsync resolves a relative --link-dest against the destination directory,
    # so the copy of the source inside the latest backup is given explicitly
    # (this assumes that the main backup folder is an absolute path).
    backup_command+=("--link-dest=${frequency_dir}/${latest_backup}/${backup_source_formatted}")
fi
# Without a latest backup this is the first backup, i.e. a simple copy
backup_command+=("${backup_source_formatted}/" "${destination_dir}/.")

# Make the destination directory
mkdir -p "${destination_dir}"
# Log the backup command
echo "${current_time}: ${backup_command[*]}" >> "${log_file}"
# Execute the command
"${backup_command[@]}"

cd "${original_directory}" # return to the original working directory
# {% endraw %}
1 change: 1 addition & 0 deletions single_group_playbooks/cluster_part2.yml
@@ -18,4 +18,5 @@
- lustre_client
- nfs_client
- shared_storage
- backup_local
...
6 changes: 6 additions & 0 deletions single_role_playbooks/backup_local.yml
@@ -0,0 +1,6 @@
---
- hosts:
    - deploy_admin_interface
  roles:
    - backup_local
...