
data-backup-s3

Data file backup to S3 instances!

Overview

This is an easy way to back up your files to S3. It is based on Python and runs in a Docker container.

The following features are available:

  • back up data from a given volume
  • automatic upload to S3 (e.g. MinIO, AWS, ...)
  • connect to any container running on the same system
  • select how often to run a backup
  • select when to start the first backup, whether time of day or relative to container start time

Backup

To run a backup, launch the data-backup-s3 image as a container with the correct parameters. Everything is controlled by environment variables passed to the container.

For example:

docker run -d --restart=always -e BACKUP_FREQ=60 -e BACKUP_BEGIN=2330 -e S3_TARGET=s3://... -v /local/file/path:/home/backupuser/data ultrasites/data-backup-s3

The above will run a backup every 60 minutes, beginning at the next 2330 local time.
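
The relative form of BACKUP_BEGIN works the same way. The variant below (volume path and target are placeholders, as above) runs once per day, starting five minutes after the container comes up:

docker run -d --restart=always -e BACKUP_FREQ=1440 -e BACKUP_BEGIN=+5 -e S3_TARGET=s3://... -v /local/file/path:/home/backupuser/data ultrasites/data-backup-s3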

Environment

  • BACKUP_FREQ: How often to run a backup, in minutes. Defaults to 1440 minutes, i.e. once per day.
  • BACKUP_BEGIN: What time to run the first backup. Defaults to immediately. Must be in one of two formats:
    • Absolute: HHMM, e.g. 2330 or 0415
    • Relative: +MM, i.e. how many minutes after starting the container, e.g. +0 (immediate), +10 (in 10 minutes), or +90 (in an hour and a half)
  • BACKUP_CRON: Set the BACKUP schedule using standard crontab syntax, a single line.
  • RUN_ONCE: Run the backup once and exit if RUN_ONCE is set. Useful if you use an external scheduler (e.g. as part of an orchestration solution like Cattle, Docker Swarm, or Kubernetes cron jobs) and don't want the container to do the scheduling internally. If you use this option, all other scheduling options (BACKUP_FREQ, BACKUP_BEGIN, and BACKUP_CRON) are ignored.
  • BACKUP_DEBUG: If set to true, print copious shell script messages to the container log. Otherwise only basic messages are printed.
  • S3_TARGET: S3 target with the format s3://bucketname/path. Connection via awscli.
  • AWS_ACCESS_KEY_ID: AWS Key ID
  • AWS_SECRET_ACCESS_KEY: AWS Secret Access Key
  • AWS_DEFAULT_REGION: Region in which the bucket resides
  • AWS_ENDPOINT_URL: Specify an alternative endpoint for S3-interoperable systems, e.g. DigitalOcean
  • AWS_CLI_OPTS: Additional arguments to be passed to the aws part of the aws s3 cp command; see the AWS CLI documentation for a list. Be careful, as you can break something!
  • AWS_CLI_S3_CP_OPTS: Additional arguments to be passed to the s3 cp part of the aws s3 cp command; see the AWS CLI documentation for a list. If you are using AWS KMS, sse, sse-kms-key-id, etc. may be of interest.
  • COMPRESSION: Compression to use. Supported values: gzip (default), bzip2
  • TMP_PATH: Temporary directory to use during backup creation and other operations. Optional; defaults to /tmp
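
Putting it all together, a fuller invocation might look like the following sketch. The bucket name, region, and MinIO endpoint are illustrative placeholders, and the credentials are deliberately left elided:

docker run -d --restart=always \
  -e BACKUP_FREQ=1440 \
  -e BACKUP_BEGIN=0300 \
  -e S3_TARGET=s3://my-bucket/backups \
  -e AWS_ACCESS_KEY_ID=... \
  -e AWS_SECRET_ACCESS_KEY=... \
  -e AWS_DEFAULT_REGION=us-east-1 \
  -e AWS_ENDPOINT_URL=http://minio:9000 \
  -e COMPRESSION=bzip2 \
  -v /local/file/path:/home/backupuser/data \
  ultrasites/data-backup-s3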

Scheduling

There are several options for scheduling how often a backup should run:

  • RUN_ONCE: run just once and exit.
  • BACKUP_FREQ and BACKUP_BEGIN: run every x minutes, and run the first one at a particular time.
  • BACKUP_CRON: run on a schedule.

Cron Scheduling

If a cron-scheduled backup runs past the beginning of the next backup window, the next backup will be skipped. For example, if your cron line is scheduled to back up every hour, as follows:

0 * * * *

and the backup that runs at 13:00 finishes at 14:05, the next backup will not be immediate, but rather at 15:00.

The cron algorithm is as follows: after each backup run, calculate the next time that the cron statement will be true and schedule the backup then.
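
For example, to run a backup at 02:00 every Sunday (volume path and target are again placeholders):

docker run -d --restart=always -e BACKUP_CRON="0 2 * * 0" -e S3_TARGET=s3://... -v /local/file/path:/home/backupuser/data ultrasites/data-backup-s3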

Order of Priority

The scheduling options have an order of priority:

  • RUN_ONCE: runs once, immediately, and exits, ignoring everything else.
  • BACKUP_CRON: runs according to the cron schedule, ignoring BACKUP_FREQ and BACKUP_BEGIN.
  • BACKUP_FREQ and BACKUP_BEGIN: used if nothing else is set.

Permissions

By default, the backup/restore process does not run as root (UID 0). Whenever possible, you should run processes (not just in containers) as users other than root. In this case, it runs as the user appuser with UID/GID 1005.

In most scenarios, this will not affect your backup process negatively. However, if you are using the "local" backup target, i.e. your S3_TARGET starts with / - and, most likely, is a volume mounted into the container - you can run into permissions issues. For example, if your mounted directory is owned by root on the host, the backup process will be unable to write to it.

In this case, you have two options:

  • Run the container as root: docker run --user 0 ... or, in docker-compose.yml, user: "0"
  • Ensure your mounted directory is writable as UID or GID 1005.
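
For the second option, changing ownership on the host is usually enough. A minimal sketch, assuming your local target is the hypothetical host directory /local/backup/target mounted into the container:

# on the host: make the mounted directory writable by UID/GID 1005
sudo chown -R 1005:1005 /local/backup/target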

Format

The backup target is where you want the backup files to be saved. The backup file is always a compressed file with the following format:

data_backup_YYYY-MM-DDTHH:mm:ssZ.compression

Where the date is in RFC3339 format, excluding the milliseconds portion.

  • YYYY = year in 4 digits
  • MM = month number from 01-12
  • DD = day of month from 01-31
  • HH = hour from 00-23
  • mm = minute from 00-59
  • ss = seconds from 00-59
  • T = literal character T, indicating the separation between the date and time portions
  • Z = literal character Z, indicating that the time provided is UTC, or "Zulu"
  • compression = appropriate file ending for the selected compression, one of: gz (gzip, default); bz2 (bzip2)

The time used is the UTC time at the moment the backup begins.
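
For example, a backup that begins at 23:30:00 UTC on 15 March 2022, using the default gzip compression, would be named:

data_backup_2022-03-15T23:30:00Z.gz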

If you use a URL like s3://bucket/path, you can have it save to an S3 bucket.

Note that for S3, you'll need to specify your AWS credentials and default AWS region via AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and AWS_DEFAULT_REGION.

About us

© 2022 Ultra Sites Medienagentur

https://www.ultra-sites.de

Visit us on GitHub!