
Commit

Adding curl and md5 loops
JS Fillman committed Apr 3, 2024
1 parent b4d7aa4 commit 81959b7
Showing 3 changed files with 145 additions and 27 deletions.
5 changes: 3 additions & 2 deletions Dockerfile
@@ -2,7 +2,7 @@
FROM alpine:latest as builder

RUN apk update && \
apk add --no-cache p7zip rclone
apk add --no-cache curl p7zip rclone

# Create config directory
RUN mkdir -p /root/.config/rclone/
@@ -15,8 +15,9 @@ COPY app/ /app/
FROM alpine:latest

# Copy necessary binaries and files from builder stage
COPY --from=builder /usr/bin/rclone /usr/bin/rclone
COPY --from=builder /usr/bin/7z /usr/bin/7z
COPY --from=builder /usr/bin/curl /usr/bin/curl
COPY --from=builder /usr/bin/rclone /usr/bin/rclone
COPY --from=builder /root/.config/rclone/rclone.conf /root/.config/rclone/rclone.conf
COPY --from=builder /app/ /app/

44 changes: 44 additions & 0 deletions app/README.md
@@ -0,0 +1,44 @@

## Zip2Cloud

A robust zip & upload utility for sending archives to a remote location.

### Features

- Intelligently compares local & remote files with md5 sums
- Only uploads _completed_ archives
- Only deletes local files once they have been successfully uploaded
- Allows keeping an arbitrary amount of zipped & unzipped backups locally for faster restore
- Script only zips & uploads files that are missing from the remote location
- Allows mixing backup files with other data
- Only zips folders under the `$DUMP_BASE` directory with a date-based name e.g. `2024-04-01`
- Notifies on completion or error via Slack
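The "date-based name" rule above can be illustrated with a small check. This is only a sketch: the helper name is hypothetical, and the script itself simply iterates over every directory under `$DUMP_BASE` rather than filtering by name.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a directory name looks like a
# date-based backup folder such as 2024-04-01.
is_datestamp() {
    echo "$1" | grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
}

# Example: decide which folders under a dump base would qualify.
for dir in 2024-04-01 2024-04-02 not_a_backup; do
    if is_datestamp "$dir"; then
        echo "would zip: $dir"
    else
        echo "skipping:  $dir"
    fi
done
```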

### Operation of `zip2cloud`

- Uses `rclone` to create a list of `.7z` & `.md5` files from the remote location defined with the `REMOTE` environment variable
- For each file in the list:
  - Compares file names & md5 sums between local & remote locations prior to read/write operations
  - Uploads any `.7z` files that are missing from the remote location
  - Files with mismatched md5 sums are uploaded with alternate filenames
  - Only deletes files locally once they have been successfully uploaded & md5 sums confirmed
- Allows multiple unzipped local backups to remain, without re-zipping & uploading
  - This allows for faster restores, as we can avoid downloading the most recent archives

1. Creates 7zip archives of any directories under `$DUMP_BASE` with a date-based name
   - For example, if `$DUMP_BASE` is `/dump/full_backup`, the directory `2024-04-01` will be archived as `${ZIP_BASE}_2024-04-01.7z`
2. Syncs the archives to the remote location using `rclone`
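The comparison rules above reduce to one decision per archive: upload when the remote copy is missing or its checksum differs, otherwise skip. A minimal sketch of that decision as a shell function (the function name and arguments are illustrative, not part of the script):

```shell
#!/bin/sh
# needs_upload LOCAL_MD5 REMOTE_MD5
# Prints "upload" when the archive is missing remotely or the checksums
# differ, "skip" otherwise. An empty REMOTE_MD5 means no remote copy.
needs_upload() {
    local_md5=$1
    remote_md5=$2
    if [ -z "$remote_md5" ]; then
        echo upload      # no remote copy at all
    elif [ "$local_md5" != "$remote_md5" ]; then
        echo upload      # remote copy exists but differs
    else
        echo skip        # checksums match, nothing to do
    fi
}
```

In the script itself this decision is made inline while looping over the local `.7z` files, with remote checksums fetched via `rclone md5sum`.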

### Variables

- `DUMP_BASE` - The base directory for backup dumps (default `/dump`)
- `DUMP_RETENTION` - The number of days to keep uncompressed backups locally
- `REMOTE` - The remote location to sync backups to
- `SECRET` - The encryption key for 7zip
- `SLACK_CHANNEL` - The Slack channel to send notifications to
- `SLACK_WEBHOOK` - The webhook URL for Slack notifications
- `ZIP_BASE` - The base filename, minus date, for the compressed backups
- `ZIP_DIR` - The directory to store all compressed backups (default `/zip`)
- `ZIP_RETENTION` - The number of days to keep compressed backups locally
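Assuming the image is run under Docker with the encryption key mounted as a file, an invocation might look like the following. Everything here is hypothetical: image name, host paths, bucket names, and the webhook URL are placeholders. Note that `REMOTE` is derived inside the script from `BUCKET` and `BUCKETPATH`, and `SECRET` is read from `/run/secrets/encryption_key` rather than passed as an environment variable.

```shell
# Hypothetical example; adjust the image name, paths, and values for your setup.
docker run --rm \
  -e BUCKET=my-backup-bucket \
  -e BUCKETPATH=prod/db \
  -e DUMP_BASE=/dump/full_backup \
  -e DUMP_RETENTION=3 \
  -e SLACK_CHANNEL='#backups' \
  -e SLACK_WEBHOOK='https://hooks.slack.com/services/...' \
  -e ZIP_BASE=backup_full \
  -e ZIP_DIR=/zip \
  -e ZIP_RETENTION=4 \
  -v /srv/db-dumps:/dump \
  -v /srv/zip-cache:/zip \
  -v /srv/secrets/encryption_key:/run/secrets/encryption_key:ro \
  zip2cloud:latest
```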
123 changes: 98 additions & 25 deletions app/zip2cloud
@@ -8,38 +8,79 @@
# sychan@lbl.gov
# 5/21/2021

# Directory containing db dumps to be archived/compressed/copied
#DUMP_BASE=/Users/jsfillman/Documents/repos/jsfillman-github/tmp-backup-test
## Variables
COMPRESSION_LEVEL=0 # Set to 0 if the db dumps are already compressed
DUMP_BASE=/dump/full_backup

# Directory to put the zipped backups
#ZIP_DIR=/Users/jsfillman/Documents/repos/jsfillman-github/tmp-backup-zip
ZIP_DIR=/zip

NOW=$(/bin/date +"%Y%m%d%H%M")

# Name of the zipped db backup. The .7z extension will be added by the 7zip program

DUMP_RETENTION=3
REMOTE=remote:${BUCKET}/${BUCKETPATH}
SECRET=`cat /run/secrets/encryption_key`
SLACK_CHANNEL=''
SLACK_WEBHOOK=''
ZIP_BASE=backup_full
#ZIP_NAME=${ZIP_BASE}${NOW}
ZIP_DIR=/zip
ZIP_RETENTION=4

[ -r /run/secrets/encryption_key ] || { echo "Encryption key not readable in /run/secrets/encryption_key" ; exit 1; }
[ -r /run/secrets/gcp_backup_creds ] || { echo "Google cloud service credentials not found in /run/secrets/gcp_backup_creds" ; exit 1; }
[ -z "${BUCKET}" ] && { echo "S3 bucketname not set in BUCKET environment variable" ; exit 1; }
[ -z "${BUCKETPATH}" ] && { echo "Path within S3 bucket not set in BUCKETPATH environment variable" ; exit 1; }
[ -z "${DELETE_DUMP}" ] || echo "DELETE_DUMP set, will delete files/directories under /dump/ when done compressing"

## This is the password used to generate the AES256 encryption key
#SECRET=tempsecret
SECRET=`cat /run/secrets/encryption_key`
#
## This is the Google Cloud Storage path, note that it depends on rclone being preconfigured
## for "remote" using the runtime creds, check rclone config in /root/.config/rclone/rclone.conf
REMOTE=remote:${BUCKET}/${BUCKETPATH}

# Delete any files older than 30 days in the zip directory
echo "Deleting database archives older than 30 days"
/usr/bin/find ${ZIP_DIR} -mtime +30 -type f -name "${ZIP_BASE}*" -print -exec rm {} \;
#echo "Deleting database archives older than 30 days"
#/usr/bin/find ${ZIP_DIR} -mtime +30 -type f -name "${ZIP_BASE}*" -print -exec rm {} \;

# Delete all old backups, keeping only the most recent $ZIP_RETENTION files (note: the .7z and .md5 of each backup each count toward this limit)
ls -t ${ZIP_DIR}/${ZIP_BASE}*.{7z,md5} | tail -n +$((${ZIP_RETENTION} + 1)) | xargs rm -f

# Get list of remote backups
remote_files=$(rclone ls remote:${BUCKET}/${BUCKETPATH} | grep 7z | awk '{print $2}' | rev | cut -d. -f2- | rev)
# Pull remote md5 sums for each remote backup into `tmp_md5` directory
mkdir -p ${ZIP_DIR}/${ZIP_BASE}/tmp_md5 && cd $_
for file in $remote_files; do
rclone md5sum remote:${BUCKET}/${BUCKETPATH}/$file.7z | awk '{print $1}' > ${ZIP_DIR}/${ZIP_BASE}/tmp_md5/$file.md5
done

# Create empty list of files to upload
uploads=""

# Create md5 sums for local backups, if they don't exist
cd ${ZIP_DIR}/${ZIP_BASE}
for file in ${ZIP_DIR}/${ZIP_BASE}/*.7z; do
# Get the base name of the file without extension
base_name=$(basename "$file" .7z)
# If a local .md5 file does not exist, create it
if [ ! -f "${ZIP_DIR}/${ZIP_BASE}/${base_name}.md5" ]; then
echo "Local md5 file does not exist for $file, generating, and adding $file to uploads list"
uploads="$uploads $file"
local_md5=$(md5sum "$file" | awk '{print $1}')
echo $local_md5 > "${ZIP_DIR}/${ZIP_BASE}/${base_name}.md5"
fi
done


# Verify & update list of files to upload
cd ${ZIP_DIR}/${ZIP_BASE}
for file in ${ZIP_DIR}/${ZIP_BASE}/*.7z; do
# Get the base name of the file without extension
base_name=$(basename "$file" .7z)
# Check if the remote md5 file exists
if [ ! -f "${ZIP_DIR}/${ZIP_BASE}/tmp_md5/${base_name}.md5" ]; then
# If the remote md5 file does not exist, add the file to the uploads list
echo "Remote does not exist for $file, adding $file to uploads list"
uploads="$uploads $file"
else
# Compare local and remote md5
remote_md5=$(cat "${ZIP_DIR}/${ZIP_BASE}/tmp_md5/${base_name}.md5")
local_md5=$(cat "${ZIP_DIR}/${ZIP_BASE}/${base_name}.md5")
if [ "$local_md5" != "$remote_md5" ]; then
echo "MD5 mismatch for file $file, adding to uploads list"
uploads="$uploads $file"
fi
fi
echo "Uploads: $uploads"
done


echo "Zipping ${DUMP_BASE}/${DUMP_DIR} to ${ZIP_DIR}/${ZIP_NAME}"

@@ -50,10 +91,11 @@ for DUMP_DIR in $(ls -d ${DUMP_BASE}/*/); do
ZIP_NAME=${ZIP_DIR}/${ZIP_BASE}_${DIR_NAME}

echo "Zipping ${DUMP_DIR} to ${ZIP_NAME}"
/usr/bin/7za a -p${SECRET} ${ZIP_NAME} -mx=0 -mhe -t7z ${DUMP_DIR} || { echo "Could not zip ${DUMP_DIR} into ${ZIP_NAME}" ; exit 1; }
/usr/bin/7za a -p${SECRET} ${ZIP_NAME} -mx=${COMPRESSION_LEVEL} -mhe -t7z ${DUMP_DIR} || { echo "Could not zip ${DUMP_DIR} into ${ZIP_NAME}" ; exit 1; }
# Add to list
done

## Sync All Resulting Files
## Sync All Resulting Files (in list!)
cd ${ZIP_DIR}
for file in ${ZIP_DIR}/*; do
echo "RClone-ing ${file} to GCP ${GCP_DEST}"
@@ -62,4 +104,35 @@ done

## Create a block that, upon success of rclone above, delete _only_ files that were uploaded
## For each $FILE.7z in $ZIP_DIR, do a "rm -rf $DUMP_BASE/$FILE" to remove the original dump
#[ -z "${DELETE_DUMP}" ] || { echo "Clearing contents of /dump/"; cd /dump/; rm -rf *; }


## -- Cruft --
#cd ${ZIP_DIR}/${ZIP_BASE}
#uploads=""
#for file in ${ZIP_DIR}/${ZIP_BASE}/*.7z; do
# # Get the base name of the file without extension
# base_name=$(basename "$file" .7z)
# # Check if the remote md5 file exists
# if [ ! -f "${ZIP_DIR}/${ZIP_BASE}/tmp_md5/${base_name}.md5" ]; then
# # If the remote md5 file does not exist, add the file to the uploads list
# uploads="$uploads $file"
# else
# # Compare local and remote md5
# remote_md5=$(cat "${ZIP_DIR}/${ZIP_BASE}/tmp_md5/${base_name}.md5")
# local_md5=$(cat "${ZIP_DIR}/${ZIP_BASE}/${base_name}.md5")
#
# if [ "$local_md5" != "$remote_md5" ]; then
# echo "MD5 mismatch for file $file"
# fi
#done

# Loop over all .7z files in ZIP_DIR
#for file in ${ZIP_DIR}/${ZIP_BASE}/*.7z; do
# # Get the base name of the file without extension
# base_name=$(basename "$file" .7z)
# # If the corresponding .md5 file does not exist, create it
# if [ ! -f "${ZIP_DIR}/${ZIP_BASE}/${base_name}.md5" ]; then
# md5sum "$file" | awk '{print $1}' > "${ZIP_DIR}/${ZIP_BASE}/${base_name}.md5"
# fi
#done
