Reviews docs + two-way sync
g-nardiello committed Apr 8, 2023
1 parent 2c60ad4 commit d858eb9
Showing 3 changed files with 88 additions and 25 deletions.
5 changes: 4 additions & 1 deletion Dockerfile
@@ -20,7 +20,10 @@ VOLUME /data
HEALTHCHECK --interval=2s --retries=1800 \
CMD stat /var/healthy.txt || exit 1

ENV S3_GLOBAL_FLAGS ""
ENV S3_BACKUP_FLAGS ""
ENV S3_RESTORE_FLAGS ""
ENV S3_BACKUP_FINAL_FLAGS ""
ENV S3CMD_FINAL_STRATEGY "PUT"

SHELL ["/bin/bash", "-c"]
54 changes: 43 additions & 11 deletions README.md
@@ -56,10 +56,27 @@ This will download the data from the S3 location you specify into the
container's `/data` directory. When the container shuts down, the data will be
synced back to S3.

## Configuration

Configuration options can be passed to the script on the command line or by
setting the related environment variables. Command line options always override
the environment variables.

| Variable | Description | Command Line Option | Environment Variable |
|---|---|---|---|
| Backup interval | Time interval between backups (e.g. "10m", "6h") | -i \| --backup-interval \<time\> | $BACKUP_INTERVAL |
| Clean restore | Restore only if the local directory is empty; otherwise the script fails | -c \| --clean-restore | $CLEAN_RESTORE |
| Two-way sync | Performs both backup and restore on every cycle instead of backup only. This influences the S3CMD sync flags.<br> :warning: **PARTIALLY WORKING** Read more at [Two-way sync](#two-way-sync) | -t \| --two-way-sync | $TWO_WAY_SYNC |
| Final backup strategy | Sets the s3cmd strategy for the final cycle run on the shutdown signal trap. Defaults to "AUTO", which best preserves POSIX permissions; "PUT" and "SYNC" are also available. | --final-strategy \<mode\> | $S3CMD_FINAL_STRATEGY |
| S3CMD sync flags | Additional flags passed to all s3cmd commands. Defaults to `--delete-removed`, or empty if two-way sync is enabled. Configurable only by environment variable | *n/d* | $S3_GLOBAL_FLAGS |
|| Additional flags passed to s3cmd restore commands. Defaults to empty. Configurable only by environment variable | *n/d* | $S3_RESTORE_FLAGS |
|| Additional flags passed to s3cmd backup commands. Defaults to empty. Configurable only by environment variable | *n/d* | $S3_BACKUP_FLAGS |
|| Additional flags passed to the last s3cmd backup command on graceful stop. Defaults to empty. Configurable only by environment variable | *n/d* | $S3_BACKUP_FINAL_FLAGS |
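The precedence rule above can be illustrated with a small sketch (this is an illustration of the pattern, not the actual `watch` script): the environment variable provides the default, and a command line option overwrites it.

```shell
# Hypothetical sketch: env var supplies the default value,
# a CLI option takes precedence over it.
resolve_interval() {
  local OPTIND=1 opt
  local interval="${BACKUP_INTERVAL:-10m}"   # env var, or built-in default
  while getopts "i:" opt; do
    case "$opt" in
      i) interval="$OPTARG" ;;               # command line option wins
    esac
  done
  echo "$interval"
}

BACKUP_INTERVAL=6h
from_env=$(resolve_interval)          # environment only
from_cli=$(resolve_interval -i 2m)    # CLI overrides the environment
```

Here `from_env` resolves to `6h` and `from_cli` to `2m`, matching the override behavior described above.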

### Configuring a sync interval

When the `-i <time>` command line option is given or the `BACKUP_INTERVAL` environment variable is set,
a watcher process will sync the `/data` directory to S3 at the interval you specify. The interval can
be specified in seconds, minutes, hours or days (adding `s`, `m`, `h` or `d` as
the suffix):

@@ -68,8 +85,17 @@

```
docker run -d --name my-data-container -e BACKUP_INTERVAL=2m \
-v /home/user/.s3/.s3cfg:/root/.s3cfg \
nards/docker-s3cmd-sync /data/ s3://mybucket/someprefix
```
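The interval string is handed directly to `sleep` inside the container, and GNU coreutils `sleep` accepts the `s`/`m`/`h`/`d` suffixes natively, which is why values like `2m` or `6h` work as-is:

```shell
# GNU sleep understands suffixed durations; "1s" below uses the same
# mechanism as "2m" or "6h", just with a shorter wait.
start=$(date +%s)
sleep 1s
end=$(date +%s)
elapsed=$((end - start))
```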
### Final backup strategy

A final backup is always performed when a shutdown event is trapped. The script traps `SIGHUP`, `SIGINT`, and `SIGTERM`, ensuring the container's data is backed up on graceful shutdown.

The environment variable `S3CMD_FINAL_STRATEGY` can be used to force the strategy of this last sync:
- `AUTO` (default) selects the best option to keep all permissions
- `PUT` uploads all files again
- `SYNC` syncs only files identified as changed

It can be tuned to trade execution time against POSIX permission preservation. Read more at [Posix file attributes persist during sync](#posix-file-attributes-persist-during-sync).
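The strategy resolution amounts to a small mapping, mirroring the `case` statement in the `watch` script, where `AUTO` currently resolves to `put`:

```shell
# Map the requested final strategy to an s3cmd subcommand.
# AUTO currently falls back to put, which re-uploads everything
# but reliably rewrites POSIX permissions.
resolve_final_cmd() {
  case "${1^^}" in
    SYNC) echo "sync" ;;
    *)    echo "put"  ;;   # PUT, AUTO, and anything else
  esac
}
```

For example, `resolve_final_cmd AUTO` prints `put`, while `resolve_final_cmd sync` prints `sync`.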


### Forcing a sync

@@ -87,6 +113,18 @@

A push can be forced by sending the container the `USR2` signal:

```
docker kill --signal=USR2 my-data-container
```
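Under the hood this relies on a shell signal trap; a minimal self-contained sketch of the pattern (not the actual script):

```shell
# A trap handler runs between commands when the signal arrives;
# the real script uses this to trigger an immediate backup.
FORCED=0
on_usr2() { FORCED=$((FORCED + 1)); }
trap on_usr2 USR2
kill -USR2 $$    # simulate `docker kill --signal=USR2 ...`
:                # the handler fires before the next command runs
```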

### Two-way sync

> :warning: **PARTIALLY WORKING**: an effective two-way sync is not implemented. This is currently a workaround.

**Two-way sync** mode can be enabled with `-e TWO_WAY_SYNC="true"` or with the `-t` command line option. Without it, when two or more folders are synchronized to the same bucket/folder, each container misses the modifications made by the others.

When two or more folders are synchronized to the same bucket/folder, modifications must be propagated periodically from *local* to *s3* and vice versa. The normal script flow restores from *s3* to *local* only at startup, then periodically backs up from *local* to *s3*. Two-way sync mode changes this flow, running both the `backup` and `restore` phases on each cycle instead of the `backup` phase only.

A full two-way sync would also handle file creation and deletion, but this is not implemented in s3cmd and therefore not supported by this image. Enabling two-way sync mode **disables** file deletion by default; this can be changed through the **S3CMD sync flags**, in particular the `--delete-removed` flag.

This is a **temporary solution** and will be improved in future versions, either with a better workaround or with s3cmd updates in this direction.
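The resulting cycle can be sketched like this (a simplified illustration with stubbed `backup`/`restore` functions standing in for the real s3cmd calls):

```shell
TWO_WAY_SYNC=true
LOG=""
backup()  { LOG+="backup;"; }    # stub for the s3cmd backup phase
restore() { LOG+="restore;"; }   # stub for the s3cmd restore phase

cycle() {
  backup
  if [[ "${TWO_WAY_SYNC:-false}" == "true" ]]; then
    restore                      # only runs in two-way sync mode
  fi
}
cycle
```

With `TWO_WAY_SYNC=true` each cycle performs both phases; with the default `false`, only the backup runs.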

### Using Compose and named volumes

Most of the time, you will use this image to sync data for another container.
@@ -127,18 +165,12 @@ Consider starting the first backup without dependencies if you evaluate the restore
A traditional S3 bucket cannot persist POSIX file attributes, such as creation dates and file and folder permissions.
Using the s3cmd tool we can persist POSIX file attributes during the `sync` process, though with some limitations: the s3cmd sync command checks file size and MD5 hash to determine whether a file has changed, so if only the file attributes change, the file will not be re-uploaded to the S3 bucket. The s3cmd `put` command does not have this limitation and always sets permissions correctly.

There is an open issue on the s3cmd project about syncing files when only their attributes change, but it is not yet resolved. Until it is, the script defaults to a `put` as the shutdown backup strategy, to ensure all permissions are backed up correctly. If you don't care about this, you can set `sync` as the final backup strategy.

You can find the issue on the s3cmd repository here: [https://github.com/s3tools/s3cmd/issues/1280](https://github.com/s3tools/s3cmd/issues/1280)
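The limitation is easy to demonstrate: changing only a file's permissions alters neither its size nor its MD5 hash, so a comparison based on those two checks sees nothing to upload.

```shell
# Attribute-only changes are invisible to a size+MD5 comparison.
f=$(mktemp)
echo "hello" > "$f"
md5_before=$(md5sum "$f" | cut -d' ' -f1)
chmod 600 "$f"                   # attribute change only
md5_after=$(md5sum "$f" | cut -d' ' -f1)
size_after=$(stat -c %s "$f")    # GNU stat; size in bytes
rm -f "$f"
```

After the `chmod`, `md5_before` and `md5_after` are identical and the size is unchanged, which is exactly why `sync` skips the file.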


## Contributing

You can contribute to the project however you like.
Expand Down
54 changes: 41 additions & 13 deletions watch
Expand Up @@ -6,8 +6,12 @@ function usage {
cat <<-EOF
Usage: $PROGNAME [OPTIONS] <local-path> <remote-path>
Sync s3 directory locally and backup changed files on exit
-i | --backup-interval <time> set interval between backups, default 10m [env BACKUP_INTERVAL="time"]
-c | --clean-restore restore only if local directory is empty [env CLEAN_RESTORE="true"]
-t | --two-way-sync sync two ways on each cycle, default backups only [env TWO_WAY_SYNC="true"]
--final-strategy <mode> set the backup strategy on shutdown signal trap, default AUTO
(default "put" to rewrite permissions) [env S3CMD_FINAL_STRATEGY="sync"]
eg: $PROGNAME --backup-interval "5s" --two-way-sync /data/ s3://bucket/dir/
EOF
}

@@ -16,16 +20,30 @@ function error_exit {
exit 1
}

PARSED_OPTIONS=$(getopt -n "$0" -o i:ct --long "backup-interval:,clean-restore,two-way-sync,final-strategy:" -- "$@")
if [ $? -ne 0 ]; then
exit 1
fi
eval set -- "$PARSED_OPTIONS"

while true; do
case "$1" in
-i|--backup-interval)
BACKUP_INTERVAL=$2
shift
shift;;

-c|--clean-restore)
CLEAN_RESTORE="true"
shift;;

-t|--two-way-sync)
TWO_WAY_SYNC="true"
shift;;

--final-strategy)
S3CMD_FINAL_STRATEGY=$2
shift
shift;;

--)
@@ -39,29 +57,36 @@ LOCAL=$1
REMOTE=$2
HEALTHCHECK_FILE=/var/healthy.txt

BACKUP_INTERVAL=${BACKUP_INTERVAL:="10m"}
TWO_WAY_SYNC=${TWO_WAY_SYNC:="false"}

if [[ ${TWO_WAY_SYNC:-false} == 'false' ]]; then
S3_GLOBAL_FLAGS=${S3_GLOBAL_FLAGS:="--delete-removed"}
fi

case ${S3CMD_FINAL_STRATEGY^^} in
"SYNC") SYNC="sync" ;;
"PUT") SYNC="put" ;;
*) SYNC="put" ;;
esac

function restore {
if [[ ${CLEAN_RESTORE:-false} == 'true' ]]; then
if [ "$(ls -A $LOCAL)" ]; then
error_exit "local directory is not empty"
fi
fi

echo "restoring $REMOTE => $LOCAL [sync]"
if ! s3cmd --preserve sync $S3_GLOBAL_FLAGS $S3_RESTORE_FLAGS "$REMOTE" "$LOCAL"; then
error_exit "restore failed"
fi
touch $HEALTHCHECK_FILE
}

function backup {
echo "backup $LOCAL => $REMOTE [sync]"
if ! s3cmd --preserve --recursive $S3_GLOBAL_FLAGS $S3_BACKUP_FLAGS sync "$LOCAL" "$REMOTE"; then
echo "backup failed" 1>&2
rm $HEALTHCHECK_FILE
return 1
@@ -71,7 +96,7 @@ function backup {

function forced_backup {
echo "backup $LOCAL => $REMOTE [put]"
if ! s3cmd --preserve --recursive $S3_GLOBAL_FLAGS $S3_BACKUP_FLAGS put "$LOCAL" "$REMOTE"; then
echo "backup failed" 1>&2
rm $HEALTHCHECK_FILE
return 1
@@ -81,7 +106,7 @@ function forced_backup {

function final_backup {
echo "backup $LOCAL => $REMOTE [$SYNC]"
while ! s3cmd --preserve --recursive $S3_GLOBAL_FLAGS $S3_BACKUP_FINAL_FLAGS $SYNC "$LOCAL" "$REMOTE"; do
echo "backup failed, will retry" 1>&2
sleep 1
done
@@ -91,9 +116,12 @@ function final_backup {
function idle {
echo "ready"
while true; do
sleep ${BACKUP_INTERVAL} &
wait $!
[ -n "$BACKUP_INTERVAL" ] && backup
if [[ ${TWO_WAY_SYNC:-false} == 'true' ]]; then
[ -n "$BACKUP_INTERVAL" ] && restore
fi
done
}

