Borg taking 2 days to complete backup as it is Replaying segments #8325

Closed
edsonsbj opened this issue Aug 6, 2024 · 11 comments · Fixed by #8332

@edsonsbj

edsonsbj commented Aug 6, 2024

Have you checked borgbackup docs, FAQ, and open GitHub issues?

Yes

Is this a BUG / ISSUE report or a QUESTION?

BUG / ISSUE

System information. For client/server mode post info for both machines.

Your borg version (borg -V).

borg 1.4.0

Operating system (distribution) and version.

Ubuntu Server 24.04 LTS

Hardware / network configuration, and filesystems used.

ext4
rclone v.167
fuse

How much data is handled by borg?

1226579 files

Full borg commandline that lead to the problem (leave away excludes and passwords)

#!/bin/bash

SCRIPT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
CONFIG="$SCRIPT_DIR/BackupRestore-Nextcloud-rem.conf"

# Check if config file exists
if [ ! -f "$CONFIG" ]; then
    echo "ERROR: Configuration file $CONFIG cannot be found!"
    echo "Please make sure that a configuration file '$CONFIG' is present in the main directory of the scripts."
    echo "This file can be created automatically using the setup.sh script."
    exit 1
fi

source "$CONFIG"

# Create a log file to record command outputs
touch "$LogFile"
exec > >(tee -a "$LogFile")
exec 2>&1

# Function for error messages
errorecho() {
    cat <<< "$@" 1>&2
}

## ---------------------------------- TESTS ------------------------------ #
# Check if the script is being executed by root or with sudo
if [ $EUID -ne 0 ]; then
   echo "========== This script needs to be executed as root or with sudo. ==========" 
   exit 1
fi

# -------------------------------FUNCTIONS----------------------------------------- #
BORG_OPTS="--verbose --filter AME --list --progress --stats --show-rc --compression lz4 --exclude-caches"

# Start Rclone Mount
systemctl start borgbackup.service

borg break-lock $BORG_REPO

# Function to Nextcloud Maintenance Mode
nextcloud_enable() {
    # Enabling Maintenance Mode
	sudo -u www-data php $NextcloudConfig/occ maintenance:mode --on
}

nextcloud_disable() {
    # Disabling Nextcloud Maintenance Mode
	sudo -u www-data php $NextcloudConfig/occ maintenance:mode --off
}

# Function to WebServer Stop Start
stop_webserver() {
    # Stop Web Server
	systemctl stop $webserverServiceName
}

start_webserver() {
    # Start Web Server
	systemctl start $webserverServiceName
}

prune() {
    info "Pruning repository"

    # Use the `prune` subcommand to keep 7 daily, 4 weekly and 6 monthly
    # archives of this machine. The '{hostname}-' prefix is very important to
    # limit prune's operation to this machine's archives and not apply to
    # other machines' archives too:

    borg prune --list --progress --show-rc --keep-daily 7 --keep-weekly 4 --keep-monthly 6

}

# Function to backup Nextcloud settings
nextcloud_settings() {
    echo "========== Backing up Nextcloud settings $( date )... =========="
    echo ""

    nextcloud_enable
    stop_webserver

   	# Export the database.
	mysqldump --quick -n --host=localhost $NextcloudDatabase --user=$DBUser --password=$DBPassword > "$NextcloudConfig/nextclouddb.sql"

    # Backup
    borg create $BORG_OPTS ::'NextcloudConfigs-{now:%Y%m%d-%H%M}' $NextcloudConfig --exclude $NextcloudDataDir

    # Remove the database 
    rm "$NextcloudConfig/nextclouddb.sql"

    start_webserver
    nextcloud_disable
}

# Function to backup Nextcloud DATA folder
nextcloud_data() {
    # Borg include/exclude patterns file
    BorgFilters="./nc-patterns.lst"

    # Create the file with Borg's include/exclude patterns
    tee -a "$BorgFilters" <<EOF > /dev/null 2>&1
P sh
R /

# DO NOT LOOK IN THESE FOLDERS
! proc

# DIRECTORIES TO BE EXCLUDED FROM BACKUP  
- $NextcloudDataDir/*/files_trashbin

# DIRECTORIES FOR BACKUP
+ $NextcloudDataDir/

# DO NOT INCLUDE ANY MORE FILES
- **
EOF

    echo "========== Backing up Nextcloud DATA folder $( date )...=========="
    echo ""

    nextcloud_enable

    borg create $BORG_OPTS --patterns-from "$BorgFilters" ::'NextcloudData-{now:%Y%m%d-%H%M}'

    rm "$BorgFilters"

    nextcloud_disable
}

# Function to perform a complete Nextcloud backup
nextcloud_complete() {
    echo "========== Backing up Nextcloud $( date )... =========="
    echo ""

    nextcloud_enable
    stop_webserver

   	# Export the database.
	mysqldump --quick -n --host=localhost $NextcloudDatabase --user=$DBUser --password=$DBPassword > "$NextcloudConfig/nextclouddb.sql"

    # Backup
    borg create $BORG_OPTS ::'NextcloudFull-{now:%Y%m%d-%H%M}' $NextcloudConfig $NextcloudDataDir --exclude "$NextcloudDataDir/*/files_trashbin"

    # Remove the database
    rm "$NextcloudConfig/nextclouddb.sql"

    start_webserver
    nextcloud_disable
}

# Check if an option was passed as an argument
if [[ ! -z ${1:-""} ]]; then    # Execute the corresponding Backup option
    case $1 in
        1)
            nextcloud_settings
            ;;
        2)
            nextcloud_data
            ;;
        3)
            nextcloud_complete
            ;;
        *)
            echo "Invalid option!"
            ;;
    esac
else
    # Display the menu to choose the Backup option
    echo "Choose a Backup option:"
    echo "1. Backup Nextcloud configurations and database."
    echo "2. Backup only the Nextcloud data folder. Useful if the folder is stored elsewhere."
    echo "3. Backup Nextcloud configurations, database, and data folder."
    echo "4. To go out."

    # Read the option entered by the user
    read option

    # Execute the corresponding Backup option
    case $option in
        1)
            nextcloud_settings
            ;;
        2)
            nextcloud_data
            ;;
        3)
            nextcloud_complete
            ;;
        4)
            echo "Leaving the script."
            exit 0
            ;;
        *)
            echo "Invalid option!"
            ;;
    esac
fi

    # Sleep for 90 minutes before unmounting the drive
sleep 5400

    # Stop Rclone Mount
systemctl stop borgbackup.service

Describe the problem you're observing.

After a broken update of a program, I had to reinstall the server and, to my surprise, since 01/06 my backups have become extremely slow, taking about 3 to 4 hours to complete, when they complete at all.

For about a month now, things have only gotten worse. On some days, instead of taking 3 to 4 hours, the backup fails halfway through with an error that the mount is not available. The next day the backup starts normally, but it only finishes the following day or takes up to 2 days. According to the log, borg recreates the cache and starts the backup, then stops halfway and starts recreating the cache again.

I have already tried several things, such as changing the cache location, disabling the cache, changing some rclone settings, and even switching the cloud where the backup is stored. Nothing seems to have any effect.

Before the problem with the server, I was running Ubuntu 22.04 LTS and borg 1.2.8.

Can you reproduce the problem? If so, describe how. If not, describe troubleshooting steps you took before opening the issue.

The problem occurs when running borg create against a repository on an rclone mount: it takes a long time, up to 2 to 3 days, to back up just a few MB.

Include any warning/errors/backtraces from the system logs

Log File Exceeds
@infectormp
Contributor

the mount is not available

A possible reason is an unstable mount point. You can find more information here:
https://borgbackup.readthedocs.io/en/stable/faq.html#it-always-chunks-all-my-files-even-unchanged-ones

There is also a good FAQ entry on how to track down a slowness problem:
https://borgbackup.readthedocs.io/en/stable/faq.html#why-is-backup-slow-for-me

And please use the latest version of borg if possible.
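
Since the report mentions an error that the mount is not available, one way to catch that before borg starts is to test the mount point explicitly. This is only a minimal sketch: the function name `require_mount` is hypothetical, and `/mnt/Rclone` in the usage line is an assumption based on the repository path shown in the log, not a confirmed mount point.

```shell
#!/bin/bash
# Hypothetical helper: refuse to continue unless the given directory
# is an active mount point (uses `mountpoint` from util-linux).
require_mount() {
    local mount_dir="$1"
    if ! mountpoint -q "$mount_dir"; then
        echo "ERROR: $mount_dir is not mounted; skipping backup" >&2
        return 1
    fi
}

# Usage inside the backup script (the path is an assumption):
#   require_mount /mnt/Rclone || exit 1
```

This does not make an unstable mount stable; it only avoids starting a backup against a path that is not mounted at all.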

@ThomasWaldmann
Member

ThomasWaldmann commented Aug 6, 2024

@edsonsbj That's a big no-no in a script:

borg break-lock $BORG_REPO

From the docs: Please use carefully and only while no borg process (on any machine) is trying to access the Cache or the Repository.

As you automatically run that, it could be that 2 borg processes worked in that repository at the same time - an unsupported situation that is normally avoided by locking.

Remove that from your script(s), run borg check ... and borg check --repair ....
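
The check step suggested above could be wrapped roughly like this. It is a sketch under assumptions: `check_repo` is a hypothetical helper name and the repository path is passed in by the caller. It only runs the read-only `borg check`; `borg check --repair` is left as a manual step because it modifies the repository.

```shell
#!/bin/bash
# Hypothetical helper: run a read-only consistency check on a repository.
check_repo() {
    local repo="$1"
    if borg check --verbose "$repo"; then
        echo "repository check passed"
    else
        # Repair changes the repository, so it is only suggested here
        echo "check reported problems; consider: borg check --repair $repo" >&2
        return 1
    fi
}

# Usage (run only while no other borg process uses the repository):
#   check_repo "$BORG_REPO"
```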

@edsonsbj
Author

edsonsbj commented Aug 6, 2024

Here is an update: the backup in question ended up taking just over 32 hours in total.

========== Backing up Nextcloud DATA folder dom 04 ago 2024 00:00:08 -03...==========

Maintenance mode enabled
Replaying segments   0%
Replaying segments   5%
Replaying segments  10%
Replaying segments  15%
Replaying segments  20%
Replaying segments  25%
Replaying segments  30%
Replaying segments  35%
Replaying segments  40%
Replaying segments  45%
Replaying segments  50%
Replaying segments  55%
Replaying segments  60%
Replaying segments  65%
Replaying segments  70%
Replaying segments  75%
Replaying segments  80%
Replaying segments  85%
Replaying segments  90%
Replaying segments  95%

Creating archive at "/mnt/Rclone/Backup/Borg/Nextcloud/::NextcloudData-20240804-0000"
/sys/kernel/debug/tracing: file type or inode changed while we backed it up (race condition, skipped file)
0 B O 0 B C 0 B D 0 N mnt/Nextcloud/data/Edsonsb                                
Initializing cache transaction: Reading config
Initializing cache transaction: Reading chunks
Initializing cache transaction: Reading files
503.28 GB O 497.94 GB C 11.58 GB D 176621 N mnt/Nextcloud/d.../preview/3/7/a/c/b
A /mnt/Nextcloud/data/appdata_ocm1scojrdtx/preview/3/7/a/c/c/3/2/7257021/899-1599-max.jpg
Saving files cache
Saving chunks cache
Saving cache config

Initializing cache transaction: Reading config
Initializing cache transaction: Reading chunks
Initializing cache transaction: Reading files

Replaying segments   0%
Replaying segments   5%
Replaying segments  10%
Replaying segments  15%
Replaying segments  20%
Replaying segments  25%
Replaying segments  30%
Replaying segments  35%
Replaying segments  40%
Replaying segments  45%
Replaying segments  50%
Replaying segments  55%
Replaying segments  60%
Replaying segments  65%
Replaying segments  70%
Replaying segments  75%
Replaying segments  80%
Replaying segments  85%
Replaying segments  90%
Replaying segments  95%

503.28 GB O 497.94 GB C 11.58 GB D 176633 N mnt/Nextcloud/d...1/899-1599-max.jpg
A /mnt/Nextcloud/data/appdata_ocm1scojrdtx/preview/3/7/a/c/c/3/2/7257021/64-64-crop.jpg
A /mnt/Nextcloud/data/appdata_ocm1scojrdtx/preview/3/7/a/c/c/3/2/7257021/256-256-crop.jpg
A /mnt/Nextcloud/data/appdata_ocm1scojrdtx/preview/3/7/a/c/c/3/2/7257021/899-899-crop.jpg
A /mnt/Nextcloud/data/appdata_ocm1scojrdtx/preview/3/7/a/c/c/3/2/7257021/36-64.jpg
Repository: /mnt/Rclone/Backup/Borg/Nextcloud
Archive name: NextcloudData-20240804-0000
Archive fingerprint: 330089f42c3aa8b027337e0f58b22903e15043413abe7d958cf2c80d0a2c3a8b
Time (start): Mon, 2024-08-05 02:58:12
Time (end):   Tue, 2024-08-06 08:25:09
Duration: 1 days 5 hours 26 minutes 56.79 seconds
Number of files: 1368123
Utilization of max. archive size: 1%
------------------------------------------------------------------------------
                       Original size      Compressed size    Deduplicated size
This archive:                1.49 TB              1.44 TB             12.19 GB
All archives:              453.20 TB            437.37 TB              1.30 TB

                       Unique chunks         Total chunks
Chunk index:                 2476239            506174044
------------------------------------------------------------------------------
terminating with warning status, rc 1
Maintenance mode disabled

@ThomasWaldmann
Member

"Replaying segments" shouldn't happen under normal circumstances. Either you had 2 borgs running simultaneously due to the automatically broken lock or there was some other unusual issue (server crash?).

@ThomasWaldmann
Member

Minor issue:

/sys/kernel/debug/tracing: file type or inode changed while we backed it up (race condition, skipped file)

You should exclude sys and proc at least, maybe also dev.

@edsonsbj
Author

edsonsbj commented Aug 8, 2024

@edsonsbj That's a big no-no in a script:

borg break-lock $BORG_REPO

From the docs: Please use carefully and only while no borg process (on any machine) is trying to access the Cache or the Repository.

As you automatically run that, it could be that 2 borg processes worked in that repository at the same time - an unsupported situation that is normally avoided by locking.

Remove that from your script(s), run borg check ... and borg check --repair ....

I added this part to the script because the backup is sometimes so slow that it does not release the repository, and then the next backup ends up not running. As for simultaneous backups to the same repository, this does not happen, because this repository is used only for Nextcloud data and only from one server.

@edsonsbj
Author

edsonsbj commented Aug 8, 2024

Minor issue:

/sys/kernel/debug/tracing: file type or inode changed while we backed it up (race condition, skipped file)

You should exclude sys and proc at least, maybe also dev.

You mean exclude from backup like --patterns-from?
My patterns file is created every time I start the script. Below is an example:

P sh
R /

# DO NOT LOOK IN THESE FOLDERS
! proc

# DIRECTORIES TO BE EXCLUDED FROM BACKUP  
- $NextcloudDataDir/*/files_trashbin

# DIRECTORIES FOR BACKUP
+ $NextcloudDataDir/

# DO NOT INCLUDE ANY MORE FILES
- **

The script I am running is this one:

https://raw.githubusercontent.com/edsonsbj/Backup-Restore-Borg-Rclone/main/scripts/Nextcloud%20%2B%20Media%20server%20/Backup.sh

@ThomasWaldmann
Member

If a backup run on that host takes longer than expected (see issue title) and your next backup job starts before the current one ended, you will have 2 borgs writing to the same repo, corrupting the repo.

Borg's repository lock usually locks out the 2nd+ borg, but as long as your script contains borg break-lock, that safety measure does not work as intended.
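
One way to keep a cron-started run from overlapping a still-running one, without touching borg's own lock, is a script-level lock with `flock` from util-linux. This is only a sketch: `run_exclusive`, the lock file path, and the exit code 75 are hypothetical choices, not part of the original script.

```shell
#!/bin/bash
# Hypothetical lock file; any path writable by the script would do.
LOCK_FILE="${LOCK_FILE:-/tmp/nextcloud-borg-backup.lock}"

# Run the given command only if no other instance holds the lock.
# Returns 75 (arbitrary choice) when a previous run is still active.
run_exclusive() {
    (
        # -n: fail immediately instead of waiting for the lock
        flock -n 9 || exit 75
        "$@"
    ) 9>"$LOCK_FILE"
}

# Usage inside the backup script:
#   run_exclusive nextcloud_data || echo "previous backup still running"
```

The lock is released automatically when the subshell exits, even after a crash, so nothing needs to be broken by hand. If the aim is instead to wait for borg's own repository lock, borg's `--lock-wait SECONDS` option is the supported knob.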

@ThomasWaldmann
Member

ThomasWaldmann commented Aug 8, 2024

To debug whether your exclude patterns work as intended, use borg create --list ... and check the output to see whether it backs up what you want. You don't exclude sys there (like you do with proc), so it's backing up stuff from there (that stuff is just information from the kernel, these are not on-disk files).
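
A patterns file that also skips sys and dev could be generated along these lines. This is a sketch under assumptions: `write_patterns` is a hypothetical helper, the `! sys` and `! dev` lines simply mirror the existing `! proc` line, and the file is overwritten rather than appended to, so a leftover file from an aborted run cannot accumulate duplicate rules.

```shell
#!/bin/bash
# Hypothetical helper: write a borg patterns file for the given data dir.
write_patterns() {
    local data_dir="$1" out_file="$2"
    # '>' truncates the file, unlike 'tee -a', which appends
    cat > "$out_file" <<EOF
P sh
R /

# Do not descend into kernel pseudo-filesystems
! proc
! sys
! dev

# Directories to be excluded from backup
- ${data_dir}/*/files_trashbin

# Directories for backup
+ ${data_dir}/

# Do not include any more files
- **
EOF
}

# Preview without writing an archive (archive name is a placeholder):
#   write_patterns "$NextcloudDataDir" ./nc-patterns.lst
#   borg create --list --dry-run --patterns-from ./nc-patterns.lst ::pattern-test
```

With `--dry-run`, borg lists what it would process without storing anything, which makes it a cheap way to confirm that nothing under sys, proc, or dev shows up.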

@edsonsbj
Author

edsonsbj commented Aug 9, 2024

Here is a new update. Yesterday I ran the backup through cron at 22:30, and it completed successfully after almost 2 hours and 35 minutes. The only thing I added to the script was --files-cache=ctime,size. But today, when running the backup again, the same problem occurs: as I write this, it is replaying the segments, while the rclone mount is working perfectly without any error in the log.

@edsonsbj
Author

edsonsbj commented Aug 9, 2024

The logs follow:

Nextcloud-07-08-2024_22-30.txt
Nextcloud-08-08-2024_22-30.txt

Once again, to be clear, there is no possibility of running 2 Borg tasks for the same repository or even for another repository.
