If there are no snapshots of a zvol, 'ignore-replicated' is not honoured. #93

Closed
xrobau opened this issue Sep 10, 2021 · 8 comments

@xrobau
Contributor

xrobau commented Sep 10, 2021

I noticed this when bootstrapping replication between two machines.

In this example, store1 is currently in the process of replicating /store2/vol3 from store2 into /store1/backups/store2/vol3, and the 'Ignoring, already replicated' check is not being triggered.

When running this on store2, it thinks that the backup volume that is in the process of being received needs to be tagged.

  #### Source settings
  [Source] Using custom SSH config: ./ssh/ssh_config
  [Source] Datasets on: store1
  [Source] Keep the last 10 snapshots.
  [Source] Keep every 1 day, delete after 1 week.
  [Source] Keep every 1 week, delete after 1 month.
  [Source] Keep every 1 month, delete after 1 year.
  [Source] Selects all datasets that have property 'autobackup:replicate=true' (or children of datasets that have 'autobackup:replicate=child')

  #### Selecting
# [Source] Getting selected datasets
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs get -t volume,filesystem -o name,value,source -H autobackup:replicate')
  [Source] store1/backups/store2/store2/vol1: Selected
  [Source] store1/backups/store2/store2/vol2: Selected
  [Source] store1/backups/store2/store2/vol3: Selected
  [Source] store1/vol1: Selected
  [Source] store1/vol2: Selected

  #### Filtering already replicated filesystems
# [Source] store1/backups/store2/store2/vol1: Checking if dataset is changed
# [Source] store1/backups/store2/store2/vol1: Checking if filesystem exists
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs list store1/backups/store2/store2/vol1')
# [Source] store1/backups/store2/store2/vol1: Getting zfs properties
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs get -H -o property,value -p all store1/backups/store2/store2/vol1')
  [Source] store1/backups/store2/store2/vol1: Ignoring, already replicated
# [Source] store1/backups/store2/store2/vol2: Checking if dataset is changed
# [Source] store1/backups/store2/store2/vol2: Checking if filesystem exists
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs list store1/backups/store2/store2/vol2')
# [Source] store1/backups/store2/store2/vol2: Getting zfs properties
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs get -H -o property,value -p all store1/backups/store2/store2/vol2')
  [Source] store1/backups/store2/store2/vol2: Ignoring, already replicated
# [Source] store1/backups/store2/store2/vol3: Checking if dataset is changed
# [Source] store1/backups/store2/store2/vol3: Checking if filesystem exists
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs list store1/backups/store2/store2/vol3')
# [Source] store1/backups/store2/store2/vol3: Getting zfs properties
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs get -H -o property,value -p all store1/backups/store2/store2/vol3')
# [Source] store1/vol1: Checking if dataset is changed
# [Source] store1/vol1: Checking if filesystem exists
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs list store1/vol1')
# [Source] store1/vol1: Getting zfs properties

The volume IS correctly marked as received, but my guess is that something is bailing out early and deciding 'this must be replicated' because there are no snapshots visible at the time?

root@store1:~# zfs get -t filesystem,volume autobackup:replicate
NAME                                                  PROPERTY              VALUE                 SOURCE
...
store1/backups/store2/store2               autobackup:replicate  -                     -
store1/backups/store2/store2/vol1          autobackup:replicate  true                  received
store1/backups/store2/store2/vol2          autobackup:replicate  true                  received
store1/backups/store2/store2/vol3          autobackup:replicate  true                  received
store1/vol1                                autobackup:replicate  true                  local
store1/vol2                                autobackup:replicate  true                  local
...
root@store1:~# zfs list -r -t snapshot store1/backups/store2/store2/vol3
no datasets available
root@store1:~# zfs list store1/backups/store2/store2/vol3
NAME                                                   USED  AVAIL     REFER  MOUNTPOINT
store1/backups/store2/store2/vol3   390G  38.4T      390G  /store1/backups/store2/store2/vol3
root@store1:~#
@xrobau
Contributor Author

xrobau commented Sep 10, 2021

root@store1:~# zfs get -H -o property,value -p all store1/backups/store2/store2/vol3
type    filesystem
creation        1631240755
used    459289621440
available       42203169868080
referenced      459289621440
compressratio   2.05
mounted no
quota   0
reservation     0
recordsize      131072
mountpoint      /store1/backups/store2/store2/vol3
sharenfs        async,rw,crossmnt,no_subtree_check,no_root_squash
checksum        on
compression     on
atime   off
devices on
exec    on
setuid  on
readonly        off
zoned   off
snapdir hidden
aclmode discard
aclinherit      restricted
createtxg       36738
canmount        on
xattr   on
copies  1
vscan   off
nbmand  off
sharesmb        off
refquota        0
refreservation  0
guid    9555702051503981216
primarycache    all
secondarycache  all
usedbysnapshots 0
usedbydataset   459289621440
usedbychildren  0
usedbyrefreservation    0
logbias latency
objsetid        37285
dedup   off
mlslabel        none
sync    disabled
dnodesize       legacy
refcompressratio        2.05
written 459289621440
logicalused     903106373632
logicalreferenced       903106373632
volmode default
filesystem_limit        18446744073709551615
snapshot_limit  18446744073709551615
filesystem_count        18446744073709551615
snapshot_count  18446744073709551615
snapdev hidden
acltype off
context none
fscontext       none
defcontext      none
rootcontext     none
relatime        off
redundant_metadata      all
overlay on
receive_resume_token    1-12e6897589-f0-789c636064000310a500c4ec50360710e72765a52697303088b241d460c8a7a515a7968064540e2841e5d990e4932a4b528b81748698cbe64b58f497e4a79766a630303cf1e2ede66816cdf74092e704cbe725e6a63230e496e4e81697e417a51ae997e5962716a5ea97e5e7183b14a516e464262796a4ea1a1918191a581a1a181819999a18314840cd87f92b35372935253f1bcc07002af225b6
encryption      off
keylocation     none
keyformat       none
pbkdf2iters     0
special_small_blocks    0
autobackup:replicate    true
root@store1:~#

@xrobau
Contributor Author

xrobau commented Sep 10, 2021

As expected, once the bootstrap had completed, it skipped the volume correctly:

  #### Selecting
# [Source] Getting selected datasets
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs get -t volume,filesystem -o name,value,source -H autobackup:replicate')
  [Source] store1/backups/store2/store2/vol1: Selected
  [Source] store1/backups/store2/store2/vol2: Selected
  [Source] store1/backups/store2/store2/vol3: Selected
  [Source] store1/vol1: Selected
  [Source] store1/vol2: Selected

  #### Filtering already replicated filesystems
# [Source] store1/backups/store2/store2/vol1: Checking if dataset is changed
# [Source] store1/backups/store2/store2/vol1: Checking if filesystem exists
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs list store1/backups/store2/store2/vol1')
# [Source] store1/backups/store2/store2/vol1: Getting zfs properties
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs get -H -o property,value -p all store1/backups/store2/store2/vol1')
  [Source] store1/backups/store2/store2/vol1: Ignoring, already replicated
# [Source] store1/backups/store2/store2/vol2: Checking if dataset is changed
# [Source] store1/backups/store2/store2/vol2: Checking if filesystem exists
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs list store1/backups/store2/store2/vol2')
# [Source] store1/backups/store2/store2/vol2: Getting zfs properties
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs get -H -o property,value -p all store1/backups/store2/store2/vol2')
  [Source] store1/backups/store2/store2/vol2: Ignoring, already replicated
# [Source] store1/backups/store2/store2/vol3: Checking if dataset is changed
# [Source] store1/backups/store2/store2/vol3: Checking if filesystem exists
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs list store1/backups/store2/store2/vol3')
# [Source] store1/backups/store2/store2/vol3: Getting zfs properties
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs get -H -o property,value -p all store1/backups/store2/store2/vol3')
  [Source] store1/backups/store2/store2/vol3: Ignoring, already replicated
# [Source] store1/vol1: Checking if dataset is changed
# [Source] store1/vol1: Checking if filesystem exists
# [Source] CMD    > (ssh -F ./ssh/ssh_config store1 'zfs list store1/vol1')
# [Source] store1/vol1: Getting zfs properties
...

And this is the output of zfs get all now:

root@store1:~# zfs get -H -o property,value -p all store1/backups/store2/store2/vol3
type    filesystem
creation        1631240755
used    533340812160
available       42128966090640
referenced      533340812160
compressratio   2.06
mounted no
quota   0
reservation     0
recordsize      131072
mountpoint      /store1/backups/store2/store2/vol3
sharenfs        async,rw,crossmnt,no_subtree_check,no_root_squash
checksum        on
compression     on
atime   off
devices on
exec    on
setuid  on
readonly        off
zoned   off
snapdir hidden
aclmode discard
aclinherit      restricted
createtxg       36738
canmount        on
xattr   on
copies  1
version 5
utf8only        off
normalization   none
casesensitivity sensitive
vscan   off
nbmand  off
sharesmb        off
refquota        0
refreservation  0
guid    9555702051503981216
primarycache    all
secondarycache  all
usedbysnapshots 0
usedbydataset   533340812160
usedbychildren  0
usedbyrefreservation    0
logbias latency
objsetid        37285
dedup   off
mlslabel        none
sync    disabled
dnodesize       legacy
refcompressratio        2.06
written 0
logicalused     1048560894976
logicalreferenced       1048560894976
volmode default
filesystem_limit        18446744073709551615
snapshot_limit  18446744073709551615
filesystem_count        18446744073709551615
snapshot_count  18446744073709551615
snapdev hidden
acltype off
context none
fscontext       none
defcontext      none
rootcontext     none
relatime        off
redundant_metadata      all
overlay on
encryption      off
keylocation     none
keyformat       none
pbkdf2iters     0
special_small_blocks    0
autobackup:replicate    true
root@store1:~#

@xrobau
Contributor Author

xrobau commented Sep 10, 2021

After poking through the code, I found this is caused by is_changed being run, which looks at written (which is zero after the bootstrap has completed). I would have thought the code should look at SOURCE?

self.set_title("Filtering already replicated filesystems")
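
A minimal way to see what that check reacts to, assuming is_changed reduces to "is written non-zero" (the command below is illustrative; the values are the ones from the two property dumps in this issue):

  # Illustrative: -p prints the raw byte count the changed-check compares
  zfs get -H -o value -p written store1/backups/store2/store2/vol3
  # -> 459289621440 during the bootstrap (no snapshots, so written == referenced)
  # -> 0 after the bootstrap completed (nothing written since the received snapshot)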

@psy0rz
Owner

psy0rz commented Sep 13, 2021

You expected it to look at the received property; however, that's not always a reliable indication that something is part of active replication (since it could be a backup that was restored, for example).

zfs-autobackup only looks at the bytes written since the last snapshot to determine if something is part of active replication.

In your case there doesn't seem to be any active replication, since there are no snapshots?

So you could choose to disable those datasets via the autobackup:.. property, or you could create one manual snapshot for those datasets (with a different name than zfs-autobackup uses for its snapshots).
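
For example, a minimal sketch of that workaround (the snapshot name manual-hold is hypothetical; anything that does not match the zfs-autobackup snapshot naming scheme works):

  # hypothetical name, deliberately outside the zfs-autobackup snapshot format
  zfs snapshot store1/backups/store2/store2/vol3@manual-hold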

Would that solve your problem in an acceptable way?

@xrobau
Contributor Author

xrobau commented Sep 13, 2021

In your case there doesn't seem to be any active replication, since there are no snapshots?

It was the FIRST snapshot, as it was bootstrapping.

Since it could be a backup that was restored, for example

Yes, that is ALSO something that should not be replicated if I say --ignore-replicated, I would think.

@xrobau
Contributor Author

xrobau commented Sep 13, 2021

Would you accept a pull request adding --really-ignore-replicated, at least?

@xrobau
Contributor Author

xrobau commented Sep 15, 2021

This just bit me AGAIN, even after replication.

@xrobau
Contributor Author

xrobau commented Sep 15, 2021

I can't seem to reopen this ticket, so I'll create a new one.

xrobau added a commit to xrobau/zfs_autobackup that referenced this issue Sep 15, 2021
If another zfs_autobackup session is running, there will be changes
on zvols that are replicated. This means that it is possible for
a pair of servers running replication between themselves to start
backing up the backups. Turning on --exclude-received when backups
are running between an a <-> b pair of servers means that the vols
that are received will never be accidentally selected.
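
A sketch of such an invocation, assuming the layout from this issue (backup-name replicate, target path store1/backups/store2); the exact flag set and argument order here are an assumption, not taken from the thread:

  # run on store1, pulling from store2; --exclude-received skips datasets
  # whose autobackup:replicate property has source 'received'
  zfs-autobackup --ssh-source store2 --exclude-received replicate store1/backups/store2
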
psy0rz added a commit that referenced this issue Sep 16, 2021
Fix #93, Fix #95 Re-Document --exclude-received
psy0rz added a commit that referenced this issue Sep 20, 2021