Update README.md

2019-11-10 01:16:48 +01:00 · 2019-11-10 01:01:08 +01:00 · 2019-11-10 00:51:20 +01:00 · 2019-10-19 14:54:10 +02:00 · 2019-10-19 14:53:31 +02:00 · 2019-10-17 10:42:54 +02:00
2 changed files with 367 additions and 206 deletions
--- a/README.md
+++ b/README.md
@@ -1,12 +1,18 @@
 # ZFS autobackup

+(checkout v3.0-beta for the new cool stuff: https://github.com/psy0rz/zfs_autobackup/blob/v3/README.md)
+
+Official releases: https://github.com/psy0rz/zfs_autobackup/releases
+
 Introduction
 ============

 ZFS autobackup is used to periodicly backup ZFS filesystems to other locations. This is done using the very effcient zfs send and receive commands.

 It has the following features:
-* Automaticly selects filesystems to backup by looking at a simple ZFS property. (recursive)
+* Works across operating systems: Tested with Linux, FreeBSD/FreeNAS and SmartOS.
+* Works in combination with existing replication systems. (Like Proxmox HA)
+* Automatically selects filesystems to backup by looking at a simple ZFS property. (recursive)
 * Creates consistent snapshots. (takes all snapshots at once, atomic.)
 * Multiple backups modes:
  * "push" local data to a backup-server via SSH.
@@ -16,34 +22,37 @@ It has the following features:
 * Supports resuming of interrupted transfers. (via the zfs extensible_dataset feature)
 * Backups and snapshots can be named to prevent conflicts. (multiple backups from and to the same filesystems are no problem)
 * Always creates a new snapshot before starting.
-* Checks everything and aborts on errors.
+* Checks everything but tries to continue on non-fatal errors when possible. (Reports the error count when done)
 * Ability to 'finish' aborted backups to see what goes wrong.
 * Easy to debug and has a test-mode. Actual unix commands are printed.
 * Keeps latest X snapshots remote and locally. (default 30, configurable)
+* Uses zfs-holds on important snapshots so they can't be accidentally destroyed.
 * Easy installation:
  * Only one host needs the zfs_autobackup script. The other host just needs ssh and the zfs command.
-  * Written in python and uses zfs-commands, no 3rd party dependencys or libraries.
-  * No seperate config files or properties. Just one command you can copy/paste in your backup script.
+  * Written in python and uses zfs-commands, no 3rd party dependencies or libraries.
+  * No separate config files or properties. Just one command you can copy/paste in your backup script.

 Usage
 ====
 ```
 usage: zfs_autobackup [-h] [--ssh-source SSH_SOURCE] [--ssh-target SSH_TARGET]
                      [--keep-source KEEP_SOURCE] [--keep-target KEEP_TARGET]
-                      [--no-snapshot] [--no-send] [--resume]
-                      [--strip-path STRIP_PATH] [--destroy-stale]
+                      [--no-snapshot] [--no-send] [--allow-empty]
+                      [--ignore-replicated] [--no-holds] [--ignore-new]
+                      [--resume] [--strip-path STRIP_PATH] [--buffer BUFFER]
                      [--clear-refreservation] [--clear-mountpoint]
                      [--filter-properties FILTER_PROPERTIES] [--rollback]
-                      [--test] [--verbose] [--debug]
-                      backup_name target_fs
+                      [--ignore-transfer-errors] [--test] [--verbose]
+                      [--debug]
+                      backup_name target_path

-ZFS autobackup v2.2
+ZFS autobackup v2.4

 positional arguments:
  backup_name           Name of the backup (you should set the zfs property
                        "autobackup:backup-name" to true on filesystems you
                        want to backup
-  target_fs             Target filesystem
+  target_path           Target path

 optional arguments:
  -h, --help            show this help message and exit
@@ -62,6 +71,14 @@ optional arguments:
  --no-snapshot         dont create new snapshot (usefull for finishing
                        uncompleted backups, or cleanups)
  --no-send             dont send snapshots (usefull to only do a cleanup)
+  --allow-empty         if nothing has changed, still create empty snapshots.
+  --ignore-replicated   Ignore datasets that seem to be replicated some other
+                        way. (No changes since lastest snapshot. Usefull for
+                        proxmox HA replication)
+  --no-holds            Dont lock snapshots on the source. (Usefull to allow
+                        proxmox HA replication to switches nodes)
+  --ignore-new          Ignore filesystem if there are already newer snapshots
+                        for it on the target (use with caution)
  --resume              support resuming of interrupted transfers by using the
                        zfs extensible_dataset feature (both zpools should
                        have it enabled) Disadvantage is that you need to use
@@ -71,8 +88,9 @@ optional arguments:
  --strip-path STRIP_PATH
                        number of directory to strip from path (use 1 when
                        cloning zones between 2 SmartOS machines)
-  --destroy-stale       Destroy stale backups that have no more snapshots. Be
-                        sure to verify the output before using this!
+  --buffer BUFFER       Use mbuffer with specified size to speedup zfs
+                        transfer. (e.g. --buffer 1G) Will also show nice
+                        progress output.
  --clear-refreservation
                        Set refreservation property to none for new
                        filesystems. Usefull when backupping SmartOS volumes.
@@ -86,11 +104,17 @@ optional arguments:
                        from Linux to FreeNAS, you should filter xattr)
  --rollback            Rollback changes on the target before starting a
                        backup. (normally you can prevent changes by setting
-                        the readonly property on the target_fs to on)
+                        the readonly property on the target_path to on)
+  --ignore-transfer-errors
+                        Ignore transfer errors (still checks if received
+                        filesystem exists. usefull for acltype errors)
  --test                dont change anything, just show what would be done
                        (still does all read-only operations)
  --verbose             verbose output
  --debug               debug output (shows commands that are executed)
+
+When a filesystem fails, zfs_backup will continue and report the number of
+failures at that end. Also the exit code will indicate the number of failures.
 ```

 Backup example
@@ -200,7 +224,7 @@ Host smartos04
    Compression yes
 ```

-This way you can just specify smartos04
+This way you can just specify "smartos04" as host.

 Also uses compression on slow links.

@@ -236,8 +260,6 @@ root@fs1:/home/psy#  zfs send fs1/zones/backup/zfsbackups/smartos01.server.com/z
 After that you can rename the disk image from the temporary location to the location of a new SmartOS machine you've created.


-
-
 Monitoring with Zabbix-jobs
 ===========================

@@ -249,3 +271,55 @@ zabbix-job-status backup_smartos01_fs1 daily $?
 ```

 This will update the zabbix server with the exitcode and will also alert you if the job didnt run for more than 2 days.
+
+
+Backing up a Proxmox cluster with HA replication
+================================================
+
+Due to the nature of proxmox we had to make a few enhancements to zfs_autobackup. This will probably also benefit other systems that use their own replication in combination with zfs_autobackup.
+
+All data under rpool/data can be on multiple nodes of the cluster. The naming of those filesystems is unique across the whole cluster. Because of this we should back up rpool/data of all nodes to the same destination. This way we won't have duplicate backups of the filesystems that are replicated. Thanks to the options described below, you can even migrate hosts and zfs_autobackup will be fine. (It will automatically get the next backup from the new node.)
+
+
+In the example below we have 3 nodes, named h4, h5 and h6.
+
+The backup will go to a machine named smartos03.
+
+Preparing the proxmox nodes
+---------------------------
+
+On each node, select the filesystems as follows:
+```
+root@h4:~# zfs set autobackup:h4_smartos03=true rpool
+root@h4:~# zfs set autobackup:h4_smartos03=false rpool/data
+root@h4:~# zfs set autobackup:data_smartos03=child rpool/data
+
+```
+
+* rpool will be backed up the usual way, under the backup name h4_smartos03. (each node will have a unique name)
+* rpool/data will be excluded from the usual backup.
+* The CHILDREN of rpool/data will be selected for a cluster-wide backup named data_smartos03. (each node uses the same backup name)
+
+
+Preparing the backup server
+---------------------------
+
+Extra options needed for proxmox with HA:
+* --no-holds: To allow proxmox to destroy our snapshots if a VM migrates to another node.
+* --ignore-replicated: To ignore the replicated filesystems of proxmox on the receiving proxmox nodes. (i.e. only back up from the node where the VM is active)
+
+
+I use the following backup script on the backup server:
+```
+for H in h4 h5 h6; do
+  echo "################################### DATA $H"
+  #backup data filesystems to a common place
+  ./zfs_autobackup --ssh-source root@$H data_smartos03 zones/backup/zfsbackups/pxe1_data --clear-refreservation --clear-mountpoint  --ignore-transfer-errors --strip-path 2 --verbose --resume --ignore-replicated --no-holds $@
+  zabbix-job-status backup_$H""_data_smartos03 daily $? >/dev/null 2>/dev/null
+
+  echo "################################### RPOOL $H"
+  #backup rpool to own place
+  ./zfs_autobackup --ssh-source root@$H $H""_smartos03 zones/backup/zfsbackups/$H --verbose --clear-refreservation --clear-mountpoint  --resume --ignore-transfer-errors $@
+  zabbix-job-status backup_$H""_smartos03 daily $? >/dev/null 2>/dev/null
+done
+```
--- a/467
+++ b/467
@@ -1,5 +1,9 @@
 #!/usr/bin/env python2
 # -*- coding: utf8 -*-
+
+#(C)edwin@datux.nl -- Edwin Eefting
+#Release under GPL.
+
 from __future__ import print_function
 import os
 import sys
@@ -13,18 +17,20 @@ import time
 def error(txt):
    print(txt, file=sys.stderr)

-
-
 def verbose(txt):
    if args.verbose:
        print(txt)

-
-
 def debug(txt):
    if args.debug:
        print(txt)

+#fatal abort execution, exit code 255
+def abort(txt):
+    error(txt)
+    sys.exit(255)
+
+

 """run a command. specifiy ssh user@host to run remotely"""
 def run(cmd, input=None, ssh_to="local", tab_split=False, valid_exitcodes=[ 0 ], test=False):
@@ -95,7 +101,7 @@ def zfs_get_selected_filesystems(ssh_to, backup_name):
    for source_filesystem in source_filesystems:
        (name,value,source)=source_filesystem
        if value=="false":
-            verbose("Ignored : {0} (disabled)".format(name))
+            verbose("* Ignored : {0} (disabled)".format(name))

        else:
            if source=="local" and ( value=="true" or value=="child"):
@@ -103,16 +109,16 @@ def zfs_get_selected_filesystems(ssh_to, backup_name):

            if source=="local" and value=="true":
                selected_filesystems.append(name)
-                verbose("Selected: {0} (direct selection)".format(name))
+                verbose("* Selected: {0} (direct selection)".format(name))
            elif source.find("inherited from ")==0 and (value=="true" or value=="child"):
                inherited_from=re.sub("^inherited from ", "", source)
                if inherited_from in direct_filesystems:
                    selected_filesystems.append(name)
-                    verbose("Selected: {0} (inherited selection)".format(name))
+                    verbose("* Selected: {0} (inherited selection)".format(name))
                else:
-                    verbose("Ignored : {0} (already a backup)".format(name))
+                    verbose("* Ignored : {0} (already a backup)".format(name))
            else:
-                verbose("Ignored : {0} (only childs)".format(name))
+                verbose("* Ignored : {0} (only childs)".format(name))

    return(selected_filesystems)
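
The selection rules in `zfs_get_selected_filesystems` can be sketched in isolation as below. This is a minimal illustration, not the script's code: the function name `select_filesystems` and the sample rows are made up, and the `direct` bookkeeping falls in a part of the function this diff does not show, so it is an assumption about how `direct_filesystems` is filled. The rows mimic what `zfs get -H -o name,value,source` would return for the `autobackup:data_smartos03` property from the Proxmox example.

```python
import re


def select_filesystems(rows):
    """rows: (name, value, source) tuples, parents listed before children."""
    direct = []    # datasets where the property was set locally (assumed bookkeeping)
    selected = []
    for name, value, source in rows:
        if value == "false":
            continue  # explicitly disabled
        if source == "local" and value in ("true", "child"):
            direct.append(name)
        if source == "local" and value == "true":
            selected.append(name)  # direct selection
        elif source.startswith("inherited from ") and value in ("true", "child"):
            parent = re.sub("^inherited from ", "", source)
            if parent in direct:
                selected.append(name)  # inherited selection
            # otherwise the property came from a received dataset: already a backup
        # a local "child" value selects only the children, not the dataset itself
    return selected


rows = [
    ("rpool/data", "child", "local"),
    ("rpool/data/vm-100-disk-0", "child", "inherited from rpool/data"),
    ("rpool/data/vm-101-disk-0", "false", "local"),
    ("backup/pxe1_data/vm-100-disk-0", "child", "inherited from backup/pxe1_data"),
]
print(select_filesystems(rows))  # ['rpool/data/vm-100-disk-0']
```

Only the child of `rpool/data` is selected: the parent carries `child`, one volume is disabled, and the last row inherits from a dataset that was never selected locally.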

@@ -123,7 +129,6 @@ def zfs_get_resumable_filesystems(ssh_to, filesystems):
    cmd=[ "zfs", "get", "-t",  "volume,filesystem", "-o", "name,value", "-H", "receive_resume_token" ]
    cmd.extend(filesystems)

-    #TODO: get rid of ugly errors for non-existing target filesystems
    resumable_filesystems=run(ssh_to=ssh_to, tab_split=True, cmd=cmd, valid_exitcodes= [ 0,1 ] )

    ret={}
@@ -177,12 +182,12 @@ def zfs_create_snapshot(ssh_to, filesystems, snapshot):
            cmd.append(filesystem+"@"+snapshot)

            #in testmode we dont actually make changes, so keep them in a list to simulate
-            if args.test:
-                if not ssh_to in test_snapshots:
-                    test_snapshots[ssh_to]={}
-                if not filesystem in test_snapshots[ssh_to]:
-                    test_snapshots[ssh_to][filesystem]=[]
-                test_snapshots[ssh_to][filesystem].append(snapshot)
+            # if args.test:
+            #     if not ssh_to in test_snapshots:
+            #         test_snapshots[ssh_to]={}
+            #     if not filesystem in test_snapshots[ssh_to]:
+            #         test_snapshots[ssh_to][filesystem]=[]
+            #     test_snapshots[ssh_to][filesystem].append(snapshot)

        run(ssh_to=ssh_to, tab_split=False, cmd=cmd, test=args.test)

@@ -196,13 +201,12 @@ def zfs_get_snapshots(ssh_to, filesystems, backup_name):
    ret={}

    if filesystems:
-        #TODO: get rid of ugly errors for non-existing target filesystems
        cmd=[
            "zfs", "list", "-d", "1", "-r", "-t" ,"snapshot", "-H", "-o", "name"
        ]
        cmd.extend(filesystems)

-        snapshots=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0,1 ])
+        snapshots=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])


        for snapshot in snapshots:
@@ -213,13 +217,13 @@ def zfs_get_snapshots(ssh_to, filesystems, backup_name):
                ret[filesystem].append(snapshot_name)

        #also add any test-snapshots that where created with --test mode
-        if args.test:
-            if ssh_to in test_snapshots:
-                for filesystem in filesystems:
-                    if filesystem in test_snapshots[ssh_to]:
-                        if not filesystem in ret:
-                            ret[filesystem]=[]
-                        ret[filesystem].extend(test_snapshots[ssh_to][filesystem])
+        # if args.test:
+        #     if ssh_to in test_snapshots:
+        #         for filesystem in filesystems:
+        #             if filesystem in test_snapshots[ssh_to]:
+        #                 if not filesystem in ret:
+        #                     ret[filesystem]=[]
+        #                 ret[filesystem].extend(test_snapshots[ssh_to][filesystem])

    return(ret)

@@ -295,13 +299,13 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,


    if not first_snapshot:
-        txt="Initial transfer of "+source_filesystem+" snapshot "+second_snapshot
+        txt=">>> Transfer: "+source_filesystem+"@"+second_snapshot
    else:
-        txt="Incremental transfer of "+source_filesystem+" between snapshots "+first_snapshot+"..."+second_snapshot
+        txt=">>> Transfer: "+source_filesystem+"@"+first_snapshot+"...@"+second_snapshot

    if resume_token:
        source_cmd.extend([ "-t", resume_token ])
-        verbose("RESUMING "+txt)
+        txt=txt+" [RESUMED]"

    else:
        source_cmd.append("-p")
@@ -314,7 +318,7 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
        else:
            source_cmd.append(source_filesystem + "@" + second_snapshot)

-        verbose(txt)
+    verbose(txt)

    if args.buffer and args.ssh_source!="local":
        source_cmd.append("|mbuffer -m {}".format(args.buffer))
@@ -333,8 +337,7 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
        for filter_property in args.filter_properties:
            target_cmd.extend([ "-x" , filter_property ])

-    #also verbose in --verbose mode so we can see the transfer speed when its completed
-    if args.verbose or args.debug:
+    if args.debug:
        target_cmd.append("-v")

    if args.resume:
@@ -381,32 +384,41 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,



-"""get filesystems that where already backupped to a target. """
-def zfs_get_backupped_filesystems(ssh_to, backup_name, target_fs):
-    #get all target filesystems that have received or inherited the backup propert, under the target_fs tree
-    ret=run(ssh_to=ssh_to, tab_split=False, cmd=[
-        "zfs", "get", "-r", "-t",  "volume,filesystem", "-o", "name", "-s", "received,inherited", "-H", "autobackup:"+backup_name, target_fs
+#NOTE: unreliable when using with autobackup:bla=child
+# """get filesystems that where already backupped to a target. """
+# def zfs_get_backupped_filesystems(ssh_to, backup_name, target_path):
+#     #get all target filesystems that have received or inherited the backup propert, under the target_path tree
+#     ret=run(ssh_to=ssh_to, tab_split=False, valid_exitcodes=[ 0,1 ], cmd=[
+#         "zfs", "get", "-r", "-t",  "volume,filesystem", "-o", "name", "-s", "received,inherited", "-H", "autobackup:"+backup_name, target_path
+#     ])
+#
+#     return(ret)
+
+"""get existing filesystems """
+def zfs_get_existing_filesystems(ssh_to, target_path):
+    #list all filesystems and volumes that exist under the target_path tree
+    ret=run(ssh_to=ssh_to, tab_split=False, valid_exitcodes=[ 0,1 ], cmd=[
+        "zfs", "list", "-r", "-t",  "volume,filesystem", "-o", "name", "-H", target_path
    ])

    return(ret)


-
 """get filesystems that where once backupped to target but are no longer selected on source

 these are filesystems that are not in the list in target_filesystems.

 this happens when filesystems are destroyed or unselected on the source.
 """
-def get_stale_backupped_filesystems(ssh_to, backup_name, target_fs, target_filesystems):
+def get_stale_backupped_filesystems(backup_name, target_path, target_filesystems, existing_target_filesystems):
+

-    backupped_filesystems=zfs_get_backupped_filesystems(ssh_to=ssh_to, backup_name=backup_name, target_fs=target_fs)

    #determine backupped filesystems that are not in target_filesystems anymore
    stale_backupped_filesystems=[]
-    for backupped_filesystem in backupped_filesystems:
-        if backupped_filesystem not in target_filesystems:
-            stale_backupped_filesystems.append(backupped_filesystem)
+    for existing_target_filesystem in existing_target_filesystems:
+        if existing_target_filesystem not in target_filesystems:
+            stale_backupped_filesystems.append(existing_target_filesystem)

    return(stale_backupped_filesystems)
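
The target bookkeeping around `zfs_get_existing_filesystems` and `get_stale_backupped_filesystems` boils down to a path mapping plus two list comparisons. The sketch below shows that idea under stated assumptions: `plan_targets` is a hypothetical helper that does not exist in the script, and the dataset names are invented. `lstrip_path` is reproduced from the script so the block runs on its own.

```python
def lstrip_path(path, count):
    """Drop the first `count` components of a dataset path."""
    return "/".join(path.split("/")[count:])


def plan_targets(source_filesystems, existing_targets, target_path, strip_path):
    """Hypothetical helper: return (wanted, common, stale) target datasets."""
    # where each selected source filesystem should end up on the target
    wanted = [target_path + "/" + lstrip_path(fs, strip_path)
              for fs in source_filesystems]
    # wanted targets that already exist, so their snapshots can be listed safely
    common = sorted(set(wanted) & set(existing_targets))
    # existing targets that are no longer selected on the source
    stale = [fs for fs in existing_targets if fs not in wanted]
    return wanted, common, stale


source = ["rpool/data/vm-100-disk-0", "rpool/data/vm-101-disk-0"]
existing = [
    "zones/backup/pxe1_data/vm-100-disk-0",
    "zones/backup/pxe1_data/vm-099-disk-0",
]
wanted, common, stale = plan_targets(source, existing, "zones/backup/pxe1_data", 2)
print(common)  # ['zones/backup/pxe1_data/vm-100-disk-0']
print(stale)   # ['zones/backup/pxe1_data/vm-099-disk-0']
```

Note that `zfs list -r` also returns the `target_path` dataset itself, which this naive comparison would report as stale; the sample data leaves it out, and a caller would probably want to filter it.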

@@ -434,30 +446,50 @@ def lstrip_path(path, count):
    return("/".join(path.split("/")[count:]))


-"""get list of filesystems that are changed, compared to the latest snapshot"""
-def zfs_get_unchanged_filesystems(ssh_to, snapshots):
+"""get list of filesystems that are unchanged, compared to the specified latest snapshot. """
+def zfs_get_unchanged_snapshots(ssh_to, snapshots):

    ret=[]
    for ( filesystem, snapshot_list ) in snapshots.items():
        latest_snapshot=snapshot_list[-1]

-        cmd=[
-            "zfs", "get","-H" ,"-ovalue", "written@"+latest_snapshot, filesystem
-        ]
+        cmd=[ "zfs", "get","-H" ,"-ovalue", "written@"+latest_snapshot, filesystem ]
+

        output=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])

-        if output[0]=="0B":
+        if output[0]=="0B" or output[0]=="0":
+            ret.append(filesystem)
+
+    return(ret)
+
+"""get filesystems that have not changed since their most recent snapshot (any snapshot, not just ours)."""
+def zfs_get_unchanged_filesystems(ssh_to, filesystems):
+
+    ret=[]
+    cmd=[ "zfs", "get","-H" ,"-oname,value", "written" ]
+    cmd.extend(filesystems)
+    output=run(ssh_to=ssh_to, tab_split=True, cmd=cmd, valid_exitcodes=[ 0 ])
+
+    for ( filesystem , written ) in output:
+        if written=="0B" or written=="0":
            ret.append(filesystem)
-            verbose("No changes on {}".format(filesystem))

    return(ret)


+
+#fugly..
+failures=0
+#something failed, but we try to continue with the rest
+def failed(txt):
+    global failures
+    failures=failures+1
+    error("FAILURE: "+txt+"\n")
+
+
 def zfs_autobackup():

-
-
    ############## data gathering section

    if args.test:
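
The `failed()` counter introduced in this hunk supports a continue-on-error loop: each filesystem is handled inside its own `try`, and failures are tallied instead of aborting the run. A minimal sketch of that pattern follows. `backup_all`, `flaky_transfer` and `exit_code` are hypothetical names, and clamping the exit code at 254 is an assumption made here because `abort()` already uses 255; the diff does not show how the script maps the count to its exit code.

```python
import sys

failures = 0


def failed(txt):
    """Record a non-fatal failure and keep going."""
    global failures
    failures += 1
    print("FAILURE: " + txt, file=sys.stderr)


def backup_all(filesystems, transfer):
    """Try every filesystem; one failure does not stop the others."""
    for fs in filesystems:
        try:
            transfer(fs)
        except Exception as e:
            failed(str(e))
    return failures


def exit_code(count):
    """Assumption: report the failure count, keeping 255 free for fatal aborts."""
    return min(count, 254)


def flaky_transfer(fs):
    # stand-in for a real transfer; fails for one dataset
    if fs.endswith("broken"):
        raise Exception("transfer of {} failed".format(fs))


count = backup_all(["pool/a", "pool/broken", "pool/c"], flaky_transfer)
print("{} failure(s), exit code {}".format(count, exit_code(count)))
```

A monitoring wrapper such as the `zabbix-job-status ... $?` call in the README can then treat any non-zero exit code as a failed job.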
@@ -467,43 +499,73 @@ def zfs_autobackup():

    ### getting and determinging source/target filesystems

-    # get selected filesystem on backup source
+    # get selected filesystems on backup source
    verbose("Getting selected source filesystems for backup {0} on {1}".format(args.backup_name,args.ssh_source))
    source_filesystems=zfs_get_selected_filesystems(args.ssh_source, args.backup_name)

    #nothing todo
    if not source_filesystems:
-        error("No filesystems source selected, please do a 'zfs set autobackup:{0}=true' on {1}".format(args.backup_name,args.ssh_source))
-        sys.exit(1)
+        abort("No source filesystems selected, please do a 'zfs set autobackup:{0}=true' on {1}".format(args.backup_name,args.ssh_source))

+    if args.ignore_replicated:
+        replicated_filesystems=zfs_get_unchanged_filesystems(args.ssh_source, source_filesystems)
+        for replicated_filesystem in replicated_filesystems:
+            if replicated_filesystem in source_filesystems:
+                source_filesystems.remove(replicated_filesystem)
+                verbose("* Already replicated: {}".format(replicated_filesystem))
+
+    if not source_filesystems:
+        verbose("Nothing to do, all filesystems are already replicated.")
+        sys.exit(0)

    # determine target filesystems
    target_filesystems=[]
    for source_filesystem in source_filesystems:
-        #append args.target_fs prefix and strip args.strip_path paths from source_filesystem
-        target_filesystems.append(args.target_fs + "/" + lstrip_path(source_filesystem, args.strip_path))
+        #append args.target_path prefix and strip args.strip_path paths from source_filesystem
+        target_filesystems.append(args.target_path + "/" + lstrip_path(source_filesystem, args.strip_path))
+    debug("Wanted target filesystems:\n"+str(pprint.pformat(target_filesystems)))
+
+    # get actual existing target filesystems. (including ones that might not be in the backupset anymore)
+    verbose("Getting existing target filesystems")
+    existing_target_filesystems=zfs_get_existing_filesystems(ssh_to=args.ssh_target, target_path=args.target_path)
+    debug("Existing target filesystems:\n"+str(pprint.pformat(existing_target_filesystems)))
+    common_target_filesystems=list(set(target_filesystems) & set(existing_target_filesystems))
+    debug("Common target filesystems (wanted target filesystems that already exist on the target):\n"+str(pprint.pformat(common_target_filesystems)))


-    ### get resumable transfers
+    ### get resumable transfers from target
    resumable_target_filesystems={}
    if args.resume:
        verbose("Checking for aborted transfers that can be resumed")
+        #Important: use target_filesystems, not existing_target_filesystems (during an initial transfer the filesystem is resumable but doesn't exist yet)
        resumable_target_filesystems=zfs_get_resumable_filesystems(args.ssh_target, target_filesystems)
-        debug("Resumable filesystems: "+str(pprint.pformat(resumable_target_filesystems)))
+        debug("Resumable filesystems:\n"+str(pprint.pformat(resumable_target_filesystems)))


-    ### get all snapshots of all selected filesystems
+    ### get existing target snapshots
+    target_snapshots={}
+    if common_target_filesystems:
+        verbose("Getting target snapshot-list from {0}".format(args.ssh_target))
+        target_snapshots=zfs_get_snapshots(args.ssh_target, common_target_filesystems, args.backup_name)
+        # except subprocess.CalledProcessError:
+        #     verbose("(ignoring errors, probably initial backup for this filesystem)")
+        #     pass
+        debug("Target snapshots:\n" + str(pprint.pformat(target_snapshots)))
+
+
+    ### get existing source snapshots
    verbose("Getting source snapshot-list from {0}".format(args.ssh_source))
    source_snapshots=zfs_get_snapshots(args.ssh_source, source_filesystems, args.backup_name)
-    debug("Source snapshots: " + str(pprint.pformat(source_snapshots)))
+    debug("Source snapshots:\n" + str(pprint.pformat(source_snapshots)))


-    #create new snapshot?
+    ### create new snapshots on source
    if not args.no_snapshot:
        #determine which filesystems changed since last snapshot
-        if not args.allow_empty:
-            verbose("Determining unchanged filesystems")
-            unchanged_filesystems=zfs_get_unchanged_filesystems(args.ssh_source, source_snapshots)
+        if not args.allow_empty and not args.ignore_replicated:
+            #determine which filesystems are unchanged since OUR snapshots. (not since ANY snapshot)
+            unchanged_filesystems=zfs_get_unchanged_snapshots(args.ssh_source, source_snapshots)
+
        else:
            unchanged_filesystems=[]
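
Both unchanged-checks rely on the ZFS `written` property: a value of zero means nothing was written since the snapshot in question. The sketch below parses output in the shape of `zfs get -H -o name,value written <filesystems>` (tab separated, no header). The function name and sample lines are illustrative only. Accepting both `0` and `0B` follows the comparison in this diff, presumably because different ZFS versions format zero differently.

```python
def unchanged_from_written(lines):
    """Return the names whose `written` value is zero.

    lines: raw output lines, one "name<TAB>value" pair per line.
    """
    unchanged = []
    for line in lines:
        name, written = line.split("\t")
        if written in ("0", "0B"):
            unchanged.append(name)
    return unchanged


sample = [
    "rpool/data/vm-100-disk-0\t0",
    "rpool/data/vm-101-disk-0\t1.2M",
    "rpool/data/vm-102-disk-0\t0B",
]
print(unchanged_from_written(sample))
# ['rpool/data/vm-100-disk-0', 'rpool/data/vm-102-disk-0']
```

With `--ignore-replicated` these datasets are dropped from the run entirely; without it they are only skipped when creating new snapshots.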

@@ -511,31 +573,22 @@ def zfs_autobackup():
        for source_filesystem in source_filesystems:
            if source_filesystem not in unchanged_filesystems:
                snapshot_filesystems.append(source_filesystem)
+            else:
+                verbose("* Not snapshotting {}, no changes found.".format(source_filesystem))

-
-        #create snapshot
+        #create snapshots
        if snapshot_filesystems:
            new_snapshot_name=args.backup_name+"-"+time.strftime("%Y%m%d%H%M%S")
-            verbose("Creating source snapshot {0} on {1} ".format(new_snapshot_name, args.ssh_source))
+            verbose("Creating source snapshots {0} on {1} ".format(new_snapshot_name, args.ssh_source))
            zfs_create_snapshot(args.ssh_source, snapshot_filesystems, new_snapshot_name)
        else:
            verbose("No changes at all, not creating snapshot.")

-
        #add it to the list of source filesystems
        for snapshot_filesystem in snapshot_filesystems:
            source_snapshots.setdefault(snapshot_filesystem,[]).append(new_snapshot_name)


-    #### get target snapshots
-    target_snapshots={}
-    try:
-        verbose("Getting target snapshot-list from {0}".format(args.ssh_target))
-        target_snapshots=zfs_get_snapshots(args.ssh_target, target_filesystems, args.backup_name)
-    except subprocess.CalledProcessError:
-        verbose("(ignoring errors, probably initial backup for this filesystem)")
-        pass
-    debug("Target snapshots: " + str(pprint.pformat(target_snapshots)))


    #obsolete snapshots that may be removed
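
The per-filesystem send planning in `zfs_autobackup()` follows one rule: find the newest target snapshot in the source list, send everything after it, and treat everything before it as obsolete on both sides. The sketch below restates that rule as a standalone function. `plan_send` and the snapshot names are hypothetical, and the error handling is reduced to a single exception rather than the script's detailed message.

```python
def plan_send(source_snaps, target_snaps):
    """Return (send, source_obsolete, target_obsolete) for one filesystem.

    Both lists are ordered oldest to newest.
    """
    if not target_snaps:
        # initial transfer: send everything, nothing is obsolete
        return list(source_snaps), [], []

    latest = target_snaps[-1]  # last snapshot that was sent successfully
    if latest not in source_snaps:
        raise ValueError(
            "latest target snapshot {} is missing on source".format(latest))

    i = source_snaps.index(latest)
    j = target_snaps.index(latest)
    # send what comes after the common snapshot; older ones are obsolete
    return source_snaps[i + 1:], source_snaps[:i], target_snaps[:j]


source = ["b-1", "b-2", "b-3", "b-4"]
target = ["b-1", "b-2"]
print(plan_send(source, target))
# (['b-3', 'b-4'], ['b-1'], ['b-1'])
```

"Obsolete" here only marks candidates for removal; how many are actually kept is decided by `--keep-source` and `--keep-target`, in code outside this sketch.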
@@ -548,179 +601,201 @@ def zfs_autobackup():

    #determine which snapshots to send for each filesystem
    for source_filesystem in source_filesystems:
-        target_filesystem=args.target_fs + "/" + lstrip_path(source_filesystem, args.strip_path)
+        try:
+            target_filesystem=args.target_path + "/" + lstrip_path(source_filesystem, args.strip_path)

-        if source_filesystem not in source_snapshots:
-            #this happens if you use --no-snapshot and there are new filesystems without snapshots
-            verbose("Skipping source filesystem {0}, no snapshots found".format(source_filesystem))
-        else:
-
-            #incremental or initial send?
-            if target_filesystem in target_snapshots and target_snapshots[target_filesystem]:
-                #incremental mode, determine what to send and what is obsolete
-
-                #latest succesfully send snapshot, should be common on both source and target
-                latest_target_snapshot=target_snapshots[target_filesystem][-1]
-
-                if latest_target_snapshot not in source_snapshots[source_filesystem]:
-                    #cant find latest target anymore. find first common snapshot and inform user
-                    error_msg="Cant find latest target snapshot on source, did you destroy/rename it?"
-                    error_msg=error_msg+"\nLatest on target : "+target_filesystem+"@"+latest_target_snapshot
-                    error_msg=error_msg+"\nMissing on source: "+source_filesystem+"@"+latest_target_snapshot
-                    found=False
-                    for latest_target_snapshot in reversed(target_snapshots[target_filesystem]):
-                        if latest_target_snapshot in source_snapshots[source_filesystem]:
-                            error_msg=error_msg+"\nYou could solve this by rolling back to this common snapshot on target: "+target_filesystem+"@"+latest_target_snapshot
-                            found=True
-                            break
-                    if not found:
-                        error_msg=error_msg+"\nAlso could not find an earlier common snapshot to rollback to."
-
-                    raise(Exception(error_msg))
-
-                #send all new source snapshots that come AFTER the last target snapshot
-                latest_source_index=source_snapshots[source_filesystem].index(latest_target_snapshot)
-                send_snapshots=source_snapshots[source_filesystem][latest_source_index+1:]
-
-                #source snapshots that come BEFORE last target snapshot are obsolete
-                source_obsolete_snapshots[source_filesystem]=source_snapshots[source_filesystem][0:latest_source_index]
-
-                #target snapshots that come BEFORE last target snapshot are obsolete
-                latest_target_index=target_snapshots[target_filesystem].index(latest_target_snapshot)
-                target_obsolete_snapshots[target_filesystem]=target_snapshots[target_filesystem][0:latest_target_index]
+            if source_filesystem not in source_snapshots:
+                #this happens if you use --no-snapshot and there are new filesystems without snapshots
+                verbose("* Skipping source filesystem {0}, no snapshots found".format(source_filesystem))
            else:
-                #initial mode, send all snapshots, nothing is obsolete:
-                latest_target_snapshot=None
-                send_snapshots=source_snapshots[source_filesystem]
-                target_obsolete_snapshots[target_filesystem]=[]
-                source_obsolete_snapshots[source_filesystem]=[]

-            #now actually send the snapshots
-            if not args.no_send:
+                #incremental or initial send?
+                if target_filesystem in target_snapshots and target_snapshots[target_filesystem]:
+                    #incremental mode, determine what to send and what is obsolete

-                if send_snapshots and args.rollback and latest_target_snapshot:
-                    #roll back any changes on target
-                    debug("Rolling back target to latest snapshot.")
-                    run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "rollback", target_filesystem+"@"+latest_target_snapshot ])
+                    #latest succesfully send snapshot, should be common on both source and target
+                    latest_target_snapshot=target_snapshots[target_filesystem][-1]
+
+                    if latest_target_snapshot not in source_snapshots[source_filesystem]:
+                        #cant find latest target anymore. find first common snapshot and inform user
+                        error_msg="Cant find latest target snapshot on source for '{}', did you destroy/rename it?".format(source_filesystem)
+                        error_msg=error_msg+"\nLatest on target : "+target_filesystem+"@"+latest_target_snapshot
+                        error_msg=error_msg+"\nMissing on source: "+source_filesystem+"@"+latest_target_snapshot
+                        found=False
+                        for latest_target_snapshot in reversed(target_snapshots[target_filesystem]):
+                            if latest_target_snapshot in source_snapshots[source_filesystem]:
+                                error_msg=error_msg+"\nYou could solve this by rolling back to this common snapshot on target: "+target_filesystem+"@"+latest_target_snapshot
+                                found=True
+                                break
+                        if not found:
+                            error_msg=error_msg+"\nAlso could not find an earlier common snapshot to rollback to."
+                        else:
+                            if args.ignore_new:
+                                verbose("* Skipping source filesystem '{0}', target already has newer snapshots.".format(source_filesystem))
+                                continue
+
+                        raise(Exception(error_msg))
+
+                    #send all new source snapshots that come AFTER the last target snapshot
+                    latest_source_index=source_snapshots[source_filesystem].index(latest_target_snapshot)
+                    send_snapshots=source_snapshots[source_filesystem][latest_source_index+1:]
+
+                    #source snapshots that come BEFORE last target snapshot are obsolete
+                    source_obsolete_snapshots[source_filesystem]=source_snapshots[source_filesystem][0:latest_source_index]
+
+                    #target snapshots that come BEFORE last target snapshot are obsolete
+                    latest_target_index=target_snapshots[target_filesystem].index(latest_target_snapshot)
+                    target_obsolete_snapshots[target_filesystem]=target_snapshots[target_filesystem][0:latest_target_index]
+                else:
+                    #initial mode, send all snapshots, nothing is obsolete:
+                    latest_target_snapshot=None
+                    send_snapshots=source_snapshots[source_filesystem]
+                    target_obsolete_snapshots[target_filesystem]=[]
+                    source_obsolete_snapshots[source_filesystem]=[]
+
+                #now actually send the snapshots
+                if not args.no_send:
+
+                    if send_snapshots and args.rollback and latest_target_snapshot:
+                        #roll back any changes on target
+                        debug("Rolling back target to latest snapshot.")
+                        run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "rollback", target_filesystem+"@"+latest_target_snapshot ])


-                for send_snapshot in send_snapshots:
+                    for send_snapshot in send_snapshots:

-                    #resumable?
-                    if target_filesystem in resumable_target_filesystems:
-                        resume_token=resumable_target_filesystems.pop(target_filesystem)
-                    else:
-                        resume_token=None
+                        #resumable?
+                        if target_filesystem in resumable_target_filesystems:
+                            resume_token=resumable_target_filesystems.pop(target_filesystem)
+                        else:
+                            resume_token=None

-                    #hold the snapshot we're sending on the source
-                    zfs_hold_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+send_snapshot)
+                        #hold the snapshot we're sending on the source
+                        if not args.no_holds:
+                            zfs_hold_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+send_snapshot)

-                    zfs_transfer(
-                        ssh_source=args.ssh_source, source_filesystem=source_filesystem,
-                        first_snapshot=latest_target_snapshot, second_snapshot=send_snapshot,
-                        ssh_target=args.ssh_target, target_filesystem=target_filesystem,
-                        resume_token=resume_token
-                    )
+                        zfs_transfer(
+                            ssh_source=args.ssh_source, source_filesystem=source_filesystem,
+                            first_snapshot=latest_target_snapshot, second_snapshot=send_snapshot,
+                            ssh_target=args.ssh_target, target_filesystem=target_filesystem,
+                            resume_token=resume_token
+                        )

-                    #hold the snapshot we just send to the target
-                    zfs_hold_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+send_snapshot)
+                        #hold the snapshot we just sent to the target
+                        zfs_hold_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+send_snapshot)



-                    #now that we succesfully transferred this snapshot, the previous snapshot is obsolete:
-                    if latest_target_snapshot:
-                        zfs_release_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+latest_target_snapshot)
-                        target_obsolete_snapshots[target_filesystem].append(latest_target_snapshot)
+                        #now that we successfully transferred this snapshot, the previous snapshot is obsolete:
+                        if latest_target_snapshot:
+                            zfs_release_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+latest_target_snapshot)
+                            target_obsolete_snapshots[target_filesystem].append(latest_target_snapshot)

-                        zfs_release_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+latest_target_snapshot)
-                        source_obsolete_snapshots[source_filesystem].append(latest_target_snapshot)
-                    #we just received a new filesytem?
-                    else:
-                        if args.clear_refreservation:
-                            debug("Clearing refreservation to save space.")
+                            if not args.no_holds:
+                                zfs_release_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+latest_target_snapshot)
+                                source_obsolete_snapshots[source_filesystem].append(latest_target_snapshot)
+                        #we just received a new filesystem?
+                        else:
+                            if args.clear_refreservation:
+                                debug("Clearing refreservation to save space.")

-                            run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "refreservation=none", target_filesystem ])
+                                run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "refreservation=none", target_filesystem ])


-                        if args.clear_mountpoint:
-                            debug("Setting canmount=noauto to prevent auto-mounting in the wrong place. (ignoring errors)")
+                            if args.clear_mountpoint:
+                                debug("Setting canmount=noauto to prevent auto-mounting in the wrong place. (ignoring errors)")

-                            run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "canmount=noauto", target_filesystem ], valid_exitcodes= [0, 1] )
+                                run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "canmount=noauto", target_filesystem ], valid_exitcodes= [0, 1] )


-                    latest_target_snapshot=send_snapshot
-
+                        latest_target_snapshot=send_snapshot
+        # failed, skip this source_filesystem
+        except Exception as e:
+            failed(str(e))


    ############## cleanup section
    #we only do cleanups after everything is complete, to keep everything consistent (same snapshots everywhere)


-    #find stale backups on target that have become obsolete
-    verbose("Getting stale filesystems and snapshots from {0}".format(args.ssh_target))
-    stale_target_filesystems=get_stale_backupped_filesystems(ssh_to=args.ssh_target, backup_name=args.backup_name, target_fs=args.target_fs, target_filesystems=target_filesystems)
-    debug("Stale target filesystems: {0}".format("\n".join(stale_target_filesystems)))
+    if not args.ignore_replicated:
+        #find stale backups on target that have become obsolete

-    stale_target_snapshots=zfs_get_snapshots(args.ssh_target, stale_target_filesystems, args.backup_name)
-    debug("Stale target snapshots: " + str(pprint.pformat(stale_target_snapshots)))
-    target_obsolete_snapshots.update(stale_target_snapshots)
+        stale_target_filesystems=get_stale_backupped_filesystems(backup_name=args.backup_name, target_path=args.target_path, target_filesystems=target_filesystems, existing_target_filesystems=existing_target_filesystems)
+        debug("Stale target filesystems: {0}".format("\n".join(stale_target_filesystems)))

-    #determine stale filesystems that have no snapshots left (the can be destroyed)
-    #TODO: prevent destroying filesystems that have underlying filesystems that are still active.
-    stale_target_destroys=[]
-    for stale_target_filesystem in stale_target_filesystems:
-        if stale_target_filesystem not in stale_target_snapshots:
-            stale_target_destroys.append(stale_target_filesystem)
+        stale_target_snapshots=zfs_get_snapshots(args.ssh_target, stale_target_filesystems, args.backup_name)
+        debug("Stale target snapshots: " + str(pprint.pformat(stale_target_snapshots)))
+        target_obsolete_snapshots.update(stale_target_snapshots)
+
+        #determine stale filesystems that have no snapshots left (they can be destroyed)
+        stale_target_destroys=[]
+        for stale_target_filesystem in stale_target_filesystems:
+            if stale_target_filesystem not in stale_target_snapshots:
+                stale_target_destroys.append(stale_target_filesystem)
+
+        if stale_target_destroys:
+            #NOTE: don't destroy automatically, not safe enough.
+            # if args.destroy_stale:
+            #     verbose("Destroying stale filesystems on target {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
+            #     zfs_destroy(ssh_to=args.ssh_target, filesystems=stale_target_destroys, recursive=True)
+            # else:
+            verbose("Stale filesystems on {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
+    else:
+        verbose("NOTE: Can't determine stale target filesystems while using ignore_replicated.")

-    if stale_target_destroys:
-        if args.destroy_stale:
-            verbose("Destroying stale filesystems on target {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
-            zfs_destroy(ssh_to=args.ssh_target, filesystems=stale_target_destroys, recursive=True)
-        else:
-            verbose("Stale filesystems on {0}, use --destroy-stale to destroy:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))


    #now actually destroy the old snapshots
    source_destroys=determine_destroy_list(source_obsolete_snapshots, args.keep_source)
    if source_destroys:
        verbose("Destroying old snapshots on source {0}:\n{1}".format(args.ssh_source, "\n".join(source_destroys)))
-        zfs_destroy_snapshots(ssh_to=args.ssh_source, snapshots=source_destroys)
+        try:
+            zfs_destroy_snapshots(ssh_to=args.ssh_source, snapshots=source_destroys)
+        except Exception as e:
+            failed(str(e))
+

    target_destroys=determine_destroy_list(target_obsolete_snapshots, args.keep_target)
    if target_destroys:
        verbose("Destroying old snapshots on target {0}:\n{1}".format(args.ssh_target, "\n".join(target_destroys)))
-        zfs_destroy_snapshots(ssh_to=args.ssh_target, snapshots=target_destroys)
-
-
-    verbose("All done")
-
+        try:
+            zfs_destroy_snapshots(ssh_to=args.ssh_target, snapshots=target_destroys)
+        except Exception as e:
+            failed(str(e))


 ################################################################## ENTRY POINT

 # parse arguments
 import argparse
-parser = argparse.ArgumentParser(description='ZFS autobackup v2.2')
+parser = argparse.ArgumentParser(
+    description='ZFS autobackup v2.4',
+    epilog='When a filesystem fails, zfs_backup will continue and report the number of failures at the end. Also the exit code will indicate the number of failures.')
 parser.add_argument('--ssh-source', default="local", help='Source host to get backup from. (user@hostname) Default %(default)s.')
 parser.add_argument('--ssh-target', default="local", help='Target host to push backup to. (user@hostname) Default  %(default)s.')
 parser.add_argument('--keep-source', type=int, default=30, help='Number of days to keep old snapshots on source. Default %(default)s.')
 parser.add_argument('--keep-target', type=int, default=30, help='Number of days to keep old snapshots on target. Default %(default)s.')
 parser.add_argument('backup_name',    help='Name of the backup (you should set the zfs property "autobackup:backup-name" to true on filesystems you want to backup)')
-parser.add_argument('target_fs',    help='Target filesystem')
+parser.add_argument('target_path',    help='Target path')

 parser.add_argument('--no-snapshot', action='store_true', help='do not create a new snapshot (useful for finishing uncompleted backups, or cleanups)')
 parser.add_argument('--no-send', action='store_true', help='do not send snapshots (useful to only do a cleanup)')
 parser.add_argument('--allow-empty', action='store_true', help='if nothing has changed, still create empty snapshots.')
+parser.add_argument('--ignore-replicated', action='store_true',  help='Ignore datasets that seem to be replicated some other way. (No changes since latest snapshot. Useful for Proxmox HA replication)')
+parser.add_argument('--no-holds', action='store_true',  help='Do not lock snapshots on the source. (Useful to allow Proxmox HA replication to switch nodes)')
+parser.add_argument('--ignore-new', action='store_true',  help='Ignore filesystem if there are already newer snapshots for it on the target (use with caution)')
+
 parser.add_argument('--resume', action='store_true', help='support resuming of interrupted transfers by using the zfs extensible_dataset feature (both zpools should have it enabled) Disadvantage is that you need to use zfs recv -A if another snapshot is created on the target during a receive. Otherwise it will keep failing.')
 parser.add_argument('--strip-path', default=0, type=int, help='number of directory to strip from path (use 1 when cloning zones between 2 SmartOS machines)')
-parser.add_argument('--buffer', default="",  help='Use mbuffer with specified size to speedup zfs transfer. (e.g. --buffer 1G)')
+parser.add_argument('--buffer', default="",  help='Use mbuffer with specified size to speedup zfs transfer. (e.g. --buffer 1G) Will also show nice progress output.')


-parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
+# parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
 parser.add_argument('--clear-refreservation', action='store_true', help='Set refreservation property to none for new filesystems. Usefull when backupping SmartOS volumes. (recommended)')
 parser.add_argument('--clear-mountpoint', action='store_true', help='Sets canmount=noauto property, to prevent the received filesystem from mounting over existing filesystems. (recommended)')
 parser.add_argument('--filter-properties', action='append', help='Filter properties when receiving filesystems. Can be specified multiple times. (Example: If you send data from Linux to FreeNAS, you should filter xattr)')
-parser.add_argument('--rollback', action='store_true', help='Rollback changes on the target before starting a backup. (normally you can prevent changes by setting the readonly property on the target_fs to on)')
+parser.add_argument('--rollback', action='store_true', help='Rollback changes on the target before starting a backup. (normally you can prevent changes by setting the readonly property on the target_path to on)')
 parser.add_argument('--ignore-transfer-errors', action='store_true', help='Ignore transfer errors (still checks if received filesystem exists. usefull for acltype errors)')


@ -731,11 +806,23 @@ parser.add_argument('--debug', action='store_true', help='debug output (shows co
 #note args is the only global variable we use, since its a global readonly setting anyway
 args = parser.parse_args()

+if args.ignore_replicated and args.allow_empty:
+    abort("Cannot use allow_empty with ignore_replicated.")
+
+
 try:
    zfs_autobackup()
+    if not failures:
+        verbose("All operations completed successfully.")
+        sys.exit(0)
+    else:
+        verbose("{} OPERATION(S) FAILED!".format(failures))
+        #exit with the number of failures.
+        sys.exit(min(255,failures))
+
 except Exception as e:
    if args.debug:
        raise
    else:
-        print("* ABORTED *")
        print(str(e))
+        abort("FATAL ERROR")
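
The incremental branch in the diff above derives three lists per filesystem from two ordered snapshot lists: what to send, what is obsolete on the source, and what is obsolete on the target. Below is a minimal standalone sketch of that selection, assuming snapshot names are unique and listed oldest first. `plan_transfer` and its return shape are hypothetical names for illustration only and are not part of zfs_autobackup; the sketch also raises `LookupError` where the script raises a plain `Exception`, and it leaves out the `--ignore-new` handling.

```python
def plan_transfer(source_snapshots, target_snapshots):
    """Return (send, source_obsolete, target_obsolete) for one filesystem.

    Hypothetical helper: both arguments are lists of snapshot names,
    ordered oldest first.
    """
    if not target_snapshots:
        # Initial mode: send everything, nothing is obsolete yet.
        return list(source_snapshots), [], []

    # The latest snapshot on the target should also exist on the source.
    latest = target_snapshots[-1]
    if latest not in source_snapshots:
        # Look for the newest snapshot both sides still share, to hint at a rollback point.
        common = next(
            (name for name in reversed(target_snapshots) if name in source_snapshots),
            None,
        )
        if common is None:
            raise LookupError("no common snapshot found for " + latest)
        raise LookupError("latest target snapshot missing on source; common snapshot: " + common)

    source_index = source_snapshots.index(latest)
    target_index = target_snapshots.index(latest)

    # Everything after the common snapshot must be sent;
    # everything before it is obsolete on both sides.
    send = source_snapshots[source_index + 1:]
    source_obsolete = source_snapshots[:source_index]
    target_obsolete = target_snapshots[:target_index]
    return send, source_obsolete, target_obsolete
```

Calling `plan_transfer(["a", "b", "c", "d"], ["a", "b"])` would select `c` and `d` for sending and mark `a` as obsolete on both sides, while `b` stays as the common base for the incremental send.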
Author	SHA1	Message	Date
DatuX	17445ec54a	Update README.md	2019-11-10 01:16:48 +01:00
DatuX	07a150618a	Update README.md	2019-11-10 01:01:08 +01:00
DatuX	067f3b92d1	Update README.md	2019-11-10 00:51:20 +01:00
Edwin Eefting	71a394cfc7	rollback	2019-10-19 14:54:10 +02:00
Edwin Eefting	bfc36ac87f	rollback	2019-10-19 14:53:31 +02:00
Edwin Eefting	ad47b26f56	Revert "fixing quota issues" This reverts commit `d973905303`.	2019-10-17 10:42:54 +02:00
Edwin Eefting	f38da17592	v3.0: no longer replicate all properties by default. this made things unnecessary complicated. now use the --properties option to specify the properties you want.	2019-10-16 13:37:31 +02:00
Edwin Eefting	d973905303	fixing quota issues	2019-10-16 12:51:12 +02:00
Edwin Eefting	82465acd5b	clearification of testmode	2019-10-16 10:27:35 +02:00
Edwin Eefting	514131d67c	update docs	2019-10-16 09:45:08 +02:00
Edwin Eefting	dfcae1613b	clearify the target path is a zfs filesystem, not an regular path	2019-10-16 09:43:28 +02:00
DatuX	67b21b4015	bugfix: exitcode always was 255	2019-10-16 09:28:21 +02:00
DatuX	3907c850a6	Update zfs_autobackup	2019-10-02 22:58:50 +02:00
Edwin Eefting	3b9b96243b	dont destroy stale snapshots if we're using --ignore-replicated	2019-10-02 19:39:31 +02:00
Edwin Eefting	54235f455a	zfs_autobackup 2.4: try to continue on non-fatal errors	2019-10-02 18:21:24 +02:00
DatuX	c176b968a9	forgot to return exit code when not using debug mode :(	2019-03-26 23:06:03 +01:00
Edwin Eefting	921f7df0a5	updated readme	2019-02-19 11:24:10 +01:00
Edwin Eefting	edee598cf8	updated readme	2019-02-19 11:19:16 +01:00
Edwin Eefting	80b3272f0f	disable destroy-stale for now. updated readme	2019-02-19 11:09:54 +01:00
Edwin Eefting	617e0fb69b	fix/revisit stale filesystem detection	2019-02-19 11:04:59 +01:00
Edwin Eefting	46a85fd170	updated readme	2019-02-19 10:26:52 +01:00
Edwin Eefting	1f59229419	fixes	2019-02-19 01:28:22 +01:00
Edwin Eefting	fcd98e2d87	much cleaner output and layout. removed useless error output. general cleanup.	2019-02-19 00:17:20 +01:00
Edwin Eefting	dd8b2442ec	options for proxmox HA: no-holds, ignore-new and ignore-replicated	2019-02-18 18:53:54 +01:00
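
Several commits in the list above concern the exit code, and the entry point in the diff ends with `sys.exit(min(255,failures))`. Here is a small sketch of why the clamp matters, assuming a POSIX-style exit status where the parent process only sees the low 8 bits. `exit_status` is a hypothetical helper name, not a function in the script.

```python
def exit_status(failures):
    """Map a failure count to a process exit status (0 means success)."""
    # Without the clamp, a count of 256 would wrap around to 0 on POSIX
    # systems and the run would look successful to the caller.
    return min(255, failures)


if __name__ == "__main__":
    for count in (0, 3, 256, 1000):
        print(count, "->", exit_status(count))
```

A wrapper such as a cron job could then treat any non-zero status as "at least one filesystem failed" without parsing the output.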