Compare commits

..

2 Commits

2 changed files with 286 additions and 373 deletions

106
README.md
View File

@ -1,18 +1,12 @@
# ZFS autobackup
(checkout v3.0-beta for the new cool stuff: https://github.com/psy0rz/zfs_autobackup/blob/v3/README.md)
Official releases: https://github.com/psy0rz/zfs_autobackup/releases
Introduction
============
ZFS autobackup is used to periodicly backup ZFS filesystems to other locations. This is done using the very effcient zfs send and receive commands.
It has the following features:
* Works across operating systems: Tested with Linux, FreeBSD/FreeNAS and SmartOS.
* Works in combination with existing replication systems. (Like Proxmox HA)
* Automatically selects filesystems to backup by looking at a simple ZFS property. (recursive)
* Automaticly selects filesystems to backup by looking at a simple ZFS property. (recursive)
* Creates consistent snapshots. (takes all snapshots at once, atomic.)
* Multiple backups modes:
* "push" local data to a backup-server via SSH.
@ -22,37 +16,34 @@ It has the following features:
* Supports resuming of interrupted transfers. (via the zfs extensible_dataset feature)
* Backups and snapshots can be named to prevent conflicts. (multiple backups from and to the same filesystems are no problem)
* Always creates a new snapshot before starting.
* Checks everything but tries continue on non-fatal errors when possible. (Reports error-count when done)
* Checks everything and aborts on errors.
* Ability to 'finish' aborted backups to see what goes wrong.
* Easy to debug and has a test-mode. Actual unix commands are printed.
* Keeps latest X snapshots remote and locally. (default 30, configurable)
* Uses zfs-holds on important snapshots so they cant be accidentally destroyed.
* Easy installation:
* Only one host needs the zfs_autobackup script. The other host just needs ssh and the zfs command.
* Written in python and uses zfs-commands, no 3rd party dependency's or libraries.
* No separate config files or properties. Just one command you can copy/paste in your backup script.
* Written in python and uses zfs-commands, no 3rd party dependencys or libraries.
* No seperate config files or properties. Just one command you can copy/paste in your backup script.
Usage
====
```
usage: zfs_autobackup [-h] [--ssh-source SSH_SOURCE] [--ssh-target SSH_TARGET]
[--keep-source KEEP_SOURCE] [--keep-target KEEP_TARGET]
[--no-snapshot] [--no-send] [--allow-empty]
[--ignore-replicated] [--no-holds] [--ignore-new]
[--resume] [--strip-path STRIP_PATH] [--buffer BUFFER]
[--no-snapshot] [--no-send] [--resume]
[--strip-path STRIP_PATH] [--destroy-stale]
[--clear-refreservation] [--clear-mountpoint]
[--filter-properties FILTER_PROPERTIES] [--rollback]
[--ignore-transfer-errors] [--test] [--verbose]
[--debug]
backup_name target_path
[--test] [--verbose] [--debug]
backup_name target_fs
ZFS autobackup v2.4
ZFS autobackup v2.2
positional arguments:
backup_name Name of the backup (you should set the zfs property
"autobackup:backup-name" to true on filesystems you
want to backup
target_path Target path
target_fs Target filesystem
optional arguments:
-h, --help show this help message and exit
@ -71,14 +62,6 @@ optional arguments:
--no-snapshot dont create new snapshot (usefull for finishing
uncompleted backups, or cleanups)
--no-send dont send snapshots (usefull to only do a cleanup)
--allow-empty if nothing has changed, still create empty snapshots.
--ignore-replicated Ignore datasets that seem to be replicated some other
way. (No changes since lastest snapshot. Usefull for
proxmox HA replication)
--no-holds Dont lock snapshots on the source. (Usefull to allow
proxmox HA replication to switches nodes)
--ignore-new Ignore filesystem if there are already newer snapshots
for it on the target (use with caution)
--resume support resuming of interrupted transfers by using the
zfs extensible_dataset feature (both zpools should
have it enabled) Disadvantage is that you need to use
@ -88,9 +71,8 @@ optional arguments:
--strip-path STRIP_PATH
number of directory to strip from path (use 1 when
cloning zones between 2 SmartOS machines)
--buffer BUFFER Use mbuffer with specified size to speedup zfs
transfer. (e.g. --buffer 1G) Will also show nice
progress output.
--destroy-stale Destroy stale backups that have no more snapshots. Be
sure to verify the output before using this!
--clear-refreservation
Set refreservation property to none for new
filesystems. Usefull when backupping SmartOS volumes.
@ -104,17 +86,11 @@ optional arguments:
from Linux to FreeNAS, you should filter xattr)
--rollback Rollback changes on the target before starting a
backup. (normally you can prevent changes by setting
the readonly property on the target_path to on)
--ignore-transfer-errors
Ignore transfer errors (still checks if received
filesystem exists. usefull for acltype errors)
the readonly property on the target_fs to on)
--test dont change anything, just show what would be done
(still does all read-only operations)
--verbose verbose output
--debug debug output (shows commands that are executed)
When a filesystem fails, zfs_backup will continue and report the number of
failures at that end. Also the exit code will indicate the number of failures.
```
Backup example
@ -224,7 +200,7 @@ Host smartos04
Compression yes
```
This way you can just specify "smartos04" as host.
This way you can just specify smartos04
Also uses compression on slow links.
@ -260,6 +236,8 @@ root@fs1:/home/psy# zfs send fs1/zones/backup/zfsbackups/smartos01.server.com/z
After that you can rename the disk image from the temporary location to the location of a new SmartOS machine you've created.
Monitoring with Zabbix-jobs
===========================
@ -271,55 +249,3 @@ zabbix-job-status backup_smartos01_fs1 daily $?
```
This will update the zabbix server with the exitcode and will also alert you if the job didnt run for more than 2 days.
Backuping up a proxmox cluster with HA replication
==================================================
Due to the nature of proxmox we had to make a few enhancements to zfs_autobackup. This will probably also benefit other systems that use their own replication in combination with zfs_autobackup.
All data under rpool/data can be on multiple nodes of the cluster. The naming of those filesystem is unique over the whole cluster. Because of this we should backup rpool/data of all nodes to the same destination. This way we wont have duplicate backups of the filesystems that are replicated. Because of various options, you can even migrate hosts and zfs_autobackup will be fine. (and it will get the next backup from the new node automaticly)
In the example below we have 3 nodes, named h4, h5 and h6.
The backup will go to a machine named smartos03.
Preparing the proxmox nodes
---------------------------
On each node select the filesystems as following:
```
root@h4:~# zfs set autobackup:h4_smartos03=true rpool
root@h4:~# zfs set autobackup:h4_smartos03=false rpool/data
root@h4:~# zfs set autobackup:data_smartos03=child rpool/data
```
* rpool will be backuped the usual way, and is named h4_smartos03. (each node will have a unique name)
* rpool/data will be excluded from the usual backup
* The CHILDREN of rpool/data be selected for a cluster wide backup named data_smartos03. (each node uses the same backup name)
Preparing the backup server
---------------------------
Extra options needed for proxmox with HA:
* --no-holds: To allow proxmox to destroy our snapshots if a VM migrates to another node.
* --ignore-replicated: To ignore the replicated filesystems of proxmox on the receiving proxmox nodes. (e.g: only backup from the node where the VM is active)
I use the following backup script on the backup server:
```
for H in h4 h5 h6; do
echo "################################### DATA $H"
#backup data filesystems to a common place
./zfs_autobackup --ssh-source root@$H data_smartos03 zones/backup/zfsbackups/pxe1_data --clear-refreservation --clear-mountpoint --ignore-transfer-errors --strip-path 2 --verbose --resume --ignore-replicated --no-holds $@
zabbix-job-status backup_$H""_data_smartos03 daily $? >/dev/null 2>/dev/null
echo "################################### RPOOL $H"
#backup rpool to own place
./zfs_autobackup --ssh-source root@$H $H""_smartos03 zones/backup/zfsbackups/$H --verbose --clear-refreservation --clear-mountpoint --resume --ignore-transfer-errors $@
zabbix-job-status backup_$H""_smartos03 daily $? >/dev/null 2>/dev/null
done
```

View File

@ -1,9 +1,5 @@
#!/usr/bin/env python2
# -*- coding: utf8 -*-
#(C)edwin@datux.nl -- Edwin Eefting
#Release under GPL.
from __future__ import print_function
import os
import sys
@ -12,25 +8,23 @@ import traceback
import subprocess
import pprint
import time
import shlex
def error(txt):
print(txt, file=sys.stderr)
def verbose(txt):
if args.verbose:
print(txt)
def debug(txt):
if args.debug:
print(txt)
#fatal abort execution, exit code 255
def abort(txt):
error(txt)
sys.exit(255)
"""run a command. specifiy ssh user@host to run remotely"""
def run(cmd, input=None, ssh_to="local", tab_split=False, valid_exitcodes=[ 0 ], test=False):
@ -101,7 +95,7 @@ def zfs_get_selected_filesystems(ssh_to, backup_name):
for source_filesystem in source_filesystems:
(name,value,source)=source_filesystem
if value=="false":
verbose("* Ignored : {0} (disabled)".format(name))
verbose("Ignored : {0} (disabled)".format(name))
else:
if source=="local" and ( value=="true" or value=="child"):
@ -109,16 +103,16 @@ def zfs_get_selected_filesystems(ssh_to, backup_name):
if source=="local" and value=="true":
selected_filesystems.append(name)
verbose("* Selected: {0} (direct selection)".format(name))
verbose("Selected: {0} (direct selection)".format(name))
elif source.find("inherited from ")==0 and (value=="true" or value=="child"):
inherited_from=re.sub("^inherited from ", "", source)
if inherited_from in direct_filesystems:
selected_filesystems.append(name)
verbose("* Selected: {0} (inherited selection)".format(name))
verbose("Selected: {0} (inherited selection)".format(name))
else:
verbose("* Ignored : {0} (already a backup)".format(name))
verbose("Ignored : {0} (already a backup)".format(name))
else:
verbose("* Ignored : {0} (only childs)".format(name))
verbose("Ignored : {0} (only childs)".format(name))
return(selected_filesystems)
@ -129,6 +123,7 @@ def zfs_get_resumable_filesystems(ssh_to, filesystems):
cmd=[ "zfs", "get", "-t", "volume,filesystem", "-o", "name,value", "-H", "receive_resume_token" ]
cmd.extend(filesystems)
#TODO: get rid of ugly errors for non-existing target filesystems
resumable_filesystems=run(ssh_to=ssh_to, tab_split=True, cmd=cmd, valid_exitcodes= [ 0,1 ] )
ret={}
@ -148,6 +143,10 @@ def zfs_destroy_snapshots(ssh_to, snapshots):
[ "xargs", "-0", "-n", "1", "zfs", "destroy", "-d" ]
)
def zfs_destroy_bookmark(ssh_to, bookmark):
run(ssh_to=ssh_to, test=args.test, valid_exitcodes=[ 0,1 ], cmd=[ "zfs", "destroy", bookmark ])
"""destroy list of filesystems """
def zfs_destroy(ssh_to, filesystems, recursive=False):
@ -179,9 +178,9 @@ def zfs_create_snapshot(ssh_to, filesystems, snapshot):
for pool in pools:
cmd=[ "zfs", "snapshot" ]
for filesystem in pools[pool]:
cmd.append(filesystem+"@"+snapshot)
cmd.append(filesystem+snapshot)
#in testmode we dont actually make changes, so keep them in a list to simulate
# #in testmode we dont actually make changes, so keep them in a list to simulate
# if args.test:
# if not ssh_to in test_snapshots:
# test_snapshots[ssh_to]={}
@ -196,26 +195,37 @@ def zfs_create_snapshot(ssh_to, filesystems, snapshot):
return[filesystem_name]=[ "snashot1", "snapshot2", ... ]
"""
def zfs_get_snapshots(ssh_to, filesystems, backup_name):
def zfs_get_snapshots(ssh_to, filesystems, backup_name, also_bookmarks=False):
ret={}
if filesystems:
if also_bookmarks:
fstype="snapshot,bookmark"
else:
fstype="snapshot"
#TODO: get rid of ugly errors for non-existing target filesystems
cmd=[
"zfs", "list", "-d", "1", "-r", "-t" ,"snapshot", "-H", "-o", "name"
"zfs", "list", "-d", "1", "-r", "-t" ,fstype, "-H", "-o", "name", "-s", "createtxg"
]
cmd.extend(filesystems)
snapshots=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
snapshots=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0,1 ])
for snapshot in snapshots:
(filesystem, snapshot_name)=snapshot.split("@")
if re.match("^"+backup_name+"-[0-9]*$", snapshot_name):
if not filesystem in ret:
ret[filesystem]=[]
ret[filesystem].append(snapshot_name)
if "@" in snapshot:
(filesystem, snapshot_name)=snapshot.split("@")
snapshot_name="@"+snapshot_name
else:
(filesystem, snapshot_name)=snapshot.split("#")
snapshot_name="#"+snapshot_name
if re.match("^[@#]"+backup_name+"-[0-9]*$", snapshot_name):
ret.setdefault(filesystem,[]).append(snapshot_name)
#TODO: get rid of this or make a more generic caching/testing system. (is it still needed, since the allow_empty-function?)
#also add any test-snapshots that where created with --test mode
# if args.test:
# if ssh_to in test_snapshots:
@ -228,6 +238,31 @@ def zfs_get_snapshots(ssh_to, filesystems, backup_name):
return(ret)
# """get names of all bookmarks for specified filesystems belonging to backup_name
#
# return[filesystem_name]=[ "bookmark1", "bookmark2", ... ]
# """
# def zfs_get_bookmarks(ssh_to, filesystems, backup_name):
#
# ret={}
#
# if filesystems:
# #TODO: get rid of ugly errors for non-existing target filesystems
# cmd=[
# "zfs", "list", "-d", "1", "-r", "-t" ,"bookmark", "-H", "-o", "name", "-s", "createtxg"
# ]
# cmd.extend(filesystems)
#
# bookmarks=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
#
# for bookmark in bookmarks:
# (filesystem, bookmark_name)=bookmark.split("#")
# if re.match("^"+backup_name+"-[0-9]*$", bookmark_name):
# ret.setdefault(filesystem,[]).append(bookmark_name)
#
# return(ret)
def default_tag():
return("zfs_autobackup:"+args.backup_name)
@ -249,6 +284,15 @@ def zfs_release_snapshot(ssh_to, snapshot, tag=None):
run(ssh_to=ssh_to, test=args.test, tab_split=False, cmd=cmd, valid_exitcodes=[ 0, 1 ])
"""bookmark a snapshot"""
def zfs_bookmark_snapshot(ssh_to, snapshot):
(filesystem, snapshot_name)=snapshot.split("@")
cmd=[
"zfs", "bookmark", snapshot, '#'+snapshot_name
]
run(ssh_to=ssh_to, test=args.test, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
"""transfer a zfs snapshot from source to target. both can be either local or via ssh.
@ -299,26 +343,31 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
if not first_snapshot:
txt=">>> Transfer: "+source_filesystem+"@"+second_snapshot
txt="Initial transfer of "+source_filesystem+" snapshot "+second_snapshot
else:
txt=">>> Transfer: "+source_filesystem+"@"+first_snapshot+"...@"+second_snapshot
txt="Incremental transfer of "+source_filesystem+" between snapshots "+first_snapshot+"..."+second_snapshot
if resume_token:
source_cmd.extend([ "-t", resume_token ])
txt=txt+" [RESUMED]"
verbose("RESUMING "+txt)
else:
source_cmd.append("-p")
# source_cmd.append("-p")
if first_snapshot:
source_cmd.extend([ "-i", first_snapshot ])
source_cmd.append( "-i")
#TODO: fix these horrible escaping hacks
if ssh_source != "local":
source_cmd.append( "'"+first_snapshot+"'" )
else:
source_cmd.append( first_snapshot )
if ssh_source != "local":
source_cmd.append("'" + source_filesystem + "@" + second_snapshot + "'")
source_cmd.append("'" + source_filesystem + second_snapshot + "'")
else:
source_cmd.append(source_filesystem + "@" + second_snapshot)
source_cmd.append(source_filesystem + second_snapshot)
verbose(txt)
verbose(txt)
if args.buffer and args.ssh_source!="local":
source_cmd.append("|mbuffer -m {}".format(args.buffer))
@ -337,7 +386,8 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
for filter_property in args.filter_properties:
target_cmd.extend([ "-x" , filter_property ])
if args.debug:
#also verbose in --verbose moqde so we can see the transfer speed when its completed
if args.verbose or args.debug:
target_cmd.append("-v")
if args.resume:
@ -380,45 +430,36 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
raise(subprocess.CalledProcessError(target_proc.returncode, target_cmd))
debug("Verifying if snapshot exists on target")
run(ssh_to=ssh_target, cmd=["zfs", "list", target_filesystem+"@"+second_snapshot ])
run(ssh_to=ssh_target, cmd=["zfs", "list", target_filesystem+second_snapshot ])
#NOTE: unreliable when using with autobackup:bla=child
# """get filesystems that where already backupped to a target. """
# def zfs_get_backupped_filesystems(ssh_to, backup_name, target_path):
# #get all target filesystems that have received or inherited the backup propert, under the target_path tree
# ret=run(ssh_to=ssh_to, tab_split=False, valid_exitcodes=[ 0,1 ], cmd=[
# "zfs", "get", "-r", "-t", "volume,filesystem", "-o", "name", "-s", "received,inherited", "-H", "autobackup:"+backup_name, target_path
# ])
#
# return(ret)
"""get existing filesystems """
def zfs_get_existing_filesystems(ssh_to, target_path):
#get all target filesystems that have received or inherited the backup propert, under the target_path tree
ret=run(ssh_to=ssh_to, tab_split=False, valid_exitcodes=[ 0,1 ], cmd=[
"zfs", "list", "-r", "-t", "volume,filesystem", "-o", "name", "-H", target_path
"""get filesystems that where already backupped to a target. """
def zfs_get_backupped_filesystems(ssh_to, backup_name, target_fs):
#get all target filesystems that have received or inherited the backup propert, under the target_fs tree
ret=run(ssh_to=ssh_to, tab_split=False, cmd=[
"zfs", "get", "-r", "-t", "volume,filesystem", "-o", "name", "-s", "received,inherited", "-H", "autobackup:"+backup_name, target_fs
])
return(ret)
"""get filesystems that where once backupped to target but are no longer selected on source
these are filesystems that are not in the list in target_filesystems.
this happens when filesystems are destroyed or unselected on the source.
"""
def get_stale_backupped_filesystems(backup_name, target_path, target_filesystems, existing_target_filesystems):
def get_stale_backupped_filesystems(ssh_to, backup_name, target_fs, target_filesystems):
backupped_filesystems=zfs_get_backupped_filesystems(ssh_to=ssh_to, backup_name=backup_name, target_fs=target_fs)
#determine backupped filesystems that are not in target_filesystems anymore
stale_backupped_filesystems=[]
for existing_target_filesystem in existing_target_filesystems:
if existing_target_filesystem not in target_filesystems:
stale_backupped_filesystems.append(existing_target_filesystem)
for backupped_filesystem in backupped_filesystems:
if backupped_filesystem not in target_filesystems:
stale_backupped_filesystems.append(backupped_filesystem)
return(stale_backupped_filesystems)
@ -436,8 +477,8 @@ def determine_destroy_list(snapshots, days):
else:
time_secs=int(time_str)
# verbose("time_secs"+time_str)
if (now-time_secs) > (24 * 3600 * days):
ret.append(filesystem+"@"+snapshot)
if (now-time_secs) >= (24 * 3600 * days):
ret.append(filesystem+snapshot)
return(ret)
@ -446,50 +487,33 @@ def lstrip_path(path, count):
return("/".join(path.split("/")[count:]))
"""get list of filesystems that are changed, compared to specified latest snapshot. """
def zfs_get_unchanged_snapshots(ssh_to, snapshots):
ret=[]
for ( filesystem, snapshot_list ) in snapshots.items():
latest_snapshot=snapshot_list[-1]
cmd=[ "zfs", "get","-H" ,"-ovalue", "written@"+latest_snapshot, filesystem ]
output=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
if output[0]=="0B" or output[0]=="0":
ret.append(filesystem)
return(ret)
"""get filesytems that are have changed since any snapshot."""
"""get list of filesystems that are changed, compared to the latest snapshot"""
def zfs_get_unchanged_filesystems(ssh_to, filesystems):
ret=[]
cmd=[ "zfs", "get","-H" ,"-oname,value", "written" ]
cmd.extend(filesystems)
output=run(ssh_to=ssh_to, tab_split=True, cmd=cmd, valid_exitcodes=[ 0 ])
for ( filesystem, snapshot_list ) in filesystems.items():
latest_snapshot=snapshot_list[-1]
for ( filesystem , written ) in output:
if written=="0B" or written=="0":
#make sure its a snapshot and not a bookmark
latest_snapshot="@"+latest_snapshot[:1]
cmd=[
"zfs", "get","-H" ,"-ovalue", "written"+latest_snapshot, filesystem
]
output=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
if output[0]=="0B":
ret.append(filesystem)
verbose("No changes on {}".format(filesystem))
return(ret)
#fugly..
failures=0
#something failed, but we try to continue with the rest
def failed(txt):
global failures
failures=failures+1
error("FAILURE: "+txt+"\n")
def zfs_autobackup():
############## data gathering section
if args.test:
@ -499,73 +523,45 @@ def zfs_autobackup():
### getting and determinging source/target filesystems
# get selected filesystems on backup source
# get selected filesystem on backup source
verbose("Getting selected source filesystems for backup {0} on {1}".format(args.backup_name,args.ssh_source))
source_filesystems=zfs_get_selected_filesystems(args.ssh_source, args.backup_name)
#nothing todo
if not source_filesystems:
abort("No source filesystems selected, please do a 'zfs set autobackup:{0}=true' on {1}".format(args.backup_name,args.ssh_source))
error("No filesystems source selected, please do a 'zfs set autobackup:{0}=true' on {1}".format(args.backup_name,args.ssh_source))
sys.exit(1)
if args.ignore_replicated:
replicated_filesystems=zfs_get_unchanged_filesystems(args.ssh_source, source_filesystems)
for replicated_filesystem in replicated_filesystems:
if replicated_filesystem in source_filesystems:
source_filesystems.remove(replicated_filesystem)
verbose("* Already replicated: {}".format(replicated_filesystem))
if not source_filesystems:
verbose("Nothing to do, all filesystems are already replicated.")
sys.exit(0)
# determine target filesystems
target_filesystems=[]
for source_filesystem in source_filesystems:
#append args.target_path prefix and strip args.strip_path paths from source_filesystem
target_filesystems.append(args.target_path + "/" + lstrip_path(source_filesystem, args.strip_path))
debug("Wanted target filesystems:\n"+str(pprint.pformat(target_filesystems)))
# get actual existing target filesystems. (including ones that might not be in the backupset anymore)
verbose("Getting existing target filesystems")
existing_target_filesystems=zfs_get_existing_filesystems(ssh_to=args.ssh_target, target_path=args.target_path)
debug("Existing target filesystems:\n"+str(pprint.pformat(existing_target_filesystems)))
common_target_filesystems=list(set(target_filesystems) & set(existing_target_filesystems))
debug("Common target filesystems (target filesystems that also exist on source):\n"+str(pprint.pformat(common_target_filesystems)))
#append args.target_fs prefix and strip args.strip_path paths from source_filesystem
target_filesystems.append(args.target_fs + "/" + lstrip_path(source_filesystem, args.strip_path))
### get resumable transfers from target
### get resumable transfers
resumable_target_filesystems={}
if args.resume:
verbose("Checking for aborted transfers that can be resumed")
#Important: use target_filesystem, not existing_target_filesystems (during initial transfer its resumable but doesnt exsit yet)
resumable_target_filesystems=zfs_get_resumable_filesystems(args.ssh_target, target_filesystems)
debug("Resumable filesystems:\n"+str(pprint.pformat(resumable_target_filesystems)))
debug("Resumable filesystems: "+str(pprint.pformat(resumable_target_filesystems)))
### get existing target snapshots
target_snapshots={}
if common_target_filesystems:
verbose("Getting target snapshot-list from {0}".format(args.ssh_target))
target_snapshots=zfs_get_snapshots(args.ssh_target, common_target_filesystems, args.backup_name)
# except subprocess.CalledProcessError:
# verbose("(ignoring errors, probably initial backup for this filesystem)")
# pass
debug("Target snapshots:\n" + str(pprint.pformat(target_snapshots)))
### get eixsting source snapshots
### get all snapshots of all selected filesystems
verbose("Getting source snapshot-list from {0}".format(args.ssh_source))
source_snapshots=zfs_get_snapshots(args.ssh_source, source_filesystems, args.backup_name)
debug("Source snapshots:\n" + str(pprint.pformat(source_snapshots)))
source_snapshots=zfs_get_snapshots(args.ssh_source, source_filesystems, args.backup_name, also_bookmarks=True)
debug("Source snapshots: " + str(pprint.pformat(source_snapshots)))
# source_bookmarks=zfs_get_bookmarks(args.ssh_source, source_filesystems, args.backup_name)
# debug("Source bookmarks: " + str(pprint.pformat(source_bookmarks)))
### create new snapshots on source
#create new snapshot?
if not args.no_snapshot:
#determine which filesystems changed since last snapshot
if not args.allow_empty and not args.ignore_replicated:
#determine which filesystemn are unchanged since OUR snapshots. (not since ANY snapshot)
unchanged_filesystems=zfs_get_unchanged_snapshots(args.ssh_source, source_snapshots)
if not args.allow_empty:
verbose("Determining unchanged filesystems")
unchanged_filesystems=zfs_get_unchanged_filesystems(args.ssh_source, source_snapshots)
else:
unchanged_filesystems=[]
@ -573,22 +569,31 @@ def zfs_autobackup():
for source_filesystem in source_filesystems:
if source_filesystem not in unchanged_filesystems:
snapshot_filesystems.append(source_filesystem)
else:
verbose("* Not snapshotting {}, no changes found.".format(source_filesystem))
#create snapshots
#create snapshot
if snapshot_filesystems:
new_snapshot_name=args.backup_name+"-"+time.strftime("%Y%m%d%H%M%S")
verbose("Creating source snapshots {0} on {1} ".format(new_snapshot_name, args.ssh_source))
new_snapshot_name="@"+args.backup_name+"-"+time.strftime("%Y%m%d%H%M%S")
verbose("Creating source snapshot {0} on {1} ".format(new_snapshot_name, args.ssh_source))
zfs_create_snapshot(args.ssh_source, snapshot_filesystems, new_snapshot_name)
else:
verbose("No changes at all, not creating snapshot.")
#add it to the list of source filesystems
for snapshot_filesystem in snapshot_filesystems:
source_snapshots.setdefault(snapshot_filesystem,[]).append(new_snapshot_name)
#### get target snapshots
target_snapshots={}
try:
verbose("Getting target snapshot-list from {0}".format(args.ssh_target))
target_snapshots=zfs_get_snapshots(args.ssh_target, target_filesystems, args.backup_name)
except subprocess.CalledProcessError:
verbose("(ignoring errors, probably initial backup for this filesystem)")
pass
debug("Target snapshots: " + str(pprint.pformat(target_snapshots)))
#obsolete snapshots that may be removed
@ -601,201 +606,195 @@ def zfs_autobackup():
#determine which snapshots to send for each filesystem
for source_filesystem in source_filesystems:
try:
target_filesystem=args.target_path + "/" + lstrip_path(source_filesystem, args.strip_path)
target_filesystem=args.target_fs + "/" + lstrip_path(source_filesystem, args.strip_path)
if source_filesystem not in source_snapshots:
#this happens if you use --no-snapshot and there are new filesystems without snapshots
verbose("* Skipping source filesystem {0}, no snapshots found".format(source_filesystem))
else:
if source_filesystem not in source_snapshots:
#this happens if you use --no-snapshot and there are new filesystems without snapshots
verbose("Skipping source filesystem {0}, no snapshots found".format(source_filesystem))
else:
#incremental or initial send?
if target_filesystem in target_snapshots and target_snapshots[target_filesystem]:
#incremental mode, determine what to send and what is obsolete
#incremental or initial send?
if target_filesystem in target_snapshots and target_snapshots[target_filesystem]:
#incremental mode, determine what to send and what is obsolete
#latest succesfully send snapshot, should be common on both source and target
latest_target_snapshot=target_snapshots[target_filesystem][-1]
#latest succesfully sent snapshot, should be common on both source and target (at least a bookmark on source)
latest_target_snapshot=target_snapshots[target_filesystem][-1]
if latest_target_snapshot not in source_snapshots[source_filesystem]:
#cant find latest target anymore. find first common snapshot and inform user
error_msg="Cant find latest target snapshot on source for '{}', did you destroy/rename it?".format(source_filesystem)
error_msg=error_msg+"\nLatest on target : "+target_filesystem+"@"+latest_target_snapshot
error_msg=error_msg+"\nMissing on source: "+source_filesystem+"@"+latest_target_snapshot
found=False
for latest_target_snapshot in reversed(target_snapshots[target_filesystem]):
if latest_target_snapshot in source_snapshots[source_filesystem]:
error_msg=error_msg+"\nYou could solve this by rolling back to this common snapshot on target: "+target_filesystem+"@"+latest_target_snapshot
found=True
break
if not found:
error_msg=error_msg+"\nAlso could not find an earlier common snapshot to rollback to."
else:
if args.ignore_new:
verbose("* Skipping source filesystem '{0}', target already has newer snapshots.".format(source_filesystem))
continue
raise(Exception(error_msg))
#send all new source snapshots that come AFTER the last target snapshot
#find our starting snapshot/bookmark:
latest_target_bookmark='#'+latest_target_snapshot[1:]
if latest_target_snapshot in source_snapshots[source_filesystem]:
latest_source_index=source_snapshots[source_filesystem].index(latest_target_snapshot)
send_snapshots=source_snapshots[source_filesystem][latest_source_index+1:]
#source snapshots that come BEFORE last target snapshot are obsolete
source_obsolete_snapshots[source_filesystem]=source_snapshots[source_filesystem][0:latest_source_index]
#target snapshots that come BEFORE last target snapshot are obsolete
latest_target_index=target_snapshots[target_filesystem].index(latest_target_snapshot)
target_obsolete_snapshots[target_filesystem]=target_snapshots[target_filesystem][0:latest_target_index]
source_bookmark=latest_target_snapshot
elif latest_target_bookmark in source_snapshots[source_filesystem]:
latest_source_index=source_snapshots[source_filesystem].index(latest_target_bookmark)
source_bookmark=latest_target_bookmark
else:
#initial mode, send all snapshots, nothing is obsolete:
latest_target_snapshot=None
send_snapshots=source_snapshots[source_filesystem]
target_obsolete_snapshots[target_filesystem]=[]
source_obsolete_snapshots[source_filesystem]=[]
#cant find latest target anymore. find first common snapshot and inform user
error_msg="Cant find latest target snapshot or bookmark on source, did you destroy/rename it?"
error_msg=error_msg+"\nLatest on target : "+target_filesystem+latest_target_snapshot
error_msg=error_msg+"\nMissing on source: "+source_filesystem+latest_target_bookmark
found=False
for latest_target_snapshot in reversed(target_snapshots[target_filesystem]):
if latest_target_snapshot in source_snapshots[source_filesystem]:
error_msg=error_msg+"\nYou could solve this by rolling back to this common snapshot on target: "+target_filesystem+latest_target_snapshot
found=True
break
if not found:
error_msg=error_msg+"\nAlso could not find an earlier common snapshot to rollback to."
#now actually send the snapshots
if not args.no_send:
raise(Exception(error_msg))
if send_snapshots and args.rollback and latest_target_snapshot:
#roll back any changes on target
debug("Rolling back target to latest snapshot.")
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "rollback", target_filesystem+"@"+latest_target_snapshot ])
#send all new source snapshots that come AFTER the last target snapshot
send_snapshots=source_snapshots[source_filesystem][latest_source_index+1:]
#source snapshots that come BEFORE last target snapshot are obsolete
source_obsolete_snapshots[source_filesystem]=source_snapshots[source_filesystem][0:latest_source_index]
#target snapshots that come BEFORE last target snapshot are obsolete
latest_target_index=target_snapshots[target_filesystem].index(latest_target_snapshot)
target_obsolete_snapshots[target_filesystem]=target_snapshots[target_filesystem][0:latest_target_index]
else:
#initial mode, send all snapshots, nothing is obsolete:
source_bookmark=None
latest_target_snapshot=None
send_snapshots=source_snapshots[source_filesystem]
target_obsolete_snapshots[target_filesystem]=[]
source_obsolete_snapshots[source_filesystem]=[]
#now actually send the snapshots
if not args.no_send:
if send_snapshots and args.rollback and latest_target_snapshot:
#roll back any changes on target
debug("Rolling back target to latest snapshot.")
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "rollback", target_filesystem+latest_target_snapshot ])
for send_snapshot in send_snapshots:
for send_snapshot in send_snapshots:
#resumable?
if target_filesystem in resumable_target_filesystems:
resume_token=resumable_target_filesystems.pop(target_filesystem)
else:
resume_token=None
#resumable?
if target_filesystem in resumable_target_filesystems:
resume_token=resumable_target_filesystems.pop(target_filesystem)
else:
resume_token=None
#hold the snapshot we're sending on the source
if not args.no_holds:
zfs_hold_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+send_snapshot)
#hold the snapshot we're sending on the source
zfs_hold_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+send_snapshot)
zfs_transfer(
ssh_source=args.ssh_source, source_filesystem=source_filesystem,
first_snapshot=latest_target_snapshot, second_snapshot=send_snapshot,
ssh_target=args.ssh_target, target_filesystem=target_filesystem,
resume_token=resume_token
)
zfs_transfer(
ssh_source=args.ssh_source, source_filesystem=source_filesystem,
first_snapshot=source_bookmark, second_snapshot=send_snapshot,
ssh_target=args.ssh_target, target_filesystem=target_filesystem,
resume_token=resume_token
)
#hold the snapshot we just send to the target
zfs_hold_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+send_snapshot)
#hold the snapshot we just send on the target
zfs_hold_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+send_snapshot)
#bookmark the snapshot we just send on the source, so we can also release and mark it obsolete.
zfs_bookmark_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+send_snapshot)
zfs_release_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+send_snapshot)
source_obsolete_snapshots[source_filesystem].append(send_snapshot)
#now that we succesfully transferred this snapshot, cleanup the previous stuff
if latest_target_snapshot:
#dont need the latest_target_snapshot anymore
zfs_release_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+latest_target_snapshot)
target_obsolete_snapshots[target_filesystem].append(latest_target_snapshot)
#now that we succesfully transferred this snapshot, the previous snapshot is obsolete:
if latest_target_snapshot:
zfs_release_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+latest_target_snapshot)
target_obsolete_snapshots[target_filesystem].append(latest_target_snapshot)
#delete previous bookmark
zfs_destroy_bookmark(ssh_to=args.ssh_source, bookmark=source_filesystem+source_bookmark)
if not args.no_holds:
zfs_release_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+latest_target_snapshot)
source_obsolete_snapshots[source_filesystem].append(latest_target_snapshot)
#we just received a new filesytem?
else:
if args.clear_refreservation:
debug("Clearing refreservation to save space.")
# zfs_release_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+latest_target_snapshot)
# source_obsolete_snapshots[source_filesystem].append(latest_target_snapshot)
#we just received a new filesytem?
else:
if args.clear_refreservation:
debug("Clearing refreservation to save space.")
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "refreservation=none", target_filesystem ])
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "refreservation=none", target_filesystem ])
if args.clear_mountpoint:
debug("Setting canmount=noauto to prevent auto-mounting in the wrong place. (ignoring errors)")
if args.clear_mountpoint:
debug("Setting canmount=noauto to prevent auto-mounting in the wrong place. (ignoring errors)")
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "canmount=noauto", target_filesystem ], valid_exitcodes= [0, 1] )
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "canmount=noauto", target_filesystem ], valid_exitcodes= [0, 1] )
latest_target_snapshot=send_snapshot
# failed, skip this source_filesystem
except Exception as e:
failed(str(e))
latest_target_snapshot=send_snapshot
source_bookmark='#'+latest_target_snapshot[1:]
############## cleanup section
#we only do cleanups after everything is complete, to keep everything consistent (same snapshots everywhere)
if not args.ignore_replicated:
#find stale backups on target that have become obsolete
#find stale backups on target that have become obsolete
verbose("Getting stale filesystems and snapshots from {0}".format(args.ssh_target))
stale_target_filesystems=get_stale_backupped_filesystems(ssh_to=args.ssh_target, backup_name=args.backup_name, target_fs=args.target_fs, target_filesystems=target_filesystems)
debug("Stale target filesystems: {0}".format("\n".join(stale_target_filesystems)))
stale_target_filesystems=get_stale_backupped_filesystems(backup_name=args.backup_name, target_path=args.target_path, target_filesystems=target_filesystems, existing_target_filesystems=existing_target_filesystems)
debug("Stale target filesystems: {0}".format("\n".join(stale_target_filesystems)))
stale_target_snapshots=zfs_get_snapshots(args.ssh_target, stale_target_filesystems, args.backup_name)
debug("Stale target snapshots: " + str(pprint.pformat(stale_target_snapshots)))
target_obsolete_snapshots.update(stale_target_snapshots)
stale_target_snapshots=zfs_get_snapshots(args.ssh_target, stale_target_filesystems, args.backup_name)
debug("Stale target snapshots: " + str(pprint.pformat(stale_target_snapshots)))
target_obsolete_snapshots.update(stale_target_snapshots)
#determine stale filesystems that have no snapshots left (the can be destroyed)
stale_target_destroys=[]
for stale_target_filesystem in stale_target_filesystems:
if stale_target_filesystem not in stale_target_snapshots:
stale_target_destroys.append(stale_target_filesystem)
if stale_target_destroys:
#NOTE: dont destroy automaticly..not safe enough.
# if args.destroy_stale:
# verbose("Destroying stale filesystems on target {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
# zfs_destroy(ssh_to=args.ssh_target, filesystems=stale_target_destroys, recursive=True)
# else:
verbose("Stale filesystems on {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
else:
verbose("NOTE: Cant determine stale target filesystems while using ignore_replicated.")
#determine stale filesystems that have no snapshots left (the can be destroyed)
#TODO: prevent destroying filesystems that have underlying filesystems that are still active.
stale_target_destroys=[]
for stale_target_filesystem in stale_target_filesystems:
if stale_target_filesystem not in stale_target_snapshots:
stale_target_destroys.append(stale_target_filesystem)
if stale_target_destroys:
if args.destroy_stale:
verbose("Destroying stale filesystems on target {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
zfs_destroy(ssh_to=args.ssh_target, filesystems=stale_target_destroys, recursive=True)
else:
verbose("Stale filesystems on {0}, use --destroy-stale to destroy:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
#now actually destroy the old snapshots
source_destroys=determine_destroy_list(source_obsolete_snapshots, args.keep_source)
if source_destroys:
verbose("Destroying old snapshots on source {0}:\n{1}".format(args.ssh_source, "\n".join(source_destroys)))
try:
zfs_destroy_snapshots(ssh_to=args.ssh_source, snapshots=source_destroys)
except Exception as e:
failed(str(e))
zfs_destroy_snapshots(ssh_to=args.ssh_source, snapshots=source_destroys)
target_destroys=determine_destroy_list(target_obsolete_snapshots, args.keep_target)
if target_destroys:
verbose("Destroying old snapshots on target {0}:\n{1}".format(args.ssh_target, "\n".join(target_destroys)))
try:
zfs_destroy_snapshots(ssh_to=args.ssh_target, snapshots=target_destroys)
except Exception as e:
failed(str(e))
zfs_destroy_snapshots(ssh_to=args.ssh_target, snapshots=target_destroys)
verbose("All done")
################################################################## ENTRY POINT
# parse arguments
import argparse
parser = argparse.ArgumentParser(
description='ZFS autobackup v2.4',
epilog='When a filesystem fails, zfs_backup will continue and report the number of failures at that end. Also the exit code will indicate the number of failures.')
parser = argparse.ArgumentParser(description='ZFS autobackup v2.2')
parser.add_argument('--ssh-source', default="local", help='Source host to get backup from. (user@hostname) Default %(default)s.')
parser.add_argument('--ssh-target', default="local", help='Target host to push backup to. (user@hostname) Default %(default)s.')
parser.add_argument('--keep-source', type=int, default=30, help='Number of days to keep old snapshots on source. Default %(default)s.')
parser.add_argument('--keep-target', type=int, default=30, help='Number of days to keep old snapshots on target. Default %(default)s.')
parser.add_argument('backup_name', help='Name of the backup (you should set the zfs property "autobackup:backup-name" to true on filesystems you want to backup')
parser.add_argument('target_path', help='Target path')
parser.add_argument('target_fs', help='Target filesystem')
parser.add_argument('--no-snapshot', action='store_true', help='dont create new snapshot (usefull for finishing uncompleted backups, or cleanups)')
parser.add_argument('--no-send', action='store_true', help='dont send snapshots (usefull to only do a cleanup)')
parser.add_argument('--allow-empty', action='store_true', help='if nothing has changed, still create empty snapshots.')
parser.add_argument('--ignore-replicated', action='store_true', help='Ignore datasets that seem to be replicated some other way. (No changes since lastest snapshot. Usefull for proxmox HA replication)')
parser.add_argument('--no-holds', action='store_true', help='Dont lock snapshots on the source. (Usefull to allow proxmox HA replication to switches nodes)')
parser.add_argument('--ignore-new', action='store_true', help='Ignore filesystem if there are already newer snapshots for it on the target (use with caution)')
parser.add_argument('--resume', action='store_true', help='support resuming of interrupted transfers by using the zfs extensible_dataset feature (both zpools should have it enabled) Disadvantage is that you need to use zfs recv -A if another snapshot is created on the target during a receive. Otherwise it will keep failing.')
parser.add_argument('--strip-path', default=0, type=int, help='number of directory to strip from path (use 1 when cloning zones between 2 SmartOS machines)')
parser.add_argument('--buffer', default="", help='Use mbuffer with specified size to speedup zfs transfer. (e.g. --buffer 1G) Will also show nice progress output.')
parser.add_argument('--buffer', default="", help='Use mbuffer with specified size to speedup zfs transfer. (e.g. --buffer 1G)')
# parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
parser.add_argument('--clear-refreservation', action='store_true', help='Set refreservation property to none for new filesystems. Usefull when backupping SmartOS volumes. (recommended)')
parser.add_argument('--clear-mountpoint', action='store_true', help='Sets canmount=noauto property, to prevent the received filesystem from mounting over existing filesystems. (recommended)')
parser.add_argument('--filter-properties', action='append', help='Filter properties when receiving filesystems. Can be specified multiple times. (Example: If you send data from Linux to FreeNAS, you should filter xattr)')
parser.add_argument('--rollback', action='store_true', help='Rollback changes on the target before starting a backup. (normally you can prevent changes by setting the readonly property on the target_path to on)')
parser.add_argument('--rollback', action='store_true', help='Rollback changes on the target before starting a backup. (normally you can prevent changes by setting the readonly property on the target_fs to on)')
parser.add_argument('--ignore-transfer-errors', action='store_true', help='Ignore transfer errors (still checks if received filesystem exists. usefull for acltype errors)')
@ -806,23 +805,11 @@ parser.add_argument('--debug', action='store_true', help='debug output (shows co
#note args is the only global variable we use, since its a global readonly setting anyway
args = parser.parse_args()
if args.ignore_replicated and args.allow_empty:
abort("Cannot use allow_empty with ignore_replicated.")
try:
zfs_autobackup()
if not failures:
verbose("All operations completed succesfully.")
sys.exit(0)
else:
verbose("{} OPERATION(S) FAILED!".format(failures))
#exit with the number of failures.
sys.exit(min(255,failures))
except Exception as e:
if args.debug:
raise
else:
print("* ABORTED *")
print(str(e))
abort("FATAL ERROR")