Compare commits

24 Commits

Author SHA1 Message Date
17445ec54a Update README.md 2019-11-10 01:16:48 +01:00
07a150618a Update README.md 2019-11-10 01:01:08 +01:00
067f3b92d1 Update README.md 2019-11-10 00:51:20 +01:00
71a394cfc7 rollback 2019-10-19 14:54:10 +02:00
bfc36ac87f rollback 2019-10-19 14:53:31 +02:00
ad47b26f56 Revert "fixing quota issues". This reverts commit d973905303. 2019-10-17 10:42:54 +02:00
f38da17592 v3.0: no longer replicate all properties by default. this made things unnecessary complicated. now use the --properties option to specify the properties you want. 2019-10-16 13:37:31 +02:00
d973905303 fixing quota issues 2019-10-16 12:51:12 +02:00
82465acd5b clearification of testmode 2019-10-16 10:27:35 +02:00
514131d67c update docs 2019-10-16 09:45:08 +02:00
dfcae1613b clearify the target path is a zfs filesystem, not an regular path 2019-10-16 09:43:28 +02:00
67b21b4015 bugfix: exitcode always was 255 2019-10-16 09:28:21 +02:00
3907c850a6 Update zfs_autobackup 2019-10-02 22:58:50 +02:00
3b9b96243b dont destroy stale snapshots if we're using --ignore-replicated 2019-10-02 19:39:31 +02:00
54235f455a zfs_autobackup 2.4: try to continue on non-fatal errors 2019-10-02 18:21:24 +02:00
c176b968a9 forgot to return exit code when not using debug mode :( 2019-03-26 23:06:03 +01:00
921f7df0a5 updated readme 2019-02-19 11:24:10 +01:00
edee598cf8 updated readme 2019-02-19 11:19:16 +01:00
80b3272f0f disable destroy-stale for now. updated readme 2019-02-19 11:09:54 +01:00
617e0fb69b fix/revisit stale filesystem detection 2019-02-19 11:04:59 +01:00
46a85fd170 updated readme 2019-02-19 10:26:52 +01:00
1f59229419 fixes 2019-02-19 01:28:22 +01:00
fcd98e2d87 much cleaner output and layout. removed useless error output. general cleanup. 2019-02-19 00:17:20 +01:00
dd8b2442ec options for proxmox HA: no-holds, ignore-new and ignore-replicated 2019-02-18 18:53:54 +01:00
2 changed files with 367 additions and 206 deletions

106
README.md

@ -1,12 +1,18 @@
# ZFS autobackup
(checkout v3.0-beta for the new cool stuff: https://github.com/psy0rz/zfs_autobackup/blob/v3/README.md)
Official releases: https://github.com/psy0rz/zfs_autobackup/releases
Introduction
============
ZFS autobackup is used to periodically back up ZFS filesystems to other locations. This is done using the very efficient zfs send and receive commands.
It has the following features:
* Automaticly selects filesystems to backup by looking at a simple ZFS property. (recursive)
* Works across operating systems: Tested with Linux, FreeBSD/FreeNAS and SmartOS.
* Works in combination with existing replication systems. (Like Proxmox HA)
* Automatically selects filesystems to backup by looking at a simple ZFS property. (recursive)
* Creates consistent snapshots. (takes all snapshots at once, atomic.)
* Multiple backups modes:
* "push" local data to a backup-server via SSH.
@ -16,34 +22,37 @@ It has the following features:
* Supports resuming of interrupted transfers. (via the zfs extensible_dataset feature)
* Backups and snapshots can be named to prevent conflicts. (multiple backups from and to the same filesystems are no problem)
* Always creates a new snapshot before starting.
* Checks everything and aborts on errors.
* Checks everything but tries to continue on non-fatal errors when possible. (Reports error-count when done)
* Ability to 'finish' aborted backups to see what goes wrong.
* Easy to debug and has a test-mode. Actual unix commands are printed.
* Keeps latest X snapshots remote and locally. (default 30, configurable)
* Uses zfs-holds on important snapshots so they can't be accidentally destroyed.
* Easy installation:
* Only one host needs the zfs_autobackup script. The other host just needs ssh and the zfs command.
* Written in python and uses zfs-commands, no 3rd party dependencys or libraries.
* No seperate config files or properties. Just one command you can copy/paste in your backup script.
* Written in python and uses zfs-commands, no 3rd party dependencies or libraries.
* No separate config files or properties. Just one command you can copy/paste in your backup script.
Usage
====
```
usage: zfs_autobackup [-h] [--ssh-source SSH_SOURCE] [--ssh-target SSH_TARGET]
[--keep-source KEEP_SOURCE] [--keep-target KEEP_TARGET]
[--no-snapshot] [--no-send] [--resume]
[--strip-path STRIP_PATH] [--destroy-stale]
[--no-snapshot] [--no-send] [--allow-empty]
[--ignore-replicated] [--no-holds] [--ignore-new]
[--resume] [--strip-path STRIP_PATH] [--buffer BUFFER]
[--clear-refreservation] [--clear-mountpoint]
[--filter-properties FILTER_PROPERTIES] [--rollback]
[--test] [--verbose] [--debug]
backup_name target_fs
[--ignore-transfer-errors] [--test] [--verbose]
[--debug]
backup_name target_path
ZFS autobackup v2.2
ZFS autobackup v2.4
positional arguments:
backup_name Name of the backup (you should set the zfs property
"autobackup:backup-name" to true on filesystems you
want to backup
target_fs Target filesystem
target_path Target path
optional arguments:
-h, --help show this help message and exit
@ -62,6 +71,14 @@ optional arguments:
--no-snapshot dont create new snapshot (usefull for finishing
uncompleted backups, or cleanups)
--no-send dont send snapshots (usefull to only do a cleanup)
--allow-empty if nothing has changed, still create empty snapshots.
--ignore-replicated Ignore datasets that seem to be replicated some other
way. (No changes since lastest snapshot. Usefull for
proxmox HA replication)
--no-holds Dont lock snapshots on the source. (Usefull to allow
proxmox HA replication to switches nodes)
--ignore-new Ignore filesystem if there are already newer snapshots
for it on the target (use with caution)
--resume support resuming of interrupted transfers by using the
zfs extensible_dataset feature (both zpools should
have it enabled) Disadvantage is that you need to use
@ -71,8 +88,9 @@ optional arguments:
--strip-path STRIP_PATH
number of directory to strip from path (use 1 when
cloning zones between 2 SmartOS machines)
--destroy-stale Destroy stale backups that have no more snapshots. Be
sure to verify the output before using this!
--buffer BUFFER Use mbuffer with specified size to speedup zfs
transfer. (e.g. --buffer 1G) Will also show nice
progress output.
--clear-refreservation
Set refreservation property to none for new
filesystems. Usefull when backupping SmartOS volumes.
@ -86,11 +104,17 @@ optional arguments:
from Linux to FreeNAS, you should filter xattr)
--rollback Rollback changes on the target before starting a
backup. (normally you can prevent changes by setting
the readonly property on the target_fs to on)
the readonly property on the target_path to on)
--ignore-transfer-errors
Ignore transfer errors (still checks if received
filesystem exists. usefull for acltype errors)
--test dont change anything, just show what would be done
(still does all read-only operations)
--verbose verbose output
--debug debug output (shows commands that are executed)
When a filesystem fails, zfs_backup will continue and report the number of
failures at that end. Also the exit code will indicate the number of failures.
```
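The usage text above says that a failing filesystem does not stop the run and that the exit code reports the number of failures. A minimal Python sketch of that pattern follows. It is not the script's actual code: `run_jobs` and `exit_code` are hypothetical names, and capping the count at 254 is an assumption made here so that 255 (the code the script uses for a fatal abort) stays distinguishable.

```python
from __future__ import print_function
import sys


def run_jobs(jobs):
    """Run every (name, callable) job and return how many of them failed."""
    failures = 0
    for name, job in jobs:
        try:
            job()
        except Exception as exc:
            # non-fatal: report it and continue with the next filesystem
            failures += 1
            print("FAILURE: {0}: {1}".format(name, exc), file=sys.stderr)
    return failures


def exit_code(failures):
    """Map a failure count to a process exit code (assumption: cap at 254)."""
    return min(failures, 254)
```

A wrapper script could finish with `sys.exit(exit_code(run_jobs(jobs)))`, so that a monitoring job sees 0 only when every filesystem was handled without errors.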
Backup example
@ -200,7 +224,7 @@ Host smartos04
Compression yes
```
This way you can just specify smartos04
This way you can just specify "smartos04" as host.
Also uses compression on slow links.
@ -236,8 +260,6 @@ root@fs1:/home/psy# zfs send fs1/zones/backup/zfsbackups/smartos01.server.com/z
After that you can rename the disk image from the temporary location to the location of a new SmartOS machine you've created.
Monitoring with Zabbix-jobs
===========================
@ -249,3 +271,55 @@ zabbix-job-status backup_smartos01_fs1 daily $?
```
This will update the zabbix server with the exitcode and will also alert you if the job didn't run for more than 2 days.
Backing up a Proxmox cluster with HA replication
================================================
Due to the nature of proxmox we had to make a few enhancements to zfs_autobackup. This will probably also benefit other systems that use their own replication in combination with zfs_autobackup.
All data under rpool/data can be on multiple nodes of the cluster. The names of those filesystems are unique across the whole cluster. Because of this we should back up rpool/data of all nodes to the same destination. This way we won't have duplicate backups of the filesystems that are replicated. Thanks to the options described below, you can even migrate hosts and zfs_autobackup will be fine. (It will get the next backup from the new node automatically.)
In the example below we have 3 nodes, named h4, h5 and h6.
The backup will go to a machine named smartos03.
Preparing the proxmox nodes
---------------------------
On each node, select the filesystems as follows:
```
root@h4:~# zfs set autobackup:h4_smartos03=true rpool
root@h4:~# zfs set autobackup:h4_smartos03=false rpool/data
root@h4:~# zfs set autobackup:data_smartos03=child rpool/data
```
* rpool will be backed up the usual way, and is named h4_smartos03. (each node will have a unique name)
* rpool/data will be excluded from the usual backup.
* The CHILDREN of rpool/data will be selected for a cluster-wide backup named data_smartos03. (each node uses the same backup name)
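The effect of `true`, `false` and `child` can be sketched in Python. This is a simplified reading of the selection logic in the script, not a copy of it: `select_filesystems` is a hypothetical name, the rows stand in for the output of a command such as `zfs get -H -o name,value,source autobackup:<backup_name>`, parents are assumed to be listed before their children, and the dataset `rpool/data/vm-100-disk-0` is made up for illustration.

```python
def select_filesystems(rows):
    """Return the datasets selected for one backup name.

    rows: (name, value, source) tuples, parents listed before children.
    """
    direct = []    # datasets where the property was set locally
    selected = []
    for name, value, source in rows:
        if value == "false":
            continue  # explicitly disabled
        if source == "local" and value in ("true", "child"):
            direct.append(name)
        if source == "local" and value == "true":
            selected.append(name)  # direct selection
        elif source.startswith("inherited from ") and value in ("true", "child"):
            parent = source[len("inherited from "):]
            if parent in direct:
                selected.append(name)  # inherited selection
        # a local "child" value selects only descendants, not the dataset itself
    return selected


# property autobackup:h4_smartos03 on node h4
h4_rows = [
    ("rpool", "true", "local"),
    ("rpool/ROOT", "true", "inherited from rpool"),
    ("rpool/data", "false", "local"),
    ("rpool/data/vm-100-disk-0", "false", "inherited from rpool/data"),
]

# property autobackup:data_smartos03 on the same node
data_rows = [
    ("rpool", "-", "-"),
    ("rpool/data", "child", "local"),
    ("rpool/data/vm-100-disk-0", "child", "inherited from rpool/data"),
]

print(select_filesystems(h4_rows))    # ['rpool', 'rpool/ROOT']
print(select_filesystems(data_rows))  # ['rpool/data/vm-100-disk-0']
```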
Preparing the backup server
---------------------------
Extra options needed for proxmox with HA:
* --no-holds: To allow proxmox to destroy our snapshots if a VM migrates to another node.
* --ignore-replicated: To ignore the replicated filesystems of proxmox on the receiving proxmox nodes. (i.e. only back up from the node where the VM is active)
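How --ignore-replicated decides that a filesystem is "replicated" can be sketched as follows: a dataset whose ZFS `written` property is zero has not changed since its latest snapshot, so it is treated as a passive replica and dropped from the source list. The function names and the sample rows are hypothetical; the comparison against `"0"` and `"0B"` mirrors the check in the script, since the formatting of a zero value differs between ZFS versions.

```python
def unchanged_filesystems(written_rows):
    """written_rows: (name, written) pairs, e.g. parsed from
    `zfs get -H -o name,value written <filesystems...>`."""
    return [name for name, written in written_rows if written in ("0", "0B")]


def drop_replicated(source_filesystems, written_rows):
    """Keep only the filesystems that have changes since their latest snapshot."""
    replicated = set(unchanged_filesystems(written_rows))
    return [fs for fs in source_filesystems if fs not in replicated]


# hypothetical node: vm-100 is active here, vm-101 is only a replica
rows = [
    ("rpool/data/vm-100-disk-0", "1.2M"),
    ("rpool/data/vm-101-disk-0", "0"),
]
sources = [name for name, _ in rows]
print(drop_replicated(sources, rows))  # ['rpool/data/vm-100-disk-0']
```

Note that an active VM that happens to have written nothing since its last snapshot is skipped as well; it would be picked up on a later run once it has changes.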
I use the following backup script on the backup server:
```
for H in h4 h5 h6; do
echo "################################### DATA $H"
#backup data filesystems to a common place
./zfs_autobackup --ssh-source root@$H data_smartos03 zones/backup/zfsbackups/pxe1_data --clear-refreservation --clear-mountpoint --ignore-transfer-errors --strip-path 2 --verbose --resume --ignore-replicated --no-holds $@
zabbix-job-status backup_$H""_data_smartos03 daily $? >/dev/null 2>/dev/null
echo "################################### RPOOL $H"
#backup rpool to own place
./zfs_autobackup --ssh-source root@$H $H""_smartos03 zones/backup/zfsbackups/$H --verbose --clear-refreservation --clear-mountpoint --resume --ignore-transfer-errors $@
zabbix-job-status backup_$H""_smartos03 daily $? >/dev/null 2>/dev/null
done
```
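The two commands in the script above write to different places because the target filesystem name is built from the target path plus the source name with `--strip-path` leading components removed. A small Python sketch of that mapping follows: `lstrip_path` follows the helper of the same name in the script, `target_filesystem` is a hypothetical wrapper, the default of 0 for `strip_path` is an assumption, and the dataset names are made up.

```python
def lstrip_path(path, count):
    """Drop the first `count` components of a dataset path."""
    return "/".join(path.split("/")[count:])


def target_filesystem(target_path, source_filesystem, strip_path=0):
    """Name of the filesystem on the backup server for one source filesystem."""
    return target_path + "/" + lstrip_path(source_filesystem, strip_path)


# cluster-wide data backup, --strip-path 2 removes "rpool/data"
print(target_filesystem("zones/backup/zfsbackups/pxe1_data",
                        "rpool/data/vm-100-disk-0", 2))
# zones/backup/zfsbackups/pxe1_data/vm-100-disk-0

# per-node rpool backup, nothing stripped
print(target_filesystem("zones/backup/zfsbackups/h4", "rpool/ROOT"))
# zones/backup/zfsbackups/h4/rpool/ROOT
```

Because every node strips `rpool/data` and uses the same target path, a VM disk ends up under the same target filesystem no matter which node it currently runs on.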

@ -1,5 +1,9 @@
#!/usr/bin/env python2
# -*- coding: utf8 -*-
#(C)edwin@datux.nl -- Edwin Eefting
#Release under GPL.
from __future__ import print_function
import os
import sys
@ -13,18 +17,20 @@ import time
def error(txt):
print(txt, file=sys.stderr)
def verbose(txt):
if args.verbose:
print(txt)
def debug(txt):
if args.debug:
print(txt)
#fatal abort execution, exit code 255
def abort(txt):
error(txt)
sys.exit(255)
"""run a command. specifiy ssh user@host to run remotely"""
def run(cmd, input=None, ssh_to="local", tab_split=False, valid_exitcodes=[ 0 ], test=False):
@ -95,7 +101,7 @@ def zfs_get_selected_filesystems(ssh_to, backup_name):
for source_filesystem in source_filesystems:
(name,value,source)=source_filesystem
if value=="false":
verbose("Ignored : {0} (disabled)".format(name))
verbose("* Ignored : {0} (disabled)".format(name))
else:
if source=="local" and ( value=="true" or value=="child"):
@ -103,16 +109,16 @@ def zfs_get_selected_filesystems(ssh_to, backup_name):
if source=="local" and value=="true":
selected_filesystems.append(name)
verbose("Selected: {0} (direct selection)".format(name))
verbose("* Selected: {0} (direct selection)".format(name))
elif source.find("inherited from ")==0 and (value=="true" or value=="child"):
inherited_from=re.sub("^inherited from ", "", source)
if inherited_from in direct_filesystems:
selected_filesystems.append(name)
verbose("Selected: {0} (inherited selection)".format(name))
verbose("* Selected: {0} (inherited selection)".format(name))
else:
verbose("Ignored : {0} (already a backup)".format(name))
verbose("* Ignored : {0} (already a backup)".format(name))
else:
verbose("Ignored : {0} (only childs)".format(name))
verbose("* Ignored : {0} (only childs)".format(name))
return(selected_filesystems)
@ -123,7 +129,6 @@ def zfs_get_resumable_filesystems(ssh_to, filesystems):
cmd=[ "zfs", "get", "-t", "volume,filesystem", "-o", "name,value", "-H", "receive_resume_token" ]
cmd.extend(filesystems)
#TODO: get rid of ugly errors for non-existing target filesystems
resumable_filesystems=run(ssh_to=ssh_to, tab_split=True, cmd=cmd, valid_exitcodes= [ 0,1 ] )
ret={}
@ -177,12 +182,12 @@ def zfs_create_snapshot(ssh_to, filesystems, snapshot):
cmd.append(filesystem+"@"+snapshot)
#in testmode we dont actually make changes, so keep them in a list to simulate
if args.test:
if not ssh_to in test_snapshots:
test_snapshots[ssh_to]={}
if not filesystem in test_snapshots[ssh_to]:
test_snapshots[ssh_to][filesystem]=[]
test_snapshots[ssh_to][filesystem].append(snapshot)
# if args.test:
# if not ssh_to in test_snapshots:
# test_snapshots[ssh_to]={}
# if not filesystem in test_snapshots[ssh_to]:
# test_snapshots[ssh_to][filesystem]=[]
# test_snapshots[ssh_to][filesystem].append(snapshot)
run(ssh_to=ssh_to, tab_split=False, cmd=cmd, test=args.test)
@ -196,13 +201,12 @@ def zfs_get_snapshots(ssh_to, filesystems, backup_name):
ret={}
if filesystems:
#TODO: get rid of ugly errors for non-existing target filesystems
cmd=[
"zfs", "list", "-d", "1", "-r", "-t" ,"snapshot", "-H", "-o", "name"
]
cmd.extend(filesystems)
snapshots=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0,1 ])
snapshots=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
for snapshot in snapshots:
@ -213,13 +217,13 @@ def zfs_get_snapshots(ssh_to, filesystems, backup_name):
ret[filesystem].append(snapshot_name)
#also add any test-snapshots that where created with --test mode
if args.test:
if ssh_to in test_snapshots:
for filesystem in filesystems:
if filesystem in test_snapshots[ssh_to]:
if not filesystem in ret:
ret[filesystem]=[]
ret[filesystem].extend(test_snapshots[ssh_to][filesystem])
# if args.test:
# if ssh_to in test_snapshots:
# for filesystem in filesystems:
# if filesystem in test_snapshots[ssh_to]:
# if not filesystem in ret:
# ret[filesystem]=[]
# ret[filesystem].extend(test_snapshots[ssh_to][filesystem])
return(ret)
@ -295,13 +299,13 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
if not first_snapshot:
txt="Initial transfer of "+source_filesystem+" snapshot "+second_snapshot
txt=">>> Transfer: "+source_filesystem+"@"+second_snapshot
else:
txt="Incremental transfer of "+source_filesystem+" between snapshots "+first_snapshot+"..."+second_snapshot
txt=">>> Transfer: "+source_filesystem+"@"+first_snapshot+"...@"+second_snapshot
if resume_token:
source_cmd.extend([ "-t", resume_token ])
verbose("RESUMING "+txt)
txt=txt+" [RESUMED]"
else:
source_cmd.append("-p")
@ -314,7 +318,7 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
else:
source_cmd.append(source_filesystem + "@" + second_snapshot)
verbose(txt)
verbose(txt)
if args.buffer and args.ssh_source!="local":
source_cmd.append("|mbuffer -m {}".format(args.buffer))
@ -333,8 +337,7 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
for filter_property in args.filter_properties:
target_cmd.extend([ "-x" , filter_property ])
#also verbose in --verbose mode so we can see the transfer speed when its completed
if args.verbose or args.debug:
if args.debug:
target_cmd.append("-v")
if args.resume:
@ -381,32 +384,41 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
"""get filesystems that where already backupped to a target. """
def zfs_get_backupped_filesystems(ssh_to, backup_name, target_fs):
#get all target filesystems that have received or inherited the backup propert, under the target_fs tree
ret=run(ssh_to=ssh_to, tab_split=False, cmd=[
"zfs", "get", "-r", "-t", "volume,filesystem", "-o", "name", "-s", "received,inherited", "-H", "autobackup:"+backup_name, target_fs
#NOTE: unreliable when using with autobackup:bla=child
# """get filesystems that where already backupped to a target. """
# def zfs_get_backupped_filesystems(ssh_to, backup_name, target_path):
# #get all target filesystems that have received or inherited the backup propert, under the target_path tree
# ret=run(ssh_to=ssh_to, tab_split=False, valid_exitcodes=[ 0,1 ], cmd=[
# "zfs", "get", "-r", "-t", "volume,filesystem", "-o", "name", "-s", "received,inherited", "-H", "autobackup:"+backup_name, target_path
# ])
#
# return(ret)
"""get existing filesystems """
def zfs_get_existing_filesystems(ssh_to, target_path):
#get all target filesystems that have received or inherited the backup propert, under the target_path tree
ret=run(ssh_to=ssh_to, tab_split=False, valid_exitcodes=[ 0,1 ], cmd=[
"zfs", "list", "-r", "-t", "volume,filesystem", "-o", "name", "-H", target_path
])
return(ret)
"""get filesystems that where once backupped to target but are no longer selected on source
these are filesystems that are not in the list in target_filesystems.
this happens when filesystems are destroyed or unselected on the source.
"""
def get_stale_backupped_filesystems(ssh_to, backup_name, target_fs, target_filesystems):
def get_stale_backupped_filesystems(backup_name, target_path, target_filesystems, existing_target_filesystems):
backupped_filesystems=zfs_get_backupped_filesystems(ssh_to=ssh_to, backup_name=backup_name, target_fs=target_fs)
#determine backupped filesystems that are not in target_filesystems anymore
stale_backupped_filesystems=[]
for backupped_filesystem in backupped_filesystems:
if backupped_filesystem not in target_filesystems:
stale_backupped_filesystems.append(backupped_filesystem)
for existing_target_filesystem in existing_target_filesystems:
if existing_target_filesystem not in target_filesystems:
stale_backupped_filesystems.append(existing_target_filesystem)
return(stale_backupped_filesystems)
@ -434,30 +446,50 @@ def lstrip_path(path, count):
return("/".join(path.split("/")[count:]))
"""get list of filesystems that are changed, compared to the latest snapshot"""
def zfs_get_unchanged_filesystems(ssh_to, snapshots):
"""get list of filesystems that are changed, compared to specified latest snapshot. """
def zfs_get_unchanged_snapshots(ssh_to, snapshots):
ret=[]
for ( filesystem, snapshot_list ) in snapshots.items():
latest_snapshot=snapshot_list[-1]
cmd=[
"zfs", "get","-H" ,"-ovalue", "written@"+latest_snapshot, filesystem
]
cmd=[ "zfs", "get","-H" ,"-ovalue", "written@"+latest_snapshot, filesystem ]
output=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
if output[0]=="0B":
if output[0]=="0B" or output[0]=="0":
ret.append(filesystem)
return(ret)
"""get filesytems that are have changed since any snapshot."""
def zfs_get_unchanged_filesystems(ssh_to, filesystems):
ret=[]
cmd=[ "zfs", "get","-H" ,"-oname,value", "written" ]
cmd.extend(filesystems)
output=run(ssh_to=ssh_to, tab_split=True, cmd=cmd, valid_exitcodes=[ 0 ])
for ( filesystem , written ) in output:
if written=="0B" or written=="0":
ret.append(filesystem)
verbose("No changes on {}".format(filesystem))
return(ret)
#fugly..
failures=0
#something failed, but we try to continue with the rest
def failed(txt):
global failures
failures=failures+1
error("FAILURE: "+txt+"\n")
def zfs_autobackup():
############## data gathering section
if args.test:
@ -467,43 +499,73 @@ def zfs_autobackup():
### getting and determinging source/target filesystems
# get selected filesystem on backup source
# get selected filesystems on backup source
verbose("Getting selected source filesystems for backup {0} on {1}".format(args.backup_name,args.ssh_source))
source_filesystems=zfs_get_selected_filesystems(args.ssh_source, args.backup_name)
#nothing todo
if not source_filesystems:
error("No filesystems source selected, please do a 'zfs set autobackup:{0}=true' on {1}".format(args.backup_name,args.ssh_source))
sys.exit(1)
abort("No source filesystems selected, please do a 'zfs set autobackup:{0}=true' on {1}".format(args.backup_name,args.ssh_source))
if args.ignore_replicated:
replicated_filesystems=zfs_get_unchanged_filesystems(args.ssh_source, source_filesystems)
for replicated_filesystem in replicated_filesystems:
if replicated_filesystem in source_filesystems:
source_filesystems.remove(replicated_filesystem)
verbose("* Already replicated: {}".format(replicated_filesystem))
if not source_filesystems:
verbose("Nothing to do, all filesystems are already replicated.")
sys.exit(0)
# determine target filesystems
target_filesystems=[]
for source_filesystem in source_filesystems:
#append args.target_fs prefix and strip args.strip_path paths from source_filesystem
target_filesystems.append(args.target_fs + "/" + lstrip_path(source_filesystem, args.strip_path))
#append args.target_path prefix and strip args.strip_path paths from source_filesystem
target_filesystems.append(args.target_path + "/" + lstrip_path(source_filesystem, args.strip_path))
debug("Wanted target filesystems:\n"+str(pprint.pformat(target_filesystems)))
# get actual existing target filesystems. (including ones that might not be in the backupset anymore)
verbose("Getting existing target filesystems")
existing_target_filesystems=zfs_get_existing_filesystems(ssh_to=args.ssh_target, target_path=args.target_path)
debug("Existing target filesystems:\n"+str(pprint.pformat(existing_target_filesystems)))
common_target_filesystems=list(set(target_filesystems) & set(existing_target_filesystems))
debug("Common target filesystems (target filesystems that also exist on source):\n"+str(pprint.pformat(common_target_filesystems)))
### get resumable transfers
### get resumable transfers from target
resumable_target_filesystems={}
if args.resume:
verbose("Checking for aborted transfers that can be resumed")
#Important: use target_filesystem, not existing_target_filesystems (during initial transfer its resumable but doesnt exsit yet)
resumable_target_filesystems=zfs_get_resumable_filesystems(args.ssh_target, target_filesystems)
debug("Resumable filesystems: "+str(pprint.pformat(resumable_target_filesystems)))
debug("Resumable filesystems:\n"+str(pprint.pformat(resumable_target_filesystems)))
### get all snapshots of all selected filesystems
### get existing target snapshots
target_snapshots={}
if common_target_filesystems:
verbose("Getting target snapshot-list from {0}".format(args.ssh_target))
target_snapshots=zfs_get_snapshots(args.ssh_target, common_target_filesystems, args.backup_name)
# except subprocess.CalledProcessError:
# verbose("(ignoring errors, probably initial backup for this filesystem)")
# pass
debug("Target snapshots:\n" + str(pprint.pformat(target_snapshots)))
### get eixsting source snapshots
verbose("Getting source snapshot-list from {0}".format(args.ssh_source))
source_snapshots=zfs_get_snapshots(args.ssh_source, source_filesystems, args.backup_name)
debug("Source snapshots: " + str(pprint.pformat(source_snapshots)))
debug("Source snapshots:\n" + str(pprint.pformat(source_snapshots)))
#create new snapshot?
### create new snapshots on source
if not args.no_snapshot:
#determine which filesystems changed since last snapshot
if not args.allow_empty:
verbose("Determining unchanged filesystems")
unchanged_filesystems=zfs_get_unchanged_filesystems(args.ssh_source, source_snapshots)
if not args.allow_empty and not args.ignore_replicated:
#determine which filesystemn are unchanged since OUR snapshots. (not since ANY snapshot)
unchanged_filesystems=zfs_get_unchanged_snapshots(args.ssh_source, source_snapshots)
else:
unchanged_filesystems=[]
@ -511,31 +573,22 @@ def zfs_autobackup():
for source_filesystem in source_filesystems:
if source_filesystem not in unchanged_filesystems:
snapshot_filesystems.append(source_filesystem)
else:
verbose("* Not snapshotting {}, no changes found.".format(source_filesystem))
#create snapshot
#create snapshots
if snapshot_filesystems:
new_snapshot_name=args.backup_name+"-"+time.strftime("%Y%m%d%H%M%S")
verbose("Creating source snapshot {0} on {1} ".format(new_snapshot_name, args.ssh_source))
verbose("Creating source snapshots {0} on {1} ".format(new_snapshot_name, args.ssh_source))
zfs_create_snapshot(args.ssh_source, snapshot_filesystems, new_snapshot_name)
else:
verbose("No changes at all, not creating snapshot.")
#add it to the list of source filesystems
for snapshot_filesystem in snapshot_filesystems:
source_snapshots.setdefault(snapshot_filesystem,[]).append(new_snapshot_name)
#### get target snapshots
target_snapshots={}
try:
verbose("Getting target snapshot-list from {0}".format(args.ssh_target))
target_snapshots=zfs_get_snapshots(args.ssh_target, target_filesystems, args.backup_name)
except subprocess.CalledProcessError:
verbose("(ignoring errors, probably initial backup for this filesystem)")
pass
debug("Target snapshots: " + str(pprint.pformat(target_snapshots)))
#obsolete snapshots that may be removed
@ -548,179 +601,201 @@ def zfs_autobackup():
#determine which snapshots to send for each filesystem
for source_filesystem in source_filesystems:
target_filesystem=args.target_fs + "/" + lstrip_path(source_filesystem, args.strip_path)
try:
target_filesystem=args.target_path + "/" + lstrip_path(source_filesystem, args.strip_path)
if source_filesystem not in source_snapshots:
#this happens if you use --no-snapshot and there are new filesystems without snapshots
verbose("Skipping source filesystem {0}, no snapshots found".format(source_filesystem))
else:
#incremental or initial send?
if target_filesystem in target_snapshots and target_snapshots[target_filesystem]:
#incremental mode, determine what to send and what is obsolete
#latest succesfully send snapshot, should be common on both source and target
latest_target_snapshot=target_snapshots[target_filesystem][-1]
if latest_target_snapshot not in source_snapshots[source_filesystem]:
#cant find latest target anymore. find first common snapshot and inform user
error_msg="Cant find latest target snapshot on source, did you destroy/rename it?"
error_msg=error_msg+"\nLatest on target : "+target_filesystem+"@"+latest_target_snapshot
error_msg=error_msg+"\nMissing on source: "+source_filesystem+"@"+latest_target_snapshot
found=False
for latest_target_snapshot in reversed(target_snapshots[target_filesystem]):
if latest_target_snapshot in source_snapshots[source_filesystem]:
error_msg=error_msg+"\nYou could solve this by rolling back to this common snapshot on target: "+target_filesystem+"@"+latest_target_snapshot
found=True
break
if not found:
error_msg=error_msg+"\nAlso could not find an earlier common snapshot to rollback to."
raise(Exception(error_msg))
#send all new source snapshots that come AFTER the last target snapshot
latest_source_index=source_snapshots[source_filesystem].index(latest_target_snapshot)
send_snapshots=source_snapshots[source_filesystem][latest_source_index+1:]
#source snapshots that come BEFORE last target snapshot are obsolete
source_obsolete_snapshots[source_filesystem]=source_snapshots[source_filesystem][0:latest_source_index]
#target snapshots that come BEFORE last target snapshot are obsolete
latest_target_index=target_snapshots[target_filesystem].index(latest_target_snapshot)
target_obsolete_snapshots[target_filesystem]=target_snapshots[target_filesystem][0:latest_target_index]
if source_filesystem not in source_snapshots:
#this happens if you use --no-snapshot and there are new filesystems without snapshots
verbose("* Skipping source filesystem {0}, no snapshots found".format(source_filesystem))
else:
#initial mode, send all snapshots, nothing is obsolete:
latest_target_snapshot=None
send_snapshots=source_snapshots[source_filesystem]
target_obsolete_snapshots[target_filesystem]=[]
source_obsolete_snapshots[source_filesystem]=[]
#now actually send the snapshots
if not args.no_send:
#incremental or initial send?
if target_filesystem in target_snapshots and target_snapshots[target_filesystem]:
#incremental mode, determine what to send and what is obsolete
if send_snapshots and args.rollback and latest_target_snapshot:
#roll back any changes on target
debug("Rolling back target to latest snapshot.")
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "rollback", target_filesystem+"@"+latest_target_snapshot ])
#latest succesfully send snapshot, should be common on both source and target
latest_target_snapshot=target_snapshots[target_filesystem][-1]
if latest_target_snapshot not in source_snapshots[source_filesystem]:
#cant find latest target anymore. find first common snapshot and inform user
error_msg="Cant find latest target snapshot on source for '{}', did you destroy/rename it?".format(source_filesystem)
error_msg=error_msg+"\nLatest on target : "+target_filesystem+"@"+latest_target_snapshot
error_msg=error_msg+"\nMissing on source: "+source_filesystem+"@"+latest_target_snapshot
found=False
for latest_target_snapshot in reversed(target_snapshots[target_filesystem]):
if latest_target_snapshot in source_snapshots[source_filesystem]:
error_msg=error_msg+"\nYou could solve this by rolling back to this common snapshot on target: "+target_filesystem+"@"+latest_target_snapshot
found=True
break
if not found:
error_msg=error_msg+"\nAlso could not find an earlier common snapshot to rollback to."
else:
if args.ignore_new:
verbose("* Skipping source filesystem '{0}', target already has newer snapshots.".format(source_filesystem))
continue
raise(Exception(error_msg))
#send all new source snapshots that come AFTER the last target snapshot
latest_source_index=source_snapshots[source_filesystem].index(latest_target_snapshot)
send_snapshots=source_snapshots[source_filesystem][latest_source_index+1:]
#source snapshots that come BEFORE last target snapshot are obsolete
source_obsolete_snapshots[source_filesystem]=source_snapshots[source_filesystem][0:latest_source_index]
#target snapshots that come BEFORE last target snapshot are obsolete
latest_target_index=target_snapshots[target_filesystem].index(latest_target_snapshot)
target_obsolete_snapshots[target_filesystem]=target_snapshots[target_filesystem][0:latest_target_index]
else:
#initial mode, send all snapshots, nothing is obsolete:
latest_target_snapshot=None
send_snapshots=source_snapshots[source_filesystem]
target_obsolete_snapshots[target_filesystem]=[]
source_obsolete_snapshots[source_filesystem]=[]
#now actually send the snapshots
if not args.no_send:
if send_snapshots and args.rollback and latest_target_snapshot:
#roll back any changes on target
debug("Rolling back target to latest snapshot.")
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "rollback", target_filesystem+"@"+latest_target_snapshot ])
for send_snapshot in send_snapshots:
for send_snapshot in send_snapshots:
#resumable?
if target_filesystem in resumable_target_filesystems:
resume_token=resumable_target_filesystems.pop(target_filesystem)
else:
resume_token=None
#resumable?
if target_filesystem in resumable_target_filesystems:
resume_token=resumable_target_filesystems.pop(target_filesystem)
else:
resume_token=None
#hold the snapshot we're sending on the source
zfs_hold_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+send_snapshot)
#hold the snapshot we're sending on the source
if not args.no_holds:
zfs_hold_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+send_snapshot)
zfs_transfer(
ssh_source=args.ssh_source, source_filesystem=source_filesystem,
first_snapshot=latest_target_snapshot, second_snapshot=send_snapshot,
ssh_target=args.ssh_target, target_filesystem=target_filesystem,
resume_token=resume_token
)
zfs_transfer(
ssh_source=args.ssh_source, source_filesystem=source_filesystem,
first_snapshot=latest_target_snapshot, second_snapshot=send_snapshot,
ssh_target=args.ssh_target, target_filesystem=target_filesystem,
resume_token=resume_token
)
#hold the snapshot we just send to the target
zfs_hold_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+send_snapshot)
#hold the snapshot we just send to the target
zfs_hold_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+send_snapshot)
#now that we successfully transferred this snapshot, the previous snapshot is obsolete:
if latest_target_snapshot:
zfs_release_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+latest_target_snapshot)
target_obsolete_snapshots[target_filesystem].append(latest_target_snapshot)
#now that we successfully transferred this snapshot, the previous snapshot is obsolete:
if latest_target_snapshot:
zfs_release_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+latest_target_snapshot)
target_obsolete_snapshots[target_filesystem].append(latest_target_snapshot)
zfs_release_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+latest_target_snapshot)
source_obsolete_snapshots[source_filesystem].append(latest_target_snapshot)
#we just received a new filesystem?
else:
if args.clear_refreservation:
debug("Clearing refreservation to save space.")
if not args.no_holds:
zfs_release_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+latest_target_snapshot)
source_obsolete_snapshots[source_filesystem].append(latest_target_snapshot)
#we just received a new filesystem?
else:
if args.clear_refreservation:
debug("Clearing refreservation to save space.")
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "refreservation=none", target_filesystem ])
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "refreservation=none", target_filesystem ])
if args.clear_mountpoint:
debug("Setting canmount=noauto to prevent auto-mounting in the wrong place. (ignoring errors)")
if args.clear_mountpoint:
debug("Setting canmount=noauto to prevent auto-mounting in the wrong place. (ignoring errors)")
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "canmount=noauto", target_filesystem ], valid_exitcodes= [0, 1] )
run(ssh_to=args.ssh_target, test=args.test, cmd=["zfs", "set", "canmount=noauto", target_filesystem ], valid_exitcodes= [0, 1] )
latest_target_snapshot=send_snapshot
latest_target_snapshot=send_snapshot
# failed, skip this source_filesystem
except Exception as e:
failed(str(e))
############## cleanup section
#we only do cleanups after everything is complete, to keep everything consistent (same snapshots everywhere)
#find stale backups on target that have become obsolete
verbose("Getting stale filesystems and snapshots from {0}".format(args.ssh_target))
stale_target_filesystems=get_stale_backupped_filesystems(ssh_to=args.ssh_target, backup_name=args.backup_name, target_fs=args.target_fs, target_filesystems=target_filesystems)
debug("Stale target filesystems: {0}".format("\n".join(stale_target_filesystems)))
if not args.ignore_replicated:
#find stale backups on target that have become obsolete
stale_target_snapshots=zfs_get_snapshots(args.ssh_target, stale_target_filesystems, args.backup_name)
debug("Stale target snapshots: " + str(pprint.pformat(stale_target_snapshots)))
target_obsolete_snapshots.update(stale_target_snapshots)
stale_target_filesystems=get_stale_backupped_filesystems(backup_name=args.backup_name, target_path=args.target_path, target_filesystems=target_filesystems, existing_target_filesystems=existing_target_filesystems)
debug("Stale target filesystems: {0}".format("\n".join(stale_target_filesystems)))
#determine stale filesystems that have no snapshots left (they can be destroyed)
#TODO: prevent destroying filesystems that have underlying filesystems that are still active.
stale_target_destroys=[]
for stale_target_filesystem in stale_target_filesystems:
if stale_target_filesystem not in stale_target_snapshots:
stale_target_destroys.append(stale_target_filesystem)
stale_target_snapshots=zfs_get_snapshots(args.ssh_target, stale_target_filesystems, args.backup_name)
debug("Stale target snapshots: " + str(pprint.pformat(stale_target_snapshots)))
target_obsolete_snapshots.update(stale_target_snapshots)
#determine stale filesystems that have no snapshots left (they can be destroyed)
stale_target_destroys=[]
for stale_target_filesystem in stale_target_filesystems:
if stale_target_filesystem not in stale_target_snapshots:
stale_target_destroys.append(stale_target_filesystem)
if stale_target_destroys:
#NOTE: don't destroy automatically; not safe enough.
# if args.destroy_stale:
# verbose("Destroying stale filesystems on target {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
# zfs_destroy(ssh_to=args.ssh_target, filesystems=stale_target_destroys, recursive=True)
# else:
verbose("Stale filesystems on {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
else:
verbose("NOTE: Can't determine stale target filesystems while using ignore_replicated.")
if stale_target_destroys:
if args.destroy_stale:
verbose("Destroying stale filesystems on target {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
zfs_destroy(ssh_to=args.ssh_target, filesystems=stale_target_destroys, recursive=True)
else:
verbose("Stale filesystems on {0}, use --destroy-stale to destroy:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
#now actually destroy the old snapshots
source_destroys=determine_destroy_list(source_obsolete_snapshots, args.keep_source)
if source_destroys:
verbose("Destroying old snapshots on source {0}:\n{1}".format(args.ssh_source, "\n".join(source_destroys)))
zfs_destroy_snapshots(ssh_to=args.ssh_source, snapshots=source_destroys)
try:
zfs_destroy_snapshots(ssh_to=args.ssh_source, snapshots=source_destroys)
except Exception as e:
failed(str(e))
target_destroys=determine_destroy_list(target_obsolete_snapshots, args.keep_target)
if target_destroys:
verbose("Destroying old snapshots on target {0}:\n{1}".format(args.ssh_target, "\n".join(target_destroys)))
zfs_destroy_snapshots(ssh_to=args.ssh_target, snapshots=target_destroys)
verbose("All done")
try:
zfs_destroy_snapshots(ssh_to=args.ssh_target, snapshots=target_destroys)
except Exception as e:
failed(str(e))
################################################################## ENTRY POINT
# parse arguments
import argparse
parser = argparse.ArgumentParser(description='ZFS autobackup v2.2')
parser = argparse.ArgumentParser(
description='ZFS autobackup v2.4',
epilog='When a filesystem fails, zfs_backup will continue and report the number of failures at the end. Also the exit code will indicate the number of failures.')
parser.add_argument('--ssh-source', default="local", help='Source host to get backup from. (user@hostname) Default %(default)s.')
parser.add_argument('--ssh-target', default="local", help='Target host to push backup to. (user@hostname) Default %(default)s.')
parser.add_argument('--keep-source', type=int, default=30, help='Number of days to keep old snapshots on source. Default %(default)s.')
parser.add_argument('--keep-target', type=int, default=30, help='Number of days to keep old snapshots on target. Default %(default)s.')
parser.add_argument('backup_name', help='Name of the backup (you should set the zfs property "autobackup:backup-name" to true on filesystems you want to back up)')
parser.add_argument('target_fs', help='Target filesystem')
parser.add_argument('target_path', help='Target path')
parser.add_argument('--no-snapshot', action='store_true', help='do not create a new snapshot (useful for finishing incomplete backups, or for cleanups)')
parser.add_argument('--no-send', action='store_true', help='do not send snapshots (useful to only do a cleanup)')
parser.add_argument('--allow-empty', action='store_true', help='if nothing has changed, still create empty snapshots.')
parser.add_argument('--ignore-replicated', action='store_true', help='Ignore datasets that seem to be replicated some other way. (No changes since latest snapshot. Useful for proxmox HA replication)')
parser.add_argument('--no-holds', action='store_true', help='Do not lock snapshots on the source. (Useful to allow proxmox HA replication to switch nodes)')
parser.add_argument('--ignore-new', action='store_true', help='Ignore filesystem if there are already newer snapshots for it on the target (use with caution)')
parser.add_argument('--resume', action='store_true', help='support resuming of interrupted transfers by using the zfs extensible_dataset feature (both zpools should have it enabled) Disadvantage is that you need to use zfs recv -A if another snapshot is created on the target during a receive. Otherwise it will keep failing.')
parser.add_argument('--strip-path', default=0, type=int, help='number of directories to strip from path (use 1 when cloning zones between 2 SmartOS machines)')
parser.add_argument('--buffer', default="", help='Use mbuffer with specified size to speedup zfs transfer. (e.g. --buffer 1G)')
parser.add_argument('--buffer', default="", help='Use mbuffer with specified size to speedup zfs transfer. (e.g. --buffer 1G) Will also show nice progress output.')
parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
# parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
parser.add_argument('--clear-refreservation', action='store_true', help='Set refreservation property to none for new filesystems. Useful when backing up SmartOS volumes. (recommended)')
parser.add_argument('--clear-mountpoint', action='store_true', help='Sets canmount=noauto property, to prevent the received filesystem from mounting over existing filesystems. (recommended)')
parser.add_argument('--filter-properties', action='append', help='Filter properties when receiving filesystems. Can be specified multiple times. (Example: If you send data from Linux to FreeNAS, you should filter xattr)')
parser.add_argument('--rollback', action='store_true', help='Rollback changes on the target before starting a backup. (normally you can prevent changes by setting the readonly property on the target_fs to on)')
parser.add_argument('--rollback', action='store_true', help='Rollback changes on the target before starting a backup. (normally you can prevent changes by setting the readonly property on the target_path to on)')
parser.add_argument('--ignore-transfer-errors', action='store_true', help='Ignore transfer errors (still checks if received filesystem exists. useful for acltype errors)')
@@ -731,11 +806,23 @@ parser.add_argument('--debug', action='store_true', help='debug output (shows co
#note args is the only global variable we use, since its a global readonly setting anyway
args = parser.parse_args()
if args.ignore_replicated and args.allow_empty:
abort("Cannot use allow_empty with ignore_replicated.")
try:
zfs_autobackup()
if not failures:
verbose("All operations completed successfully.")
sys.exit(0)
else:
verbose("{} OPERATION(S) FAILED!".format(failures))
#exit with the number of failures.
sys.exit(min(255,failures))
except Exception as e:
if args.debug:
raise
else:
print("* ABORTED *")
print(str(e))
abort("FATAL ERROR")
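
The entry point above keeps going after a non-fatal failure and reports the failure count through the exit code, capped at 255. A minimal standalone sketch of that pattern follows. The names `run_all` and `exit_code_for` are hypothetical and are not part of zfs_autobackup; the cap matters because a POSIX exit status is only 8 bits wide, so a count of 256 would otherwise be reported as 0.

```python
def exit_code_for(failures):
    """Map a failure count to an exit status: 0 means success, capped at 255."""
    return min(255, failures)


def run_all(operations):
    """Run every callable, counting failures instead of stopping at the first one.

    Hypothetical helper: the real script counts failures through its own
    failed() function, whose body is not shown in this diff.
    """
    failures = 0
    for operation in operations:
        try:
            operation()
        except Exception as e:
            # Report the failure and continue with the next operation.
            print("FAILED: " + str(e))
            failures += 1
    return failures
```

A caller would finish with `sys.exit(exit_code_for(run_all(operations)))`, so a wrapper script or cron job can tell a clean run from a partial one without parsing the output.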