Compare commits

...

56 Commits

Author SHA1 Message Date
17445ec54a Update README.md 2019-11-10 01:16:48 +01:00
07a150618a Update README.md 2019-11-10 01:01:08 +01:00
067f3b92d1 Update README.md 2019-11-10 00:51:20 +01:00
71a394cfc7 rollback 2019-10-19 14:54:10 +02:00
bfc36ac87f rollback 2019-10-19 14:53:31 +02:00
ad47b26f56 Revert "fixing quota issues"
This reverts commit d973905303.
2019-10-17 10:42:54 +02:00
f38da17592 v3.0: no longer replicate all properties by default. this made things unnecessarily complicated. now use the --properties option to specify the properties you want. 2019-10-16 13:37:31 +02:00
d973905303 fixing quota issues 2019-10-16 12:51:12 +02:00
82465acd5b clarification of test mode 2019-10-16 10:27:35 +02:00
514131d67c update docs 2019-10-16 09:45:08 +02:00
dfcae1613b clarify that the target path is a zfs filesystem, not a regular path 2019-10-16 09:43:28 +02:00
67b21b4015 bugfix: exit code was always 255 2019-10-16 09:28:21 +02:00
3907c850a6 Update zfs_autobackup 2019-10-02 22:58:50 +02:00
3b9b96243b don't destroy stale snapshots if we're using --ignore-replicated 2019-10-02 19:39:31 +02:00
54235f455a zfs_autobackup 2.4: try to continue on non-fatal errors 2019-10-02 18:21:24 +02:00
c176b968a9 forgot to return exit code when not using debug mode :( 2019-03-26 23:06:03 +01:00
921f7df0a5 updated readme 2019-02-19 11:24:10 +01:00
edee598cf8 updated readme 2019-02-19 11:19:16 +01:00
80b3272f0f disable destroy-stale for now. updated readme 2019-02-19 11:09:54 +01:00
617e0fb69b fix/revisit stale filesystem detection 2019-02-19 11:04:59 +01:00
46a85fd170 updated readme 2019-02-19 10:26:52 +01:00
1f59229419 fixes 2019-02-19 01:28:22 +01:00
fcd98e2d87 much cleaner output and layout. removed useless error output. general cleanup. 2019-02-19 00:17:20 +01:00
dd8b2442ec options for proxmox HA: no-holds, ignore-new and ignore-replicated 2019-02-18 18:53:54 +01:00
291040eb2d obsolete, now that we have resume support in zfs 2019-02-16 22:28:15 +01:00
d12a132f3f added buffering 2019-02-16 21:51:30 +01:00
2255e0e691 add some performance options 2019-02-16 21:00:09 +01:00
6a481ed6a4 do not create snapshots for filesystems without changes. (useful for proxmox clusters that use internal replication) 2019-02-16 20:30:44 +01:00
11d051122b now hold important snapshots to prevent accidental deletion by administrator or other scripts 2019-02-16 02:06:06 +01:00
511311eee7 nicer error output. you can also set autobackup:blah=child now, to only select children (recursively) of the dataset 2019-02-16 00:38:30 +01:00
fa405dce57 option to allow ignoring of transfer errors. this will still check if the filesystem was received. I used it to ignore a bunch of acltype property errors on smartos from proxmox. 2019-02-03 21:52:59 +01:00
bf37322aba Update README.md 2019-02-03 21:07:48 +01:00
a7bf1e8af8 version bump 2019-02-03 21:05:46 +01:00
352c61fd00 Merge branch 'master' of github.com:psy0rz/zfs_autobackup 2019-02-03 21:04:53 +01:00
0fe09ea535 use ~/.ssh/config for these options 2019-02-03 21:04:46 +01:00
64c9b84102 Update README.md 2019-02-03 21:00:39 +01:00
8a2e1d36d7 Update README.md 2019-02-03 20:59:16 +01:00
a120fbb85f Merge pull request #11 from psy0rz/revert-9-feature/ssh-mux
Revert "Feature/ssh mux"
2019-02-03 20:57:44 +01:00
42bbecc571 Revert "Feature/ssh mux" 2019-02-03 20:57:00 +01:00
b8d744869d doc 2019-02-03 20:56:45 +01:00
c253a17b75 info 2019-02-03 20:38:26 +01:00
d1fe00aee2 Merge branch 'master' of github.com:psy0rz/zfs_autobackup 2019-02-03 20:23:52 +01:00
85d2e1a635 Merge pull request #9 from mariusvw/feature/ssh-mux
Feature/ssh mux
2019-02-03 20:23:44 +01:00
9455918708 update docs 2018-06-07 14:08:59 +02:00
5316737388 update docs 2018-06-07 14:06:16 +02:00
c6afa33e62 option to filter properties on zfs recv 2018-06-07 14:01:24 +02:00
a8d0ff9f37 Restored stderr handling to previous state 2018-04-05 18:05:10 +02:00
cc1725e3be Compacted a bit of code into a function 2018-04-05 10:01:23 +02:00
42b71bbc74 Write STDERR to STDOUT 2018-04-05 10:00:53 +02:00
84d44a267a Exit persistent connection when everything is finished 2018-04-05 09:23:59 +02:00
ba89dc8bb2 Use persistent connections for 600 seconds (10min) 2018-04-05 09:23:39 +02:00
62178e424e Added argument to return exit code 2018-04-05 09:23:08 +02:00
b0ffdb4893 fix: atomic snapshots can only be created per pool. now uses a separate zfs snapshot command for each pool 2018-03-06 15:54:00 +01:00
cc45122e3e Update zfs_autobackup
not needed module
2018-03-06 15:03:30 +01:00
e872d79677 conflicting name 2017-09-25 13:46:20 +02:00
e74e50d4e8 Update README.md 2017-09-20 10:28:34 +02:00
2 changed files with 543 additions and 238 deletions

README.md

@@ -1,13 +1,19 @@
# ZFS autobackup
(check out v3.0-beta for the new cool stuff: https://github.com/psy0rz/zfs_autobackup/blob/v3/README.md)
Official releases: https://github.com/psy0rz/zfs_autobackup/releases
Introduction
============
ZFS autobackup is used to periodically back up ZFS filesystems to other locations. This is done using the very efficient zfs send and receive commands.
It has the following features:
* Automatically selects filesystems to back up by looking at a simple ZFS property.
* Creates consistent snapshots.
* Works across operating systems: Tested with Linux, FreeBSD/FreeNAS and SmartOS.
* Works in combination with existing replication systems. (Like Proxmox HA)
* Automatically selects filesystems to back up by looking at a simple ZFS property. (recursive)
* Creates consistent snapshots. (takes all snapshots at once, atomic.)
* Multiple backups modes:
* "push" local data to a backup-server via SSH.
* "pull" remote data from a server via SSH and backup it locally.
@@ -16,32 +22,37 @@ It has the following features:
* Supports resuming of interrupted transfers. (via the zfs extensible_dataset feature)
* Backups and snapshots can be named to prevent conflicts. (multiple backups from and to the same filesystems are no problem)
* Always creates a new snapshot before starting.
* Checks everything and aborts on errors.
* Checks everything but tries to continue on non-fatal errors when possible. (Reports error-count when done)
* Ability to 'finish' aborted backups to see what goes wrong.
* Easy to debug and has a test-mode. Actual unix commands are printed.
* Keeps latest X snapshots remote and locally. (default 30, configurable)
* Uses zfs-holds on important snapshots so they can't be accidentally destroyed.
* Easy installation:
* Only one host needs the zfs_autobackup script. The other host just needs ssh and the zfs command.
* Written in python and uses zfs-commands, no 3rd party dependencies or libraries.
* No separate config files or properties. Just one command you can copy/paste in your backup script.
Usage
====
```
usage: zfs_autobackup [-h] [--ssh-source SSH_SOURCE] [--ssh-target SSH_TARGET]
[--ssh-cipher SSH_CIPHER] [--keep-source KEEP_SOURCE]
[--keep-target KEEP_TARGET] [--no-snapshot] [--no-send]
[--resume] [--strip-path STRIP_PATH] [--destroy-stale]
[--keep-source KEEP_SOURCE] [--keep-target KEEP_TARGET]
[--no-snapshot] [--no-send] [--allow-empty]
[--ignore-replicated] [--no-holds] [--ignore-new]
[--resume] [--strip-path STRIP_PATH] [--buffer BUFFER]
[--clear-refreservation] [--clear-mountpoint]
[--rollback] [--compress] [--test] [--verbose] [--debug]
backup_name target_fs
[--filter-properties FILTER_PROPERTIES] [--rollback]
[--ignore-transfer-errors] [--test] [--verbose]
[--debug]
backup_name target_path
ZFS autobackup v2.1
ZFS autobackup v2.4
positional arguments:
backup_name Name of the backup (you should set the zfs property
"autobackup:backup-name" to true on filesystems you
want to back up)
target_fs Target filesystem
target_path Target path
optional arguments:
-h, --help show this help message and exit
@@ -51,8 +62,6 @@ optional arguments:
--ssh-target SSH_TARGET
Target host to push backup to. (user@hostname) Default
local.
--ssh-cipher SSH_CIPHER
SSH cipher to use (default None)
--keep-source KEEP_SOURCE
Number of days to keep old snapshots on source.
Default 30.
@@ -62,14 +71,26 @@ optional arguments:
--no-snapshot do not create new snapshot (useful for finishing
uncompleted backups, or cleanups)
--no-send do not send snapshots (useful to only do a cleanup)
--allow-empty if nothing has changed, still create empty snapshots.
--ignore-replicated Ignore datasets that seem to be replicated some other
way. (No changes since latest snapshot. Useful for
proxmox HA replication)
--no-holds Do not lock snapshots on the source. (Useful to allow
proxmox HA replication to switch nodes)
--ignore-new Ignore filesystem if there are already newer snapshots
for it on the target (use with caution)
--resume support resuming of interrupted transfers by using the
zfs extensible_dataset feature (both zpools should
have it enabled)
have it enabled) Disadvantage is that you need to use
zfs recv -A if another snapshot is created on the
target during a receive. Otherwise it will keep
failing.
--strip-path STRIP_PATH
number of directories to strip from path (use 1 when
cloning zones between 2 SmartOS machines)
--destroy-stale Destroy stale backups that have no more snapshots. Be
sure to verify the output before using this!
--buffer BUFFER Use mbuffer with specified size to speed up zfs
transfer. (e.g. --buffer 1G) Will also show nice
progress output.
--clear-refreservation
Set refreservation property to none for new
filesystems. Useful when backing up SmartOS volumes.
@@ -77,14 +98,23 @@ optional arguments:
--clear-mountpoint Sets canmount=noauto property, to prevent the received
filesystem from mounting over existing filesystems.
(recommended)
--filter-properties FILTER_PROPERTIES
Filter properties when receiving filesystems. Can be
specified multiple times. (Example: If you send data
from Linux to FreeNAS, you should filter xattr)
--rollback Rollback changes on the target before starting a
backup. (normally you can prevent changes by setting
the readonly property on the target_fs to on)
--compress use compression during zfs send/recv
the readonly property on the target_path to on)
--ignore-transfer-errors
Ignore transfer errors (still checks if received
filesystem exists. useful for acltype errors)
--test do not change anything, just show what would be done
(still does all read-only operations)
--verbose verbose output
--debug debug output (shows commands that are executed)
When a filesystem fails, zfs_autobackup will continue and report the number of
failures at the end. The exit code will also indicate the number of failures.
```
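A minimal end-to-end sketch (hypothetical pool and host names; the examples below use pull mode instead): select datasets by setting the property on the source, then run zfs_autobackup with a backup-name and a target path:
```
zfs set autobackup:offsite1=true tank/data
./zfs_autobackup --ssh-target root@backupserver offsite1 backuppool/backups --verbose
```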
Backup example
@@ -131,7 +161,7 @@ First install the ssh-key on the server that you specify with --ssh-source or --
Method 1: Run the script on the backup server and pull the data from the server specified by --ssh-source. This is usually the preferred way and prevents a hacked server from accessing the backup-data:
```
root@fs1:/home/psy# ./zfs_autobackup --ssh-source root@1.2.3.4 smartos01_fs1 fs1/zones/backup/zfsbackups/smartos01.server.com --verbose --compress
root@fs1:/home/psy# ./zfs_autobackup --ssh-source root@1.2.3.4 smartos01_fs1 fs1/zones/backup/zfsbackups/smartos01.server.com --verbose
Getting selected source filesystems for backup smartos01_fs1 on root@1.2.3.4
Selected: zones (direct selection)
Selected: zones/1eb33958-72c1-11e4-af42-ff0790f603dd (inherited selection)
@@ -160,12 +190,63 @@ All done
```
Tips
----
====
* Set the ```readonly``` property of the target filesystem to ```on``` (see the sketch below). This prevents changes on the target side. If there are changes, the next backup will fail and require a zfs rollback. (by using the --rollback option for example)
* Use ```--clear-refreservation``` to save space on your backup server.
* Use ```--clear-mountpoint``` to prevent the target server from mounting the backed-up filesystem in the wrong place during a reboot. If this happens on systems like SmartOS or OpenIndiana, svc://filesystem/local won't be able to mount some filesystems and you need to resolve these issues on the console.
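For example, a minimal sketch of the readonly tip (hypothetical target path):
```
zfs set readonly=on fs1/zones/backup/zfsbackups
```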
Speeding up SSH and preventing connection flooding
--------------------------------------------------
Add this to your ~/.ssh/config:
```
Host *
ControlPath ~/.ssh/control-master-%r@%h:%p
ControlMaster auto
ControlPersist 3600
```
This will make all your ssh connections persistent and greatly speed up zfs_autobackup for jobs with short intervals.
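To inspect or stop the shared master connection you can use the standard OpenSSH control commands (hypothetical host):
```
ssh -O check root@1.2.3.4
ssh -O exit root@1.2.3.4
```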
Thanks @mariusvw :)
Specifying ssh port or options
------------------------------
The correct way to do this is by creating ~/.ssh/config:
```
Host smartos04
Hostname 1.2.3.4
Port 1234
user root
Compression yes
```
This way you can just specify "smartos04" as the host.
It also enables compression, which helps on slow links.
Look in man ssh_config for many more options.
Troubleshooting
===============
`cannot receive incremental stream: invalid backup stream`
This usually means you've created a new snapshot on the target side during a backup.
* Solution 1: Restart zfs_autobackup and make sure you don't use --resume. If you did use --resume, be sure to "abort" the receive on the target side with zfs recv -A. (see the sketch below)
* Solution 2: Destroy the newly created snapshot and restart zfs_autobackup.
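A sketch of solution 1, assuming the target dataset from the pull example above:
```
zfs recv -A fs1/zones/backup/zfsbackups/smartos01.server.com/zones
```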
`internal error: Invalid argument`
In some cases (Linux -> FreeBSD) this means certain properties are not fully supported on the target system.
Try using something like: --filter-properties xattr
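For example, reusing the pull invocation from above:
```
./zfs_autobackup --ssh-source root@1.2.3.4 smartos01_fs1 fs1/zones/backup/zfsbackups/smartos01.server.com --filter-properties xattr --verbose
```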
Restore example
===============
@@ -178,19 +259,6 @@ root@fs1:/home/psy# zfs send fs1/zones/backup/zfsbackups/smartos01.server.com/z
After that you can rename the disk image from the temporary location to the location of a new SmartOS machine you've created.
Snapshotting example
====================
Sending huge snapshots can't be resumed when a connection is interrupted: the next time zfs_autobackup is started, the whole snapshot will be transferred again. For this reason you might want to have multiple small snapshots.
The --no-send option can be useful for this. This way you can already create small snapshots every few hours:
````
[root@smartos2 ~]# zfs_autobackup --ssh-source root@smartos1 smartos1_freenas1 zones --verbose --ssh-cipher chacha20-poly1305@openssh.com --no-send
````
Later when our freenas1 server is ready we can use the same command without the --no-send at freenas1. At that point the server will receive all the small snapshots up to that point.
Monitoring with Zabbix-jobs
===========================
@@ -203,3 +271,55 @@ zabbix-job-status backup_smartos01_fs1 daily $?
```
This will update the zabbix server with the exit code and will also alert you if the job didn't run for more than 2 days.
Backing up a proxmox cluster with HA replication
================================================
Due to the nature of proxmox we had to make a few enhancements to zfs_autobackup. This will probably also benefit other systems that use their own replication in combination with zfs_autobackup.
All data under rpool/data can be on multiple nodes of the cluster. The naming of those filesystems is unique over the whole cluster. Because of this we should back up rpool/data of all nodes to the same destination. This way we won't have duplicate backups of the filesystems that are replicated. Because of various options, you can even migrate hosts and zfs_autobackup will be fine. (and it will get the next backup from the new node automatically)
In the example below we have 3 nodes, named h4, h5 and h6.
The backup will go to a machine named smartos03.
Preparing the proxmox nodes
---------------------------
On each node select the filesystems as follows:
```
root@h4:~# zfs set autobackup:h4_smartos03=true rpool
root@h4:~# zfs set autobackup:h4_smartos03=false rpool/data
root@h4:~# zfs set autobackup:data_smartos03=child rpool/data
```
* rpool will be backed up the usual way, and is named h4_smartos03. (each node will have a unique name)
* rpool/data will be excluded from the usual backup
* The CHILDREN of rpool/data will be selected for a cluster-wide backup named data_smartos03. (each node uses the same backup name; see the check below)
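You can verify on each node which datasets will be selected (a quick check; output varies per node):
```
root@h4:~# zfs get -r -t filesystem,volume autobackup:data_smartos03 rpool/data
```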
Preparing the backup server
---------------------------
Extra options needed for proxmox with HA:
* --no-holds: To allow proxmox to destroy our snapshots if a VM migrates to another node.
* --ignore-replicated: To ignore the replicated filesystems of proxmox on the receiving proxmox nodes. (e.g. only back up from the node where the VM is active)
I use the following backup script on the backup server:
```
for H in h4 h5 h6; do
echo "################################### DATA $H"
#backup data filesystems to a common place
./zfs_autobackup --ssh-source root@$H data_smartos03 zones/backup/zfsbackups/pxe1_data --clear-refreservation --clear-mountpoint --ignore-transfer-errors --strip-path 2 --verbose --resume --ignore-replicated --no-holds $@
zabbix-job-status backup_$H""_data_smartos03 daily $? >/dev/null 2>/dev/null
echo "################################### RPOOL $H"
#backup rpool to own place
./zfs_autobackup --ssh-source root@$H $H""_smartos03 zones/backup/zfsbackups/$H --verbose --clear-refreservation --clear-mountpoint --resume --ignore-transfer-errors $@
zabbix-job-status backup_$H""_smartos03 daily $? >/dev/null 2>/dev/null
done
```

zfs_autobackup

@@ -1,8 +1,8 @@
#!/usr/bin/env python
#!/usr/bin/env python2
# -*- coding: utf8 -*-
#(C)edwin@datux.nl -- Edwin Eefting
#Release under GPL.
from __future__ import print_function
import os
@@ -11,25 +11,26 @@ import re
import traceback
import subprocess
import pprint
import cStringIO
import time
def error(txt):
print(txt, file=sys.stderr)
def verbose(txt):
if args.verbose:
print(txt)
def debug(txt):
if args.debug:
print(txt)
#fatal abort execution, exit code 255
def abort(txt):
error(txt)
sys.exit(255)
"""run a command. specifiy ssh user@host to run remotely"""
def run(cmd, input=None, ssh_to="local", tab_split=False, valid_exitcodes=[ 0 ], test=False):
@@ -40,10 +41,6 @@ def run(cmd, input=None, ssh_to="local", tab_split=False, valid_exitcodes=[ 0 ],
#use ssh?
if ssh_to != "local":
encoded_cmd.extend(["ssh", ssh_to])
if args.ssh_cipher:
encoded_cmd.extend(["-c", args.ssh_cipher])
if args.compress:
encoded_cmd.append("-C")
#make sure the command gets all the data in utf8 format:
@@ -104,22 +101,24 @@ def zfs_get_selected_filesystems(ssh_to, backup_name):
for source_filesystem in source_filesystems:
(name,value,source)=source_filesystem
if value=="false":
verbose("Ignoring: {0} (disabled)".format(name))
verbose("* Ignored : {0} (disabled)".format(name))
else:
if source=="local":
selected_filesystems.append(name)
if source=="local" and ( value=="true" or value=="child"):
direct_filesystems.append(name)
verbose("Selected: {0} (direct selection)".format(name))
elif source.find("inherited from ")==0:
if source=="local" and value=="true":
selected_filesystems.append(name)
verbose("* Selected: {0} (direct selection)".format(name))
elif source.find("inherited from ")==0 and (value=="true" or value=="child"):
inherited_from=re.sub("^inherited from ", "", source)
if inherited_from in direct_filesystems:
selected_filesystems.append(name)
verbose("Selected: {0} (inherited selection)".format(name))
verbose("* Selected: {0} (inherited selection)".format(name))
else:
verbose("Ignored: {0} (already a backup)".format(name))
verbose("* Ignored : {0} (already a backup)".format(name))
else:
vebose("Ignored: {0} ({0})".format(source))
verbose("* Ignored : {0} (only childs)".format(name))
return(selected_filesystems)
@@ -130,7 +129,6 @@ def zfs_get_resumable_filesystems(ssh_to, filesystems):
cmd=[ "zfs", "get", "-t", "volume,filesystem", "-o", "name,value", "-H", "receive_resume_token" ]
cmd.extend(filesystems)
#TODO: get rid of ugly errors for non-existing target filesystems
resumable_filesystems=run(ssh_to=ssh_to, tab_split=True, cmd=cmd, valid_exitcodes= [ 0,1 ] )
ret={}
@@ -166,21 +164,30 @@ test_snapshots={}
"""create snapshot on multiple filesystems at once (atomicly)"""
"""create snapshot on multiple filesystems at once (atomicly per pool)"""
def zfs_create_snapshot(ssh_to, filesystems, snapshot):
cmd=[ "zfs", "snapshot" ]
#collect per pool, zfs can only take atomic snapshots per pool
pools={}
for filesystem in filesystems:
pool=filesystem.split('/')[0]
if pool not in pools:
pools[pool]=[]
pools[pool].append(filesystem)
for pool in pools:
cmd=[ "zfs", "snapshot" ]
for filesystem in pools[pool]:
cmd.append(filesystem+"@"+snapshot)
#in testmode we don't actually make changes, so keep them in a list to simulate
if args.test:
if not ssh_to in test_snapshots:
test_snapshots[ssh_to]={}
if not filesystem in test_snapshots[ssh_to]:
test_snapshots[ssh_to][filesystem]=[]
test_snapshots[ssh_to][filesystem].append(snapshot)
# if args.test:
# if not ssh_to in test_snapshots:
# test_snapshots[ssh_to]={}
# if not filesystem in test_snapshots[ssh_to]:
# test_snapshots[ssh_to][filesystem]=[]
# test_snapshots[ssh_to][filesystem].append(snapshot)
run(ssh_to=ssh_to, tab_split=False, cmd=cmd, test=args.test)
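#example (hypothetical datasets): with filesystems spread over two pools this runs
#one atomic command per pool, e.g.:
#  zfs snapshot rpool/data/vm1@offsite1-20191010120000 rpool/data/vm2@offsite1-20191010120000
#  zfs snapshot tank/media@offsite1-20191010120000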
@@ -194,13 +201,12 @@ def zfs_get_snapshots(ssh_to, filesystems, backup_name):
ret={}
if filesystems:
#TODO: get rid of ugly errors for non-existing target filesystems
cmd=[
"zfs", "list", "-d", "1", "-r", "-t" ,"snapshot", "-H", "-o", "name"
]
cmd.extend(filesystems)
snapshots=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0,1 ])
snapshots=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
for snapshot in snapshots:
@@ -211,23 +217,46 @@ def zfs_get_snapshots(ssh_to, filesystems, backup_name):
ret[filesystem].append(snapshot_name)
#also add any test-snapshots that were created with --test mode
if args.test:
if ssh_to in test_snapshots:
for filesystem in filesystems:
if filesystem in test_snapshots[ssh_to]:
if not filesystem in ret:
ret[filesystem]=[]
ret[filesystem].extend(test_snapshots[ssh_to][filesystem])
# if args.test:
# if ssh_to in test_snapshots:
# for filesystem in filesystems:
# if filesystem in test_snapshots[ssh_to]:
# if not filesystem in ret:
# ret[filesystem]=[]
# ret[filesystem].extend(test_snapshots[ssh_to][filesystem])
return(ret)
def default_tag():
return("zfs_autobackup:"+args.backup_name)
"""hold a snapshot so it cant be destroyed accidently by admin or other processes"""
def zfs_hold_snapshot(ssh_to, snapshot, tag=None):
cmd=[
"zfs", "hold", tag or default_tag(), snapshot
]
run(ssh_to=ssh_to, test=args.test, tab_split=False, cmd=cmd, valid_exitcodes=[ 0, 1 ])
"""release a snapshot"""
def zfs_release_snapshot(ssh_to, snapshot, tag=None):
cmd=[
"zfs", "release", tag or default_tag(), snapshot
]
run(ssh_to=ssh_to, test=args.test, tab_split=False, cmd=cmd, valid_exitcodes=[ 0, 1 ])
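#sketch of the effect (hypothetical names): a held snapshot cannot be destroyed until
#every hold on it is released:
#  zfs hold zfs_autobackup:offsite1 tank/data@offsite1-20191010120000
#  zfs destroy tank/data@offsite1-20191010120000   (fails: "dataset is busy")
#  zfs release zfs_autobackup:offsite1 tank/data@offsite1-20191010120000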
"""transfer a zfs snapshot from source to target. both can be either local or via ssh.
TODO:
(parially implemented, local buffer is a bit more annoying to do)
buffering: specify buffer_size to use mbuffer (or alike) to apply buffering where neccesary
local to local:
@ -240,7 +269,6 @@ remote send -> remote buffer -> ssh -> local buffer -> local receive
remote to remote:
remote send -> remote buffer -> ssh -> local buffer -> ssh -> remote buffer -> remote receive
TODO: can we string together all the zfs sends and recvs, so that we only need to use 1 ssh connection? should be faster if there are many small snapshots
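#e.g. for a remote source with --buffer 1G, the resulting pipeline is roughly
#(a sketch, hypothetical names):
#  ssh root@source 'zfs send ... tank/data@snap | mbuffer -m 1G' | zfs recv -u backuppool/data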
@@ -253,26 +281,31 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
if ssh_source != "local":
source_cmd.extend([ "ssh", ssh_source ])
if args.ssh_cipher:
source_cmd.extend(["-c", args.ssh_cipher])
if args.compress:
source_cmd.append("-C")
source_cmd.extend(["zfs", "send", ])
#all kinds of performance options:
source_cmd.append("-L") # large block support
source_cmd.append("-e") # WRITE_EMBEDDED, more compact stream
source_cmd.append("-c") # use compressed WRITE records
if not args.resume:
source_cmd.append("-D") # dedupped stream, sends less duplicate data
#only verbose in debug mode, lots of output
if args.debug :
source_cmd.append("-v")
if not first_snapshot:
txt="Initial transfer of "+source_filesystem+" snapshot "+second_snapshot
txt=">>> Transfer: "+source_filesystem+"@"+second_snapshot
else:
txt="Incremental transfer of "+source_filesystem+" between snapshots "+first_snapshot+"..."+second_snapshot
txt=">>> Transfer: "+source_filesystem+"@"+first_snapshot+"...@"+second_snapshot
if resume_token:
source_cmd.extend([ "-t", resume_token ])
verbose("RESUMING "+txt)
txt=txt+" [RESUMED]"
else:
source_cmd.append("-p")
@@ -287,20 +320,24 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
verbose(txt)
if args.buffer and args.ssh_source!="local":
source_cmd.append("|mbuffer -m {}".format(args.buffer))
#### build target command
target_cmd=[]
if ssh_target != "local":
target_cmd.extend([ "ssh", ssh_target ])
if args.ssh_cipher:
target_cmd.extend(["-c", args.ssh_cipher])
if args.compress:
target_cmd.append("-C")
target_cmd.extend(["zfs", "recv", "-u" ])
#also verbose in --verbose mode so we can see the transfer speed when it's completed
if args.verbose or args.debug:
# filter certain properties on receive (useful for linux->freebsd in some cases)
if args.filter_properties:
for filter_property in args.filter_properties:
target_cmd.extend([ "-x" , filter_property ])
if args.debug:
target_cmd.append("-v")
if args.resume:
@@ -312,6 +349,8 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
else:
target_cmd.append(target_filesystem)
if args.buffer and args.ssh_target!="local":
target_cmd.append("|mbuffer -m {}".format(args.buffer))
#### make sure parent on target exists
@@ -332,6 +371,7 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
source_proc.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits.
target_proc.communicate()
if not args.ignore_transfer_errors:
if source_proc.returncode:
raise(subprocess.CalledProcessError(source_proc.returncode, source_cmd))
@@ -344,32 +384,41 @@ def zfs_transfer(ssh_source, source_filesystem, first_snapshot, second_snapshot,
"""get filesystems that where already backupped to a target. """
def zfs_get_backupped_filesystems(ssh_to, backup_name, target_fs):
#get all target filesystems that have received or inherited the backup property, under the target_fs tree
ret=run(ssh_to=ssh_to, tab_split=False, cmd=[
"zfs", "get", "-r", "-t", "volume,filesystem", "-o", "name", "-s", "received,inherited", "-H", "autobackup:"+backup_name, target_fs
#NOTE: unreliable when using with autobackup:bla=child
# """get filesystems that where already backupped to a target. """
# def zfs_get_backupped_filesystems(ssh_to, backup_name, target_path):
# #get all target filesystems that have received or inherited the backup property, under the target_path tree
# ret=run(ssh_to=ssh_to, tab_split=False, valid_exitcodes=[ 0,1 ], cmd=[
# "zfs", "get", "-r", "-t", "volume,filesystem", "-o", "name", "-s", "received,inherited", "-H", "autobackup:"+backup_name, target_path
# ])
#
# return(ret)
"""get existing filesystems """
def zfs_get_existing_filesystems(ssh_to, target_path):
#get all filesystems under the target_path tree (including ones that are no longer in the backup set)
ret=run(ssh_to=ssh_to, tab_split=False, valid_exitcodes=[ 0,1 ], cmd=[
"zfs", "list", "-r", "-t", "volume,filesystem", "-o", "name", "-H", target_path
])
return(ret)
"""get filesystems that where once backupped to target but are no longer selected on source
these are filesystems that are not in the list in target_filesystems.
this happens when filesystems are destroyed or unselected on the source.
"""
def get_stale_backupped_filesystems(ssh_to, backup_name, target_fs, target_filesystems):
def get_stale_backupped_filesystems(backup_name, target_path, target_filesystems, existing_target_filesystems):
backupped_filesystems=zfs_get_backupped_filesystems(ssh_to=ssh_to, backup_name=backup_name, target_fs=target_fs)
#determine backupped filesystems that are not in target_filesystems anymore
stale_backupped_filesystems=[]
for backupped_filesystem in backupped_filesystems:
if backupped_filesystem not in target_filesystems:
stale_backupped_filesystems.append(backupped_filesystem)
for existing_target_filesystem in existing_target_filesystems:
if existing_target_filesystem not in target_filesystems:
stale_backupped_filesystems.append(existing_target_filesystem)
return(stale_backupped_filesystems)
@@ -397,11 +446,50 @@ def lstrip_path(path, count):
return("/".join(path.split("/")[count:]))
"""get list of filesystems that are changed, compared to specified latest snapshot. """
def zfs_get_unchanged_snapshots(ssh_to, snapshots):
ret=[]
for ( filesystem, snapshot_list ) in snapshots.items():
latest_snapshot=snapshot_list[-1]
cmd=[ "zfs", "get","-H" ,"-ovalue", "written@"+latest_snapshot, filesystem ]
output=run(ssh_to=ssh_to, tab_split=False, cmd=cmd, valid_exitcodes=[ 0 ])
if output[0]=="0B" or output[0]=="0":
ret.append(filesystem)
return(ret)
"""get filesytems that are have changed since any snapshot."""
def zfs_get_unchanged_filesystems(ssh_to, filesystems):
ret=[]
cmd=[ "zfs", "get","-H" ,"-oname,value", "written" ]
cmd.extend(filesystems)
output=run(ssh_to=ssh_to, tab_split=True, cmd=cmd, valid_exitcodes=[ 0 ])
for ( filesystem , written ) in output:
if written=="0B" or written=="0":
ret.append(filesystem)
return(ret)
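#both checks use the zfs "written" properties (sketch, hypothetical names):
#  zfs get -H -ovalue written@offsite1-20191010120000 tank/data  -> bytes written since that snapshot
#  zfs get -H -oname,value written tank/data                     -> bytes written since the latest snapshot
#a value of "0" or "0B" means unchanged.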
#fugly..
failures=0
#something failed, but we try to continue with the rest
def failed(txt):
global failures
failures=failures+1
error("FAILURE: "+txt+"\n")
def zfs_autobackup():
############## data gathering section
if args.test:
@@ -411,55 +499,96 @@ def zfs_autobackup():
### getting and determining source/target filesystems
# get selected filesystem on backup source
# get selected filesystems on backup source
verbose("Getting selected source filesystems for backup {0} on {1}".format(args.backup_name,args.ssh_source))
source_filesystems=zfs_get_selected_filesystems(args.ssh_source, args.backup_name)
#nothing to do
if not source_filesystems:
error("No filesystems source selected, please do a 'zfs set autobackup:{0}=true' on {1}".format(args.backup_name,args.ssh_source))
sys.exit(1)
abort("No source filesystems selected, please do a 'zfs set autobackup:{0}=true' on {1}".format(args.backup_name,args.ssh_source))
if args.ignore_replicated:
replicated_filesystems=zfs_get_unchanged_filesystems(args.ssh_source, source_filesystems)
for replicated_filesystem in replicated_filesystems:
if replicated_filesystem in source_filesystems:
source_filesystems.remove(replicated_filesystem)
verbose("* Already replicated: {}".format(replicated_filesystem))
if not source_filesystems:
verbose("Nothing to do, all filesystems are already replicated.")
sys.exit(0)
# determine target filesystems
target_filesystems=[]
for source_filesystem in source_filesystems:
#append args.target_fs prefix and strip args.strip_path paths from source_filesystem
target_filesystems.append(args.target_fs + "/" + lstrip_path(source_filesystem, args.strip_path))
#append args.target_path prefix and strip args.strip_path paths from source_filesystem
target_filesystems.append(args.target_path + "/" + lstrip_path(source_filesystem, args.strip_path))
debug("Wanted target filesystems:\n"+str(pprint.pformat(target_filesystems)))
# get actual existing target filesystems. (including ones that might not be in the backupset anymore)
verbose("Getting existing target filesystems")
existing_target_filesystems=zfs_get_existing_filesystems(ssh_to=args.ssh_target, target_path=args.target_path)
debug("Existing target filesystems:\n"+str(pprint.pformat(existing_target_filesystems)))
common_target_filesystems=list(set(target_filesystems) & set(existing_target_filesystems))
debug("Common target filesystems (target filesystems that also exist on source):\n"+str(pprint.pformat(common_target_filesystems)))
### creating snapshots
# this is one of the first things we do, so that in case of failures we still have snapshots.
#create new snapshot?
if not args.no_snapshot:
new_snapshot_name=args.backup_name+"-"+time.strftime("%Y%m%d%H%M%S")
verbose("Creating source snapshot {0} on {1} ".format(new_snapshot_name, args.ssh_source))
zfs_create_snapshot(args.ssh_source, source_filesystems, new_snapshot_name)
### get resumable transfers
### get resumable transfers from target
resumable_target_filesystems={}
if args.resume:
verbose("Checking for aborted transfers that can be resumed")
#Important: use target_filesystems, not existing_target_filesystems (during an initial transfer it's resumable but doesn't exist yet)
resumable_target_filesystems=zfs_get_resumable_filesystems(args.ssh_target, target_filesystems)
debug("Resumable filesystems: "+str(pprint.pformat(resumable_target_filesystems)))
debug("Resumable filesystems:\n"+str(pprint.pformat(resumable_target_filesystems)))
### get all snapshots of all selected filesystems on both source and target
### get existing target snapshots
target_snapshots={}
if common_target_filesystems:
verbose("Getting target snapshot-list from {0}".format(args.ssh_target))
target_snapshots=zfs_get_snapshots(args.ssh_target, common_target_filesystems, args.backup_name)
# except subprocess.CalledProcessError:
# verbose("(ignoring errors, probably initial backup for this filesystem)")
# pass
debug("Target snapshots:\n" + str(pprint.pformat(target_snapshots)))
### get existing source snapshots
verbose("Getting source snapshot-list from {0}".format(args.ssh_source))
source_snapshots=zfs_get_snapshots(args.ssh_source, source_filesystems, args.backup_name)
debug("Source snapshots: " + str(pprint.pformat(source_snapshots)))
debug("Source snapshots:\n" + str(pprint.pformat(source_snapshots)))
### create new snapshots on source
if not args.no_snapshot:
#determine which filesystems changed since last snapshot
if not args.allow_empty and not args.ignore_replicated:
#determine which filesystems are unchanged since OUR snapshots. (not since ANY snapshot)
unchanged_filesystems=zfs_get_unchanged_snapshots(args.ssh_source, source_snapshots)
else:
unchanged_filesystems=[]
snapshot_filesystems=[]
for source_filesystem in source_filesystems:
if source_filesystem not in unchanged_filesystems:
snapshot_filesystems.append(source_filesystem)
else:
verbose("* Not snapshotting {}, no changes found.".format(source_filesystem))
#create snapshots
if snapshot_filesystems:
new_snapshot_name=args.backup_name+"-"+time.strftime("%Y%m%d%H%M%S")
verbose("Creating source snapshots {0} on {1} ".format(new_snapshot_name, args.ssh_source))
zfs_create_snapshot(args.ssh_source, snapshot_filesystems, new_snapshot_name)
else:
verbose("No changes at all, not creating snapshot.")
#add the new snapshot to the list of source snapshots
for snapshot_filesystem in snapshot_filesystems:
source_snapshots.setdefault(snapshot_filesystem,[]).append(new_snapshot_name)
target_snapshots={}
try:
verbose("Getting target snapshot-list from {0}".format(args.ssh_target))
target_snapshots=zfs_get_snapshots(args.ssh_target, target_filesystems, args.backup_name)
except subprocess.CalledProcessError:
verbose("(ignoring errors, probably initial backup for this filesystem)")
pass
debug("Target snapshots: " + str(pprint.pformat(target_snapshots)))
#obsolete snapshots that may be removed
@@ -472,11 +601,12 @@ def zfs_autobackup():
#determine which snapshots to send for each filesystem
for source_filesystem in source_filesystems:
target_filesystem=args.target_fs + "/" + lstrip_path(source_filesystem, args.strip_path)
try:
target_filesystem=args.target_path + "/" + lstrip_path(source_filesystem, args.strip_path)
if source_filesystem not in source_snapshots:
#this happens if you use --no-snapshot and there are new filesystems without snapshots
verbose("Skipping source filesystem {0}, no snapshots found".format(source_filesystem))
verbose("* Skipping source filesystem {0}, no snapshots found".format(source_filesystem))
else:
#incremental or initial send?
@@ -488,13 +618,23 @@
if latest_target_snapshot not in source_snapshots[source_filesystem]:
#can't find latest target anymore. find first common snapshot and inform user
error="Cant find latest target snapshot on source, did you destroy it accidently? "+source_filesystem+"@"+latest_target_snapshot
error_msg="Cant find latest target snapshot on source for '{}', did you destroy/rename it?".format(source_filesystem)
error_msg=error_msg+"\nLatest on target : "+target_filesystem+"@"+latest_target_snapshot
error_msg=error_msg+"\nMissing on source: "+source_filesystem+"@"+latest_target_snapshot
found=False
for latest_target_snapshot in reversed(target_snapshots[target_filesystem]):
if latest_target_snapshot in source_snapshots[source_filesystem]:
error=error+"\nYou could solve this by rolling back to: "+target_filesystem+"@"+latest_target_snapshot;
error_msg=error_msg+"\nYou could solve this by rolling back to this common snapshot on target: "+target_filesystem+"@"+latest_target_snapshot
found=True
break
if not found:
error_msg=error_msg+"\nAlso could not find an earlier common snapshot to rollback to."
else:
if args.ignore_new:
verbose("* Skipping source filesystem '{0}', target already has newer snapshots.".format(source_filesystem))
continue
raise(Exception(error))
raise(Exception(error_msg))
#send all new source snapshots that come AFTER the last target snapshot
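#e.g. (hypothetical names): if the newest common snapshot is @offsite1-A and the source
#also has @offsite1-B, this amounts to:
#  zfs send -i tank/data@offsite1-A tank/data@offsite1-B | zfs recv -u backuppool/data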
latest_source_index=source_snapshots[source_filesystem].index(latest_target_snapshot)
@@ -530,6 +670,10 @@ def zfs_autobackup():
else:
resume_token=None
#hold the snapshot we're sending on the source
if not args.no_holds:
zfs_hold_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+send_snapshot)
zfs_transfer(
ssh_source=args.ssh_source, source_filesystem=source_filesystem,
first_snapshot=latest_target_snapshot, second_snapshot=send_snapshot,
@@ -537,11 +681,18 @@
resume_token=resume_token
)
#hold the snapshot we just sent to the target
zfs_hold_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+send_snapshot)
#now that we successfully transferred this snapshot, the previous snapshot is obsolete:
if latest_target_snapshot:
zfs_release_snapshot(ssh_to=args.ssh_target, snapshot=target_filesystem+"@"+latest_target_snapshot)
target_obsolete_snapshots[target_filesystem].append(latest_target_snapshot)
if not args.no_holds:
zfs_release_snapshot(ssh_to=args.ssh_source, snapshot=source_filesystem+"@"+latest_target_snapshot)
source_obsolete_snapshots[source_filesystem].append(latest_target_snapshot)
#we just received a new filesystem?
else:
@@ -558,16 +709,19 @@
latest_target_snapshot=send_snapshot
# failed, skip this source_filesystem
except Exception as e:
failed(str(e))
############## cleanup section
#we only do cleanups after everything is complete, to keep everything consistent (same snapshots everywhere)
if not args.ignore_replicated:
#find stale backups on target that have become obsolete
verbose("Getting stale filesystems and snapshots from {0}".format(args.ssh_target))
stale_target_filesystems=get_stale_backupped_filesystems(ssh_to=args.ssh_target, backup_name=args.backup_name, target_fs=args.target_fs, target_filesystems=target_filesystems)
stale_target_filesystems=get_stale_backupped_filesystems(backup_name=args.backup_name, target_path=args.target_path, target_filesystems=target_filesystems, existing_target_filesystems=existing_target_filesystems)
debug("Stale target filesystems: {0}".format("\n".join(stale_target_filesystems)))
stale_target_snapshots=zfs_get_snapshots(args.ssh_target, stale_target_filesystems, args.backup_name)
@@ -575,63 +729,76 @@
target_obsolete_snapshots.update(stale_target_snapshots)
#determine stale filesystems that have no snapshots left (they can be destroyed)
#TODO: prevent destroying filesystems that have underlying filesystems that are still active.
stale_target_destroys=[]
for stale_target_filesystem in stale_target_filesystems:
if stale_target_filesystem not in stale_target_snapshots:
stale_target_destroys.append(stale_target_filesystem)
if stale_target_destroys:
if args.destroy_stale:
verbose("Destroying stale filesystems on target {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
zfs_destroy(ssh_to=args.ssh_target, filesystems=stale_target_destroys, recursive=True)
#NOTE: don't destroy automatically, not safe enough.
# if args.destroy_stale:
# verbose("Destroying stale filesystems on target {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
# zfs_destroy(ssh_to=args.ssh_target, filesystems=stale_target_destroys, recursive=True)
# else:
verbose("Stale filesystems on {0}:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
else:
verbose("Stale filesystems on {0}, use --destroy-stale to destroy:\n{1}".format(args.ssh_target, "\n".join(stale_target_destroys)))
verbose("NOTE: Cant determine stale target filesystems while using ignore_replicated.")
#now actually destroy the old snapshots
source_destroys=determine_destroy_list(source_obsolete_snapshots, args.keep_source)
if source_destroys:
verbose("Destroying old snapshots on source {0}:\n{1}".format(args.ssh_source, "\n".join(source_destroys)))
try:
zfs_destroy_snapshots(ssh_to=args.ssh_source, snapshots=source_destroys)
except Exception as e:
failed(str(e))
target_destroys=determine_destroy_list(target_obsolete_snapshots, args.keep_target)
if target_destroys:
verbose("Destroying old snapshots on target {0}:\n{1}".format(args.ssh_target, "\n".join(target_destroys)))
try:
zfs_destroy_snapshots(ssh_to=args.ssh_target, snapshots=target_destroys)
verbose("All done")
except Exception as e:
failed(str(e))
################################################################## ENTRY POINT
# parse arguments
import argparse
parser = argparse.ArgumentParser(description='ZFS autobackup v2.1')
parser = argparse.ArgumentParser(
description='ZFS autobackup v2.4',
epilog='When a filesystem fails, zfs_autobackup will continue and report the number of failures at the end. The exit code will also indicate the number of failures.')
parser.add_argument('--ssh-source', default="local", help='Source host to get backup from. (user@hostname) Default %(default)s.')
parser.add_argument('--ssh-target', default="local", help='Target host to push backup to. (user@hostname) Default %(default)s.')
parser.add_argument('--ssh-cipher', default=None, help='SSH cipher to use (default %(default)s)')
parser.add_argument('--keep-source', type=int, default=30, help='Number of days to keep old snapshots on source. Default %(default)s.')
parser.add_argument('--keep-target', type=int, default=30, help='Number of days to keep old snapshots on target. Default %(default)s.')
parser.add_argument('backup_name', help='Name of the backup (you should set the zfs property "autobackup:backup-name" to true on filesystems you want to back up)')
parser.add_argument('target_fs', help='Target filesystem')
parser.add_argument('target_path', help='Target path')
parser.add_argument('--no-snapshot', action='store_true', help='do not create new snapshot (useful for finishing uncompleted backups, or cleanups)')
parser.add_argument('--no-send', action='store_true', help='do not send snapshots (useful to only do a cleanup)')
parser.add_argument('--resume', action='store_true', help='support resuming of interrupted transfers by using the zfs extensible_dataset feature (both zpools should have it enabled)')
parser.add_argument('--allow-empty', action='store_true', help='if nothing has changed, still create empty snapshots.')
parser.add_argument('--ignore-replicated', action='store_true', help='Ignore datasets that seem to be replicated some other way. (No changes since latest snapshot. Useful for proxmox HA replication)')
parser.add_argument('--no-holds', action='store_true', help='Do not lock snapshots on the source. (Useful to allow proxmox HA replication to switch nodes)')
parser.add_argument('--ignore-new', action='store_true', help='Ignore filesystem if there are already newer snapshots for it on the target (use with caution)')
parser.add_argument('--resume', action='store_true', help='support resuming of interrupted transfers by using the zfs extensible_dataset feature (both zpools should have it enabled) Disadvantage is that you need to use zfs recv -A if another snapshot is created on the target during a receive. Otherwise it will keep failing.')
parser.add_argument('--strip-path', default=0, type=int, help='number of directories to strip from path (use 1 when cloning zones between 2 SmartOS machines)')
parser.add_argument('--buffer', default="", help='Use mbuffer with specified size to speed up zfs transfer. (e.g. --buffer 1G) Will also show nice progress output.')
parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
# parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
parser.add_argument('--clear-refreservation', action='store_true', help='Set refreservation property to none for new filesystems. Useful when backing up SmartOS volumes. (recommended)')
parser.add_argument('--clear-mountpoint', action='store_true', help='Sets canmount=noauto property, to prevent the received filesystem from mounting over existing filesystems. (recommended)')
parser.add_argument('--rollback', action='store_true', help='Rollback changes on the target before starting a backup. (normally you can prevent changes by setting the readonly property on the target_fs to on)')
parser.add_argument('--filter-properties', action='append', help='Filter properties when receiving filesystems. Can be specified multiple times. (Example: If you send data from Linux to FreeNAS, you should filter xattr)')
parser.add_argument('--rollback', action='store_true', help='Rollback changes on the target before starting a backup. (normally you can prevent changes by setting the readonly property on the target_path to on)')
parser.add_argument('--ignore-transfer-errors', action='store_true', help='Ignore transfer errors (still checks if received filesystem exists. useful for acltype errors)')
parser.add_argument('--compress', action='store_true', help='use compression during zfs send/recv')
parser.add_argument('--test', action='store_true', help='do not change anything, just show what would be done (still does all read-only operations)')
parser.add_argument('--verbose', action='store_true', help='verbose output')
parser.add_argument('--debug', action='store_true', help='debug output (shows commands that are executed)')
@@ -639,5 +806,23 @@ parser.add_argument('--debug', action='store_true', help='debug output (shows co
#note: args is the only global variable we use, since it's a global readonly setting anyway
args = parser.parse_args()
if args.ignore_replicated and args.allow_empty:
abort("Cannot use allow_empty with ignore_replicated.")
try:
zfs_autobackup()
if not failures:
verbose("All operations completed succesfully.")
sys.exit(0)
else:
verbose("{} OPERATION(S) FAILED!".format(failures))
#exit with the number of failures.
sys.exit(min(255,failures))
except Exception as e:
if args.debug:
raise
else:
print(str(e))
abort("FATAL ERROR")