Compare commits

5 Commits

4 changed files with 134 additions and 55 deletions

129
README.md
@ -4,18 +4,18 @@
* Complete rewrite, cleaner object oriented code.
* Python 3 and 2 support.
* Installable via pip.
* Installable via [pip](https://pypi.org/project/zfs-autobackup/).
* Backwards compatible with your current backups and parameters.
* Progressive thinning (via a destroy schedule. default schedule should be fine for most people)
* Cleaner output, with optional color support (pip install colorama).
* Clear distinction between local and remote output.
* Summary at the beginning, displaying what will happen and the current thinning-schedule.
* More effient destroying/skipping snaphots on the fly. (no more space issues if your backup is way behind)
* More efficient destroying/skipping snapshots on the fly. (no more space issues if your backup is way behind)
* Progress indicator (--progress)
* Better property management (--set-properties and --filter-properties)
* Better resume handling, automaticly abort invalid resumes.
* Better resume handling, automatically abort invalid resumes.
* More robust error handling.
* Prepared for future enhanchements.
* Prepared for future enhancements.
* Supports raw backups for encryption.
* Custom SSH client config.
@ -29,9 +29,9 @@ Other settings are just specified on the commandline. This also makes it easier
Since its using ZFS commands, you can see what its actually doing by specifying `--debug`. This also helps a lot if you run into some strange problem or error. You can just copy-paste the command that fails and play around with it on the commandline. (also something I missed in other tools)
An imporant feature thats missing from other tools is a reliable `--test` option: This allows you to see what zfs-autobackup will do and tune your parameters. It will do everything, except make changes to your zfs datasets.
An important feature thats missing from other tools is a reliable `--test` option: This allows you to see what zfs-autobackup will do and tune your parameters. It will do everything, except make changes to your zfs datasets.
Another nice thing is progress reporting with `--progress`. Its very usefull with HUGE datasets, when you want to know how many hours/days it will take.
Another nice thing is progress reporting with `--progress`. Its very useful with HUGE datasets, when you want to know how many hours/days it will take.
zfs-autobackup tries to be the easiest to use backup tool for zfs.
@ -64,7 +64,7 @@ zfs-autobackup tries to be the easiest to use backup tool for zfs.
### Using pip
The recommended way on most servers is to use pip:
The recommended way on most servers is to use [pip](https://pypi.org/project/zfs-autobackup/):
```console
[root@server ~]# pip install --upgrade zfs-autobackup
@ -167,7 +167,7 @@ rpool/swap autobackup:offsite1 true
...
```
Because we dont want to backup everything, we can exclude certain filesystem by setting the property to false:
Because we don't want to backup everything, we can exclude certain filesystem by setting the property to false:
```console
[root@pve ~]# zfs set autobackup:offsite1=false rpool/swap
@ -184,7 +184,7 @@ rpool/swap autobackup:offsite1 false
### Running zfs-autobackup
Run the script on the backup server and pull the data from the server specfied by --ssh-source.
Run the script on the backup server and pull the data from the server specified by --ssh-source.
```console
[root@backup ~]# zfs-autobackup --ssh-source pve.server.com offsite1 backup/pve --progress --verbose
@ -192,16 +192,16 @@ Run the script on the backup server and pull the data from the server specfied b
#### Settings summary
[Source] Datasets on: pve.server.com
[Source] Keep the last 10 snapshots.
[Source] Keep oldest of 1 day, delete after 1 week.
[Source] Keep oldest of 1 week, delete after 1 month.
[Source] Keep oldest of 1 month, delete after 1 year.
[Source] Keep every 1 day, delete after 1 week.
[Source] Keep every 1 week, delete after 1 month.
[Source] Keep every 1 month, delete after 1 year.
[Source] Send all datasets that have 'autobackup:offsite1=true' or 'autobackup:offsite1=child'
[Target] Datasets are local
[Target] Keep the last 10 snapshots.
[Target] Keep oldest of 1 day, delete after 1 week.
[Target] Keep oldest of 1 week, delete after 1 month.
[Target] Keep oldest of 1 month, delete after 1 year.
[Target] Keep every 1 day, delete after 1 week.
[Target] Keep every 1 week, delete after 1 month.
[Target] Keep every 1 month, delete after 1 year.
[Target] Receive datasets under: backup/pve
#### Selecting
@ -235,14 +235,92 @@ Its also possible to let a server push its backup to the backup-server. However
### Automatic backups
Now everytime you run the command, zfs-autobackup will create a new snapshot and replicate your data.
Now every time you run the command, zfs-autobackup will create a new snapshot and replicate your data.
Older snapshots will evertually be deleted, depending on the `--keep-source` and `--keep-target` settings. (The defaults are shown above under the 'Settings summary')
Older snapshots will eventually be deleted, depending on the `--keep-source` and `--keep-target` settings. (The defaults are shown above under the 'Settings summary')
Once you've got the correct settings for your situation, you can just store the command in a cronjob.
Or just create a script and run it manually when you need it.
### Thinning out obsolete snapshots
The thinner is the thing that destroys old snapshots on the source and target.
The thinner is stateless: nothing in the name or properties of a snapshot indicates how long it will be kept. Every time zfs-autobackup runs, it looks at the timestamps of all existing snapshots and determines which ones are obsolete according to your schedule. The advantage of this stateless design is that you can change the schedule at any time.
Note that the thinner will ONLY destroy snapshots that match the naming pattern of zfs-autobackup. If you use `--other-snapshots`, it won't destroy those snapshots after replicating them to the target.
#### Thinning schedule
The default thinning schedule is: `10,1d1w,1w1m,1m1y`.
The schedule consists of multiple rules separated by a `,`.
A plain number specifies how many snapshots you always want to keep, regardless of time or interval.
The format of the other rules is: `<Interval><TTL>`.
* Interval: The minimum interval between snapshots. Snapshots that are closer together than this will be destroyed.
* TTL: The maximum time to live of a snapshot. After that it will be destroyed.
* These are the time units you can use for interval and TTL:
  * `y`: Years
  * `m`: Months
  * `w`: Weeks
  * `d`: Days
  * `h`: Hours
  * `min`: Minutes
  * `s`: Seconds
Since this might sound very complicated, the `--verbose` option will show you what it all means:
```console
[Source] Keep the last 10 snapshots.
[Source] Keep every 1 day, delete after 1 week.
[Source] Keep every 1 week, delete after 1 month.
[Source] Keep every 1 month, delete after 1 year.
```
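To make the schedule format more concrete, here is a minimal Python sketch that parses a schedule string and prints the same kind of summary. It is not the code zfs-autobackup itself uses: the names `UNITS`, `RULE_RE`, `describe` and `parse_schedule` are made up for this example, and the lengths used for a month and a year are assumptions.

```python
import re

# Unit suffix -> (length in seconds, singular name).
# ASSUMPTION: a month is 30 days and a year is 365 days in this sketch.
UNITS = {
    "s": (1, "second"),
    "min": (60, "minute"),
    "h": (3600, "hour"),
    "d": (24 * 3600, "day"),
    "w": (7 * 24 * 3600, "week"),
    "m": (30 * 24 * 3600, "month"),
    "y": (365 * 24 * 3600, "year"),
}

# One rule looks like <amount><unit><amount><unit>, e.g. "1d1w".
# "min" is tried before the single-letter units so it is not read as "m".
RULE_RE = re.compile(r"^(\d+)(min|[smhdwy])(\d+)(min|[smhdwy])$")


def describe(amount, unit):
    """Return e.g. '1 day' or '2 weeks'."""
    name = UNITS[unit][1]
    return "{} {}{}".format(amount, name, "" if amount == 1 else "s")


def parse_schedule(schedule):
    """Split a schedule string into (keep_last, rules).

    Each rule is a tuple (interval_seconds, ttl_seconds, description).
    """
    keep_last = 0
    rules = []
    for part in schedule.split(","):
        part = part.strip()
        if part.isdigit():
            # A plain number: always keep this many snapshots.
            keep_last = int(part)
            continue
        match = RULE_RE.match(part)
        if match is None:
            raise ValueError("invalid thinning rule: {!r}".format(part))
        interval, interval_unit, ttl, ttl_unit = match.groups()
        interval, ttl = int(interval), int(ttl)
        rules.append((
            interval * UNITS[interval_unit][0],
            ttl * UNITS[ttl_unit][0],
            "Keep every {}, delete after {}.".format(
                describe(interval, interval_unit), describe(ttl, ttl_unit)),
        ))
    return keep_last, rules


keep_last, rules = parse_schedule("10,1d1w,1w1m,1m1y")
print("Keep the last {} snapshots.".format(keep_last))
for _interval, _ttl, text in rules:
    print(text)
```

Converting every rule to seconds up front keeps the rest of a thinner simple: it only has to compare snapshot timestamps against two numbers per rule.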
A snapshot will only be destroyed if it is not needed anymore by ANY of the rules.
You can specify as many rules as you need. The order of the rules doesn't matter.
Keep in mind it's up to you to actually run zfs-autobackup often enough: If you want to keep hourly snapshots, you have to make sure you run it at least every hour.
However, it's no problem if you run it more or less often than that: The thinner will still do its best to choose an optimal set of snapshots to keep.
If you want to keep as few snapshots as possible, just specify 0. (`--keep-source=0` for example)
If you want to keep ALL the snapshots, just specify a very high number.
#### More details about the Thinner
We will give a practical example of how the thinner operates.
Say we have 3 thinner rules:
* We want to keep daily snapshots for 7 days.
* We want to keep weekly snapshots for 4 weeks.
* We want to keep monthly snapshots for 12 months.
So far we have taken 4 snapshots at random moments:
![thinner example](doc/thinner.png)
For every rule, the thinner will divide the timeline into blocks and assign each snapshot to a block.
A block can only be assigned one snapshot: If multiple snapshots fall into the same block, the block is assigned only to the oldest one that we want to keep.
The colors show to which block a snapshot belongs:
* Snapshot 1: This snapshot belongs to daily block 1, weekly block 0 and monthly block 0. However the daily block is too old.
* Snapshot 2: Since weekly block 0 and monthly block 0 already have a snapshot, it only belongs to daily block 4.
* Snapshot 3: This snapshot belongs to daily block 8 and weekly block 1.
* Snapshot 4: Since daily block 8 already has a snapshot, this one doesn't belong to anything and can be deleted right away. (it will be kept for now since it's the last snapshot)
zfs-autobackup will re-evaluate this on every run: As soon as a snapshot doesn't belong to any block anymore it will be destroyed.
Snapshots on the source that still have to be sent to the target won't be destroyed, of course. (If the target still wants them, according to the target schedule)
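The block assignment described above can be sketched in a few lines of Python. This is only an illustration of the idea, under the assumption that a block number is simply the timestamp divided by the interval; the function name `thin` and its parameters are hypothetical and it leaves out everything related to the source and target having to agree.

```python
def thin(timestamps, rules, now, keep_last=0):
    """Return the snapshot timestamps that no rule wants to keep.

    timestamps: snapshot creation times, in seconds
    rules:      list of (interval_seconds, ttl_seconds) pairs
    now:        current time, in seconds
    keep_last:  number of newest snapshots that are always kept
    """
    ordered = sorted(timestamps)
    keep = set(ordered[-keep_last:]) if keep_last > 0 else set()

    # For every rule, remember which blocks already have a snapshot.
    taken = [set() for _ in rules]

    # Walk from oldest to newest, so the oldest snapshot claims a block.
    for ts in ordered:
        age = now - ts
        for index, (interval, ttl) in enumerate(rules):
            if age > ttl:
                continue  # too old for this rule
            block = ts // interval  # ASSUMPTION: blocks are aligned to time 0
            if block not in taken[index]:
                taken[index].add(block)
                keep.add(ts)

    return [ts for ts in ordered if ts not in keep]


DAY = 24 * 3600
snapshots = [DAY // 2, 5 * DAY + 100, 5 * DAY + 200, 9 * DAY]
# One rule: keep daily snapshots for 7 days.
obsolete = thin(snapshots, [(DAY, 7 * DAY)], now=10 * DAY)
print(obsolete)  # [43200, 432200]
```

In this run the first snapshot is older than the TTL and the third one shares a daily block with the second, so those two are reported as obsolete. Because nothing is stored between runs, calling the function again with a different rule list gives the result for the new schedule right away.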
## Tips
* Use ```--debug``` if something goes wrong and you want to see the commands that are executed. This will also stop at the first error.
@ -334,10 +412,10 @@ optional arguments:
--no-snapshot Dont create new snapshots (usefull for finishing
uncompleted backups, or cleanups)
--no-send Dont send snapshots (usefull for cleanups, or if you
want a serperate send-cronjob)
want a separate send-cronjob)
--min-change MIN_CHANGE
Number of bytes written after which we consider a
dataset changed (default 200000)
dataset changed (default 1)
--allow-empty If nothing has changed, still create empty snapshots.
(same as --min-change=0)
--ignore-replicated Ignore datasets that seem to be replicated some other
@ -361,11 +439,11 @@ optional arguments:
(recommended, prevents mount conflicts. same as --set-
properties canmount=noauto)
--filter-properties FILTER_PROPERTIES
List of propererties to "filter" when receiving
List of properties to "filter" when receiving
filesystems. (you can still restore them with zfs
inherit -S)
--set-properties SET_PROPERTIES
List of propererties to override when receiving
List of properties to override when receiving
filesystems. (you can still restore them with zfs
inherit -S)
--rollback Rollback changes to the latest target snapshot before
@ -401,7 +479,7 @@ You forgot to setup automatic login via SSH keys, look in the example how to do
This usually means you've created a new snapshot on the target side during a backup:
* Solution 1: Restart zfs-autobackup and make sure you dont use --resume. If you did use --resume, be sure to "abort" the recveive on the target side with zfs recv -A.
* Solution 1: Restart zfs-autobackup and make sure you don't use --resume. If you did use --resume, be sure to "abort" the receive on the target side with zfs recv -A.
* Solution 2: Destroy the newly created snapshot and restart zfs-autobackup.
### It says 'internal error: Invalid argument'
@ -430,13 +508,13 @@ Put this command directly after the zfs_backup command in your cronjob:
zabbix-job-status backup_smartos01_fs1 daily $?
```
This will update the zabbix server with the exitcode and will also alert you if the job didnt run for more than 2 days.
This will update the zabbix server with the exit code and will also alert you if the job didn't run for more than 2 days.
## Backuping up a proxmox cluster with HA replication
Due to the nature of proxmox we had to make a few enhancements to zfs-autobackup. This will probably also benefit other systems that use their own replication in combination with zfs-autobackup.
All data under rpool/data can be on multiple nodes of the cluster. The naming of those filesystem is unique over the whole cluster. Because of this we should backup rpool/data of all nodes to the same destination. This way we wont have duplicate backups of the filesystems that are replicated. Because of various options, you can even migrate hosts and zfs-autobackup will be fine. (and it will get the next backup from the new node automaticly)
All data under rpool/data can be on multiple nodes of the cluster. The naming of those filesystem is unique over the whole cluster. Because of this we should backup rpool/data of all nodes to the same destination. This way we wont have duplicate backups of the filesystems that are replicated. Because of various options, you can even migrate hosts and zfs-autobackup will be fine. (and it will get the next backup from the new node automatically)
In the example below we have 3 nodes, named h4, h5 and h6.
@ -462,6 +540,7 @@ Extra options needed for proxmox with HA:
* --no-holds: To allow proxmox to destroy our snapshots if a VM migrates to another node.
* --ignore-replicated: To ignore the replicated filesystems of proxmox on the receiving proxmox nodes. (e.g: only backup from the node where the VM is active)
* --min-change 200000: --ignore-replicated works by checking whether there have been any changes since the last snapshot. However, for some reason proxmox always has some small changes. (Probably house-keeping data or something? This was always fine and suddenly changed with an update)
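As a rough illustration of what the `--min-change` threshold means, here is a small Python sketch. The function name `is_changed` and its arguments are made up for this example, and treating exactly `min_change` bytes as "changed" is an assumption; only the meaning of the threshold and the special case `--min-change=0` (same as `--allow-empty`) come from the option descriptions.

```python
def is_changed(written_bytes, min_change=1):
    """Decide if a dataset counts as changed since its last snapshot.

    written_bytes: bytes written since the last snapshot
    min_change:    threshold in bytes, as given to --min-change
    """
    if min_change == 0:
        # Same as --allow-empty: always treat the dataset as changed.
        return True
    # ASSUMPTION: reaching the threshold exactly counts as changed.
    return written_bytes >= min_change


# A replicated proxmox filesystem with a bit of house-keeping noise:
print(is_changed(150000, min_change=200000))
# The same dataset with the new default of 1 byte:
print(is_changed(150000))
```

With a threshold of 200000 the small amount of noise is ignored, while the new default of 1 byte would treat the dataset as changed.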
I use the following backup script on the backup server:
@ -469,7 +548,7 @@ I use the following backup script on the backup server:
for H in h4 h5 h6; do
echo "################################### DATA $H"
#backup data filesystems to a common place
./zfs-autobackup --ssh-source root@$H data_smartos03 zones/backup/zfsbackups/pxe1_data --clear-refreservation --clear-mountpoint --ignore-transfer-errors --strip-path 2 --verbose --resume --ignore-replicated --no-holds $@
./zfs-autobackup --ssh-source root@$H data_smartos03 zones/backup/zfsbackups/pxe1_data --clear-refreservation --clear-mountpoint --ignore-transfer-errors --strip-path 2 --verbose --resume --ignore-replicated --min-change 200000 --no-holds $@
zabbix-job-status backup_$H""_data_smartos03 daily $? >/dev/null 2>/dev/null
echo "################################### RPOOL $H"

@ -26,7 +26,7 @@ if sys.stdout.isatty():
except ImportError:
pass
VERSION="3.0-rc8"
VERSION="3.0-rc9"
HEADER="zfs-autobackup v{} - Copyright 2020 E.H.Eefting (edwin@datux.nl)\n".format(VERSION)
class Log:
@ -117,7 +117,7 @@ class ThinnerRule:
self.rule_str=rule_str
self.human_str="Keep oldest of {} {}{}, delete after {} {}{}.".format(
self.human_str="Keep every {} {}{}, delete after {} {}{}.".format(
period_amount, self.TIME_DESC[period_unit], period_amount!=1 and "s" or "", ttl_amount, self.TIME_DESC[ttl_unit], ttl_amount!=1 and "s" or "" )
@ -308,7 +308,7 @@ class ExecuteNode:
def __init__(self, ssh_config=None, ssh_to=None, readonly=False, debug_output=False):
"""ssh_config: custom ssh config
ssh_to: server you want to ssh to. none means local
readonly: only execute commands that dont make any changes (usefull for testing-runs)
readonly: only execute commands that don't make any changes (usefull for testing-runs)
debug_output: show output and exit codes of commands in debugging output.
"""
@ -347,7 +347,7 @@ class ExecuteNode:
def run(self, cmd, input=None, tab_split=False, valid_exitcodes=[ 0 ], readonly=False, hide_errors=False, pipe=False, return_stderr=False):
"""run a command on the node
readonly: make this True if the command doesnt make any changes and is safe to execute in testmode
readonly: make this True if the command doesn't make any changes and is safe to execute in testmode
pipe: Instead of executing, return a pipe-handle to be used to input to another run() command. (just like a | in linux)
input: Can be None, a string or a pipe-handle you got from another run()
return_stderr: return both stdout and stderr as a tuple
@ -365,9 +365,9 @@ class ExecuteNode:
encoded_cmd.append(self.ssh_to.encode('utf-8'))
#make sure the command gets all the data in utf8 format:
#(this is neccesary if LC_ALL=en_US.utf8 is not set in the environment)
#(this is necessary if LC_ALL=en_US.utf8 is not set in the environment)
for arg in cmd:
#add single quotes for remote commands to support spaces and other wierd stuff (remote commands are executed in a shell)
#add single quotes for remote commands to support spaces and other weird stuff (remote commands are executed in a shell)
encoded_cmd.append( ("'"+arg+"'").encode('utf-8'))
else:
@ -486,7 +486,7 @@ class ExecuteNode:
class ZfsDataset():
"""a zfs dataset (filesystem/volume/snapshot/clone)
Note that a dataset doesnt have to actually exist (yet/anymore)
Note that a dataset doesn't have to actually exist (yet/anymore)
Also most properties are cached for performance-reasons, but also to allow --test to function correctly.
"""
@ -500,7 +500,7 @@ class ZfsDataset():
def __init__(self, zfs_node, name, force_exists=None):
"""name: full path of the zfs dataset
exists: specifiy if you already know a dataset exists or not. for performance reasons. (othewise it will have to check with zfs list when needed)
exists: specify if you already know a dataset exists or not. for performance reasons. (otherwise it will have to check with zfs list when needed)
"""
self.zfs_node=zfs_node
self.name=name #full name
@ -589,7 +589,7 @@ class ZfsDataset():
def find_prev_snapshot(self, snapshot, other_snapshots=False):
"""find previous snapshot in this dataset. None if it doesnt exist.
"""find previous snapshot in this dataset. None if it doesn't exist.
other_snapshots: set to true to also return snapshots that where not created by us. (is_ours)
"""
@ -606,7 +606,7 @@ class ZfsDataset():
def find_next_snapshot(self, snapshot, other_snapshots=False):
"""find next snapshot in this dataset. None if it doesnt exist"""
"""find next snapshot in this dataset. None if it doesn't exist"""
if self.is_snapshot:
raise(Exception("Please call this on a dataset."))
@ -636,7 +636,7 @@ class ZfsDataset():
def create_filesystem(self, parents=False):
"""create a filesytem"""
"""create a filesystem"""
if parents:
self.verbose("Creating filesystem and parents")
self.zfs_node.run(["zfs", "create", "-p", self.name ])
@ -703,7 +703,7 @@ class ZfsDataset():
def is_ours(self):
"""return true if this snapshot is created by this backup_nanme"""
"""return true if this snapshot is created by this backup_name"""
if re.match("^"+self.zfs_node.backup_name+"-[0-9]*$", self.snapshot_name):
return(True)
else:
@ -866,7 +866,7 @@ class ZfsDataset():
"""returns a pipe with zfs send output for this snapshot
resume: Use resuming (both sides need to support it)
resume_token: resume sending from this token. (in that case we dont need to know snapshot names)
resume_token: resume sending from this token. (in that case we don't need to know snapshot names)
"""
#### build source command
@ -892,7 +892,7 @@ class ZfsDataset():
cmd.append("-P")
#resume a previous send? (dont need more parameters in that case)
#resume a previous send? (don't need more parameters in that case)
if resume_token:
cmd.extend([ "-t", resume_token ])
@ -910,7 +910,7 @@ class ZfsDataset():
# if args.buffer and args.ssh_source!="local":
# cmd.append("|mbuffer -m {}".format(args.buffer))
#NOTE: this doenst start the send yet, it only returns a subprocess.Pipe
#NOTE: this doesn't start the send yet, it only returns a subprocess.Pipe
return(self.zfs_node.run(cmd, pipe=True))
@ -925,7 +925,7 @@ class ZfsDataset():
cmd.extend(["zfs", "recv"])
#dont mount filesystem that is received
#don't mount filesystem that is received
cmd.append("-u")
for property in filter_properties:
@ -961,7 +961,7 @@ class ZfsDataset():
#check if transfer was really ok (exit codes have been wrong before due to bugs in zfs-utils and can be ignored by some parameters)
if not self.exists:
self.error("error during transfer")
raise(Exception("Target doesnt exist after transfer, something went wrong."))
raise(Exception("Target doesn't exist after transfer, something went wrong."))
# if args.buffer and args.ssh_target!="local":
# cmd.append("|mbuffer -m {}".format(args.buffer))
@ -982,7 +982,7 @@ class ZfsDataset():
if not prev_snapshot:
target_snapshot.verbose("receiving full".format(self.snapshot_name))
else:
#incemental
#incremental
target_snapshot.verbose("receiving incremental".format(self.snapshot_name))
#do it
@ -1056,7 +1056,7 @@ class ZfsDataset():
source_snapshot.debug("common snapshot")
return(source_snapshot)
target_dataset.error("Cant find common snapshot with source.")
raise(Exception("You probablly need to delete the target dataset to fix this."))
raise(Exception("You probably need to delete the target dataset to fix this."))
def find_start_snapshot(self, common_snapshot, other_snapshots):
@ -1149,7 +1149,7 @@ class ZfsDataset():
target_obsoletes=[]
#on source: destroy all obsoletes before common. but after common, only delete snapshots that target also doesnt want to explicitly keep
#on source: destroy all obsoletes before common. but after common, only delete snapshots that target also doesn't want to explicitly keep
before_common=True
for source_snapshot in self.snapshots:
if common_snapshot and source_snapshot.snapshot_name==common_snapshot.snapshot_name:
@ -1235,10 +1235,10 @@ class ZfsDataset():
prev_source_snapshot=source_snapshot
else:
source_snapshot.debug("skipped (target doesnt need it)")
source_snapshot.debug("skipped (target doesn't need it)")
#was it actually a resume?
if resume_token:
target_dataset.debug("aborting resume, since we dont want that snapshot anymore")
target_dataset.debug("aborting resume, since we don't want that snapshot anymore")
target_dataset.abort_resume()
resume_token=None
@ -1288,7 +1288,7 @@ class ZfsNode(ExecuteNode):
def parse_zfs_progress(self, line, hide_errors, prefix):
"""try to parse progress output of zfs recv -Pv, and dont show it as error to the user """
"""try to parse progress output of zfs recv -Pv, and don't show it as error to the user """
#is it progress output?
progress_fields=line.rstrip().split("\t")
@ -1377,7 +1377,7 @@ class ZfsNode(ExecuteNode):
self.verbose("No changes anywhere: not creating snapshots.")
return
#create consitent snapshot per pool
#create consistent snapshot per pool
for (pool_name, snapshots) in pools.items():
cmd=[ "zfs", "snapshot" ]
@ -1451,12 +1451,12 @@ class ZfsAutobackup:
parser.add_argument('target_path', help='Target ZFS filesystem')
parser.add_argument('--other-snapshots', action='store_true', help='Send over other snapshots as well, not just the ones created by this tool.')
parser.add_argument('--no-snapshot', action='store_true', help='Dont create new snapshots (usefull for finishing uncompleted backups, or cleanups)')
parser.add_argument('--no-send', action='store_true', help='Dont send snapshots (usefull for cleanups, or if you want a serperate send-cronjob)')
parser.add_argument('--min-change', type=int, default=200000, help='Number of bytes written after which we consider a dataset changed (default %(default)s)')
parser.add_argument('--no-snapshot', action='store_true', help='Don\'t create new snapshots (usefull for finishing uncompleted backups, or cleanups)')
parser.add_argument('--no-send', action='store_true', help='Don\'t send snapshots (usefull for cleanups, or if you want a serperate send-cronjob)')
parser.add_argument('--min-change', type=int, default=1, help='Number of bytes written after which we consider a dataset changed (default %(default)s)')
parser.add_argument('--allow-empty', action='store_true', help='If nothing has changed, still create empty snapshots. (same as --min-change=0)')
parser.add_argument('--ignore-replicated', action='store_true', help='Ignore datasets that seem to be replicated some other way. (No changes since lastest snapshot. Usefull for proxmox HA replication)')
parser.add_argument('--no-holds', action='store_true', help='Dont lock snapshots on the source. (Usefull to allow proxmox HA replication to switches nodes)')
parser.add_argument('--no-holds', action='store_true', help='Don\'t lock snapshots on the source. (Usefull to allow proxmox HA replication to switches nodes)')
#not sure if this ever was usefull:
# parser.add_argument('--ignore-new', action='store_true', help='Ignore filesystem if there are already newer snapshots for it on the target (use with caution)')
@ -1468,7 +1468,7 @@ class ZfsAutobackup:
# parser.add_argument('--destroy-stale', action='store_true', help='Destroy stale backups that have no more snapshots. Be sure to verify the output before using this! ')
parser.add_argument('--clear-refreservation', action='store_true', help='Filter "refreservation" property. (recommended, safes space. same as --filter-properties refreservation)')
parser.add_argument('--clear-mountpoint', action='store_true', help='Set property canmount=noauto for new datasets. (recommended, prevents mount conflicts. same as --set-properties canmount=noauto)')
parser.add_argument('--filter-properties', type=str, help='List of propererties to "filter" when receiving filesystems. (you can still restore them with zfs inherit -S)')
parser.add_argument('--filter-properties', type=str, help='List of properties to "filter" when receiving filesystems. (you can still restore them with zfs inherit -S)')
parser.add_argument('--set-properties', type=str, help='List of propererties to override when receiving filesystems. (you can still restore them with zfs inherit -S)')
parser.add_argument('--rollback', action='store_true', help='Rollback changes to the latest target snapshot before starting. (normally you can prevent changes by setting the readonly property on the target_path to on)')
parser.add_argument('--destroy-incompatible', action='store_true', help='Destroy incompatible snapshots on target. Use with care! (implies --rollback)')
@ -1608,7 +1608,7 @@ class ZfsAutobackup:
if self.args.test:
self.set_title("All tests successfull.")
else:
self.set_title("All backups completed succesfully")
self.set_title("All backups completed successfully")
else:
self.error("{} datasets failed!".format(fail_count))

BIN
doc/thinner.odg Normal file

Binary file not shown.

BIN
doc/thinner.png Normal file

Binary file not shown.

Size: 26 KiB