Backups with ZFS and Amazon's S3
by Richard Kolkovich
Amazon’s S3 provides unlimited, relatively cheap storage, accessible over HTTP through REST or SOAP interfaces. That makes it ideal for personal offsite backups: the barrier to entry is low, as are the costs.
Sun’s ZFS turns everything you know about managing filesystems on its head. When I recently upgraded my storage array, I decided to load my machine with extra RAM for ZFS rather than buying a dedicated RAID card. One of the best features of ZFS is the low-cost snapshots. You can snapshot a filesystem, and said snapshot will not take any space on disk until the original is modified. To put it another way, the snapshots only store the (block-level) diffs.
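As a minimal sketch of the snapshot behaviour described above, the helper below takes a snapshot and then lists the snapshots of that dataset along with the space each one holds. Right after creation, the USED column should be close to zero. The function name snapshot_and_report and the snapshot name in the usage line are illustrative assumptions, not part of the script in this post.

```shell
# Path to the zfs binary; override it if yours lives elsewhere.
ZFS=${ZFS:-/sbin/zfs}

# Hypothetical helper: snapshot a dataset, then list its snapshots.
# usage: snapshot_and_report <dataset> <snapshot-name>
snapshot_and_report() {
    "$ZFS" snapshot "$1@$2" &&
    "$ZFS" list -t snapshot -o name,used,referenced -r "$1"
}
```

For example, snapshot_and_report tank/private before-upgrade would create tank/private@before-upgrade and print one row per snapshot of tank/private.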
The icing on the snapshot cake is the ability to send a snapshot as a stream. This can be piped over the network (e.g. via ssh) or simply written to a file (then bzipped and uploaded to S3!). ZFS also allows you to send only the difference between two snapshots, i.e. an incremental stream.
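The two kinds of stream can be sketched with one small wrapper around zfs send. This is only an illustration under assumptions: the function name send_stream is hypothetical, and the dataset, host and target names in the usage line are placeholders.

```shell
# Path to the zfs binary; override it if yours lives elsewhere.
ZFS=${ZFS:-/sbin/zfs}

# Hypothetical helper: write a snapshot stream to stdout.
# With one argument it sends the whole snapshot; with a second argument
# it sends only the changes between the base snapshot and the snapshot.
# usage: send_stream <snapshot> [<base-snapshot>]
send_stream() {
    if [ -n "${2:-}" ]; then
        "$ZFS" send -i "$2" "$1"
    else
        "$ZFS" send "$1"
    fi
}
```

For example, send_stream tank/private@full | bzip2 > /tank/backup/private.bz2 writes a compressed full stream to a file, while send_stream tank/private@full | ssh backuphost zfs receive tank/private-copy pipes the same stream to another machine (backuphost and tank/private-copy are made-up names).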
To put this into action, I have written a script which will create a snapshot, bzip it, encrypt it and upload it to S3. I use a threshold to decide whether to upload a full snapshot or just an incremental one, which saves space and bandwidth (and time, as my cable upstream isn’t that great…). To interface with S3, I’m using s3cmd from s3tools.
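The full-versus-incremental decision can be looked at on its own. The sketch below mirrors the test used in the script's backup() function: a threshold of 0, or an incremental stream larger than the threshold, results in a full backup. The function name choose_backup_kind is a hypothetical name introduced for illustration; sizes are in bytes.

```shell
# Hypothetical helper: print "full" or "incremental".
# usage: choose_backup_kind <incremental-size-bytes> <threshold-bytes>
choose_backup_kind() {
    size=$1
    threshold=$2
    # Same rule as the script: threshold 0 or an oversized incremental means full.
    if [ "$threshold" -eq 0 ] || [ "$size" -gt "$threshold" ]; then
        echo full
    else
        echo incremental
    fi
}
```

With a 50 MiB threshold, choose_backup_kind 1024 52428800 selects the incremental, while an incremental that has grown past the threshold triggers a new full.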
I realized afterward that s3tools have GPG encryption built in, but I think it is simpler to use openssl and a passphrase for this use-case. Here’s the script:
#!/bin/sh
#
# Copyright 2010 Richard Kolkovich. All rights reserved.
#
# Redistribution and use in source and binary forms, with or without modification, are
# permitted provided that the following conditions are met:
#
# 1. Redistributions of source code must retain the above copyright notice, this list of
# conditions and the following disclaimer.
#
# 2. Redistributions in binary form must reproduce the above copyright notice, this list
# of conditions and the following disclaimer in the documentation and/or other materials
# provided with the distribution.
#
# THIS SOFTWARE IS PROVIDED BY Richard Kolkovich ``AS IS'' AND ANY EXPRESS OR IMPLIED
# WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND
# FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL Richard Kolkovich OR
# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
# CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
# SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
# ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
# NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
# ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
# The views and conclusions contained in the software and documentation are those of the
# authors and should not be interpreted as representing official policies, either expressed
# or implied, of Richard Kolkovich.
#
ZFS=/sbin/zfs
BZIP=/usr/bin/bzip2
OPENSSL=/usr/bin/openssl
MD5=/sbin/md5
S3CMD=/opt/s3cmd/s3cmd
BACKUP_DIR=/tank/backup
TEMP=$BACKUP_DIR/tmp
PASSFILE=$BACKUP_DIR/.password
if ! test -d "$TEMP"; then
    mkdir "$TEMP"
fi
# Back up a given ZFS filesystem
# arguments: name, zfs fs, threshold in bytes (0 to always send a full), S3 bucket
backup() {
    NAME=$1
    FS=$2
    THRESHOLD=$3
    BUCKET=$4
    # incremental
    SNAP=$FS@incremental
    SUFFIX=incremental
    $ZFS destroy "$SNAP"
    $ZFS snapshot "$SNAP"
    $ZFS send -i "$FS@full" "$SNAP" > "$TEMP/$NAME"
    export BLOCKSIZE=1024
    SIZE=$(( `du "$TEMP/$NAME" | awk '{print $1}'` * 1024 ))
    if test "$THRESHOLD" -eq 0 || test "$SIZE" -gt "$THRESHOLD"; then
        rm "$TEMP/$NAME"
        $ZFS destroy "$SNAP"
        SNAP=$FS@full
        SUFFIX=full
        $ZFS destroy "$SNAP"
        $ZFS snapshot "$SNAP"
        $ZFS send "$SNAP" > "$TEMP/$NAME"
        # a new full invalidates old incrementals
        $S3CMD del "s3://$BUCKET/$NAME-incremental.bak"
        $S3CMD del "s3://$BUCKET/$NAME-incremental.bak.md5"
    fi
    # compress
    $BZIP "$TEMP/$NAME"
    FILE=$TEMP/$NAME.bz2
    # encrypt snapshot
    $OPENSSL enc -aes-256-cbc -salt -pass "file:$PASSFILE" -in "$FILE" -out "$FILE.bak"
    rm "$FILE"
    FILE=$FILE.bak
    $MD5 "$FILE" > "$FILE.md5"
    # send snapshot to S3
    $S3CMD put "$FILE" "s3://$BUCKET/$NAME-$SUFFIX.bak"
    $S3CMD put "$FILE.md5" "s3://$BUCKET/$NAME-$SUFFIX.bak.md5"
    # clean up
    rm "$FILE"
    mv "$FILE.md5" "$BACKUP_DIR/$NAME.md5"
}
And I call backup() thusly (52428800 bytes is a 50 MiB threshold):
backup "private" "tank/private" 52428800 "my.backup.bucket"
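One case worth guarding against is the very first run, when the @full snapshot does not exist yet. As far as I can tell, zfs send -i then has no base snapshot to diff against, so the incremental file ends up empty and falls below any non-zero threshold. A possible guard, sketched under that assumption, is to check for the base snapshot first. The function name snapshot_exists is hypothetical and not part of the original script.

```shell
# Path to the zfs binary, as in the script.
ZFS=${ZFS:-/sbin/zfs}

# Hypothetical helper: succeed only if the named snapshot exists.
# usage: snapshot_exists <dataset@snapshot>
snapshot_exists() {
    "$ZFS" list -t snapshot "$1" >/dev/null 2>&1
}
```

Inside backup(), the threshold test could then start with ! snapshot_exists "$FS@full" || so that a missing base snapshot always leads to a full backup.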