BackupNetClone
Philosophy
The philosophy behind BackupNetClone can be blamed on many
things, but here's my general thought process:
My Dad runs a small business and has become frustrated with the
backup solutions available from 1995 to the present (2008). He
produces his own catalog and some informational DVDs for his
products, yet his data size is very reasonable--all of it fits on a
cheap 200GB hard drive. With a digital camera and young kids, even I
am close to outgrowing my current 300GB system drive. Unfortunately,
hard drive capacities and ordinary quantities of data have far
outstripped current non-industrial backup solutions:
- tape backup
- too high a cost/data ratio for the average (small) business or
  home user; current prices are $300 for 100GB backup drives, not
  including tapes; for that price I could buy two 500GB hard drives
  as backup
- requires the user to remember to switch tapes and bring them home
  for successful offsite backup; also requires the user to remember
  the proper rotation of tapes in order to perform full and
  incremental backups
- cumbersome hardware and software: Have you tried performing a
  restore from a tape backup? You're left at the mercy of the
  vendor's backup software, since the data is stored in a
  proprietary format on the tapes.
- DVD and other optical media
- though cheap, DVDs just don't hold enough data to back up a
  system; users (including myself) skip doing backups because one
  backup requires swapping so many DVDs; this is true even with the
  new high-capacity blue-laser-based DVD drives
- same issue as tapes: the user must remember to bring discs
  offsite, and must remember the backup scheme if doing full and
  incremental backups
- the media lasts only a few years, requiring even more effort
  (duplicating old DVDs onto new DVDs) on the user's part if he or
  she wishes to maintain backups for many years
- Internet backup companies
- data is stored who-knows-where
- the cost/data ratio is high over time
- you're at the mercy of the company's stability--if they go under,
  where's your data?
- often needs a high-bandwidth connection to perform full backups
- RAID in a PC or shared network drive
- doesn't provide offsite backup
- depends on proprietary hardware and/or software; if the RAID
equipment fails, will you be able to bring the hard drives into
another system to gain access to your data?
Since 2000, I've been pondering this issue a lot, trying to find a reasonable
solution for backup. I've finally found it:
BackupNetClone uses standard hard drives (currently the best cost/data ratio
available for computer storage) on a standard Linux system to create full
offsite backups of data. The data is transferred securely over the
Internet, and the process is basically maintenance-free. At any
time, the backup drive(s) can be placed in any other Linux computer
to gain full access to the data. Additionally, the backups are
stored in such a way as to provide something better than incremental
backups: every backup performed is a full backup, but the space
consumed is only the difference between that backup and the previous
one. The snapshots can be shared on a network so that any file from
any previous backup is easy to retrieve.
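As a quick illustration of how every backup can be "full" while
consuming only incremental space, here is a minimal sketch using
hard links (the directory names are hypothetical, and this shows the
general idea rather than BackupNetClone's exact steps, which are
described under Technical Overview below):

    # Day 1: a complete copy of the data
    rsync -a /data/ /backups/2008-01-01
    # Day 2: duplicate yesterday's snapshot using hard links (almost
    # free in disk space), then let rsync update only what changed;
    # rsync writes each changed file to a temporary name and renames
    # it, which breaks the hard link without touching the old snapshot
    cp -al /backups/2008-01-01 /backups/2008-01-02
    rsync -a /data/ /backups/2008-01-02
    # Unchanged files in the two snapshots share a single inode:
    ls -i /backups/2008-01-01/somefile /backups/2008-01-02/somefile

Both directories present a complete backup, yet the disk holds only
one copy of each unchanged file.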
BackupNetClone becomes especially exciting when combined with
current NAS devices that already run Linux. I personally use it on
the D-Link DNS-323, but other possibilities include the D-Link
DSM-G600, Linksys NSLU2, TRENDnet TS-I300, LaCie Ethernet Disk mini,
and more. (See
http://nas-central.org/ALL_COMMUNITIES/Collection_of_NAS-Hacking_communities.html
for more ideas.) With the two SATA slots in the DNS-323, I can
perform offsite backup and even make a duplicate backup from one
slot to the other.
Technical Overview
BackupNetClone's operations run through various shell scripts
in the following order:
- start_here.sh
- system_config.sh
- create_lock.sh
- init_email.sh
- do_rsync.sh (for each server to back up)
- complete_email.sh
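Here is a minimal sketch of how that chain might be wired together.
The real scripts take more configuration and error handling, and the
SERVERS_TO_BACKUP variable name is my illustration, not necessarily
what system_config.sh actually defines:

    #!/bin/sh
    # start_here.sh (sketch): drive one complete backup run
    . ./system_config.sh            # load user-editable settings
    ./create_lock.sh || exit 1      # bail out if another run is active
    ./init_email.sh                 # begin collecting status for email
    for server in $SERVERS_TO_BACKUP; do
        ./do_rsync.sh "$server"     # one backup pass per server
    done
    ./complete_email.sh             # send the status email if configured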
BackupNetClone roughly performs the following steps:
- check for a lock file to make sure only one rsync/SSH instance is
  running at a time (see the locking sketch after this list)
- create a lock file for the current instance
- rotate logs if necessary
- test a simple SSH connection to make sure the rsync server (on the
  "backup client") is available
- open an SSH tunnel (see the tunnel sketch after this list)
- if there are no previous backups:
  - rsync the full amount of data
- if a previous backup is available that we can use to transfer only
  incremental data:
  - create hard links in the current backup pointing at the previous
    backup's files, making an effective duplicate of the previous
    backup/snapshot
  - use rsync to find out what has changed since the previous backup
  - remove the hard links for the files that have changed (otherwise
    the hard links would let updates to the current backup modify
    files in previous backups/snapshots as a side effect)
  - fully copy the changed files from the previous backup to the
    current backup
  - rsync only the changed files (see the snapshot sketch after this
    list)
- I realize I could have used rsync's --delete and --link-dest
  options to let rsync manage the hard-linking automatically. The
  problem with that approach (as pointed out on one of the websites
  linked from the main page) is that it becomes inefficient with
  large files that often have small changes. Instead of having rsync
  transfer those files in their entirety every time they change even
  a little (which is how the --delete and --link-dest options
  currently behave), I take advantage of rsync's delta transfer to
  send just the changed portions. I still get to use hard links for
  files that haven't changed, so I get the best of both worlds: no
  space consumed for unchanged files, yet only minimal data
  transferred for file changes. Maybe a future version of rsync will
  implement this seamlessly.
- rsync again to delete any old files (if pruning is turned on)
- close the SSH tunnel
- send an email if configured to do so
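Below are the sketches referenced in the list above. First, locking.
The lock path is an assumption, and I use mkdir rather than a
separate check-then-create because mkdir is atomic (no race between
testing for the lock and taking it):

    #!/bin/sh
    # create_lock.sh (sketch): succeed only if no other instance runs
    LOCKDIR=/tmp/backupnetclone.lock    # hypothetical path
    if mkdir "$LOCKDIR" 2>/dev/null; then
        echo $$ > "$LOCKDIR/pid"        # record who holds the lock
        exit 0                          # caller may proceed
    else
        echo "BackupNetClone is already running" >&2
        exit 1                          # caller should abort this run
    fi

The caller removes the lock directory once the whole run completes.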
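Next, the tunnel sketch: a trivial SSH connection test, then a
tunnel forwarding a local port to the rsync daemon on the backup
client. The host name, key path, and port numbers are illustrative
assumptions:

    #!/bin/sh
    CLIENT=user@backup-client.example.com     # hypothetical client
    KEY=/root/.ssh/backupnetclone_key         # hypothetical key file
    # cheap reachability test: run 'true' remotely, check the result
    ssh -i "$KEY" "$CLIENT" true || exit 1
    # forward local port 8730 to the rsync daemon (873) on the client
    ssh -i "$KEY" -N -L 8730:localhost:873 "$CLIENT" &
    TUNNEL_PID=$!
    sleep 5                # give the tunnel a moment to come up
    # ... rsync now talks to rsync://localhost:8730/ ...
    kill "$TUNNEL_PID"     # close the tunnel when finished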
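Finally, the snapshot sketch, mirroring the hard-link steps above.
Paths and the rsync module name are assumptions, and filenames
containing whitespace are ignored for brevity. The manual re-copy in
step 3 is what gives rsync an independent basis file for its delta
transfer, the trade-off discussed in the note above:

    #!/bin/sh
    PREV=/backups/snapshot.1            # last completed backup
    CURR=/backups/snapshot.0            # backup being built now
    SRC=rsync://localhost:8730/data/    # through the SSH tunnel
    # 1. duplicate the previous snapshot with hard links
    cp -al "$PREV" "$CURR"
    # 2. dry run: ask rsync which files would be updated
    CHANGED=$(rsync -a -n --out-format='%n' "$SRC" "$CURR/")
    # 3. replace each changed file with a real copy, breaking its hard
    #    link so the update cannot disturb the previous snapshot
    for f in $CHANGED; do
        if [ -f "$CURR/$f" ]; then
            cp -p "$CURR/$f" "$CURR/$f.new"
            mv "$CURR/$f.new" "$CURR/$f"
        fi
    done
    # 4. real pass: only the changed portions cross the network
    rsync -a "$SRC" "$CURR/"
    # 5. optional pruning pass, dropping files deleted on the client
    # rsync -a --delete "$SRC" "$CURR/"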
Benjamin L. Brown, released to the Public Domain.