Syncing with Rsync
Motivation
At school, I use various network mounted filesystems. At times, I
work from home or on a laptop and want to have access to these files.
It is not always possible/convenient/efficient to mount the filesystems
from these machines directly. Additionally, these systems sometimes
give me technical troubles, so I have created a set of scripts that
allow me to synchronize my directories.
For a synchronization system, I wanted the following features:
- Removes extra files: If I delete a file or move a directory,
I don't want it to reappear the next time I synchronize.
- Maintains timestamps: The two main reasons for wanting
timestamps to be preserved are: (1) if I goof up the synchronization,
I want to still be able to sort out where the newest version of the
file is, and (2) I want to only have changes transferred. The
directories I'm synching are gigabytes in size and I really don't
want to make a full copy every time I synch.
- Preserves newer files: Call me crazy, but I semi-frequently
end up modifying files on several computers before resynching. For
example, I might have a great idea while at home and write it up at
home, then later work on a laptop, all the while I have a job running
and writing out results at school. At some point I'll want all three
machines to be able to have the most recent version of all files.
- Backup of overwritten files: In the past, I've made a few
nasty mistakes and wiped out files that I really didn't want to.
Backups are good.
- Recursive: I want to sync a whole directory tree all at once.
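As a rough sketch, each feature above corresponds to a standard rsync option. This is one plausible mapping, not necessarily the exact option set used in the scripts later in this tutorial. The paths are placeholders, and the command is only printed, not run:

```shell
#!/bin/bash
# Placeholder locations (assumptions for illustration only).
SRC="[linuxuser]@[linuxserver]:/filesystem/mastersyncdir/"
DEST="/path/to/cachedir/"
BACKUP="/path/to/cachebackup"

RSYNC_OPTS=(
  --delete                 # Removes extra files from the receiving side
  --times                  # Maintains timestamps
  --update                 # Preserves newer files on the receiving side
  --backup                 # Backup of overwritten (and deleted) files...
  --backup-dir="$BACKUP"   # ...collected in a separate directory
  --recursive              # Recursive
  --rsh=ssh                # Transfer over ssh
)

# Print the command for inspection. To actually run it, use:
#   rsync "${RSYNC_OPTS[@]}" "$SRC" "$DEST"
echo rsync "${RSYNC_OPTS[@]}" "$SRC" "$DEST"
```

Because rsync compares sizes and timestamps by default, keeping timestamps intact is also what lets it skip unchanged files on later runs.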
Necessary Software
Although there are different ways of doing this, I'll explain my preferred
method...
"Server" Configuration
Choose where the master copy of your directory tree will be. I'll assume
that you always have a Linux box that can access this copy directly (or
that you know enough to configure ssh and rsync servers on Windows). For
this tutorial, we'll say that the master directory is:
/filesystem/mastersyncdir
If you are using a Debian system, make sure you have an ssh server set up
and then install rsync via:
sudo apt-get install rsync
Client Configuration
If you are using Linux on the client, install ssh and rsync. If you are
using Windows, you'll need a copy of rsync and a command-line ssh. The
easiest method is to install Cygwin. When
you install Cygwin, make sure that you install the openssh, rsync, and bash
packages. Once you have installed the necessary software, create a directory
that will hold a cached copy of the master directory. For this tutorial,
we'll assume the following Windows directory:
"C:\Documents and Settings\%USERNAME%\My Documents\cachedir\"
where %USERNAME% is your Windows username. To handle the backups,
also create a backup directory, e.g.:
"C:\Documents and Settings\%USERNAME%\My Documents\cachebackup\"
Exclusions File
Before creating the actual synchronization scripts, we'll create an (optional) exclusions file.
This file tells rsync to not synchronize certain file types. For example, I sometimes
like to synchronize C++ projects. When I do this, I don't want to transfer all of the object
files and temporary files. If you have certain files or extensions you'd like to exclude, create
the following file:
"C:\Documents and Settings\%USERNAME%\My Documents\cachedir\exclude.txt"
My exclusions file has the following contents:
*.obj
*.pch
*.o
*.suo
*.idb
*.pdb
*.ncb
a.out
*.bsc
*.sbr
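If you want to check what an exclusions file does before trusting it, a dry run is one option. The sketch below is only an illustration: preview_sync is a hypothetical helper name, and it works on two local directories so you can try it without touching the server.

```shell
#!/bin/bash
# preview_sync SRC DEST EXCLUDE_FILE
# Lists the files rsync would copy from SRC to DEST, honoring the
# exclusions file. --dry-run means nothing is actually transferred.
preview_sync() {
  local src=$1 dest=$2 exclude_file=$3
  rsync --recursive --dry-run --out-format='%n' \
        --exclude-from="$exclude_file" "$src/" "$dest/"
}
```

For example, running preview_sync on a C++ project directory with the exclusions file above should list the source files but none of the .o or .obj files.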
Master-to-Cache Script
Now we need to write a script that makes the cache consistent with the master
copy. Run this script any time that you want your local files to receive all of the updates from
the master copy. Save the script
(if it does not show up in your browser, see
fromMaster_sh.html)
as
"C:\Documents and Settings\%USERNAME%\My Documents\cachedir\fromMaster.sh"
You'll want to modify the WINDIR and REMOTEDIR variables to
correspond to the actual locations you're using. For Windows directories with
spaces, we put a triple backslash before each space. If you don't understand why,
just do it and don't worry about it (we need an extra level of escaping because we're
passing the command to bash). For REMOTEDIR, replace [linuxuser]
with your Linux username and [linuxserver] with the hostname of the
Linux server.
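For the curious, here is a small demonstration of why three backslashes are needed. It assumes the variable is set unquoted and later expanded inside a command string handed to a second bash, which is how the explanation above reads. The shortened path and the ARGS_SEEN variable are made up for illustration:

```shell
#!/bin/bash
# Hypothetical, shortened path; the real WINDIR lives in fromMaster.sh.
WINDIR=/cygdrive/c/Documents\\\ and\\\ Settings

# First parse (this shell): "\\" becomes "\" and "\ " becomes " ",
# so the stored value keeps one backslash before each space.
echo "$WINDIR"        # prints /cygdrive/c/Documents\ and\ Settings

# Second parse (the inner bash): the remaining backslash keeps each
# space inside a single word, so the path arrives as one argument.
ARGS_SEEN=$(bash -c "set -- $WINDIR; echo \$#")
echo "$ARGS_SEEN"     # prints 1
```

With only a single backslash, the stored value would contain bare spaces and the inner bash would split the path into three separate arguments.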
If you're curious about what all the rsync options mean, see its
manpage. If you didn't create an exclusions file, get rid of the --exclude-from=exclude.txt
option.
Cache-to-Master Script
This script transfers local changes back to the master copy. Save the script
(if it does not show up in your browser, see
toMaster_sh.html)
as
"C:\Documents and Settings\%USERNAME%\My Documents\cachedir\toMaster.sh"
Usage
Here are the most typical usage scenarios:
- First time: The first time you transfer from the master
to the cached directory (by running fromMaster.sh), rsync
will copy over all the files. It will see all the script
files you just saved and notice that they don't exist on the server.
As a result it will delete them. If this happens to you, (1) go
to the cachebackup directory and move the files back into
the cachedir directory, then (2) do a cached-to-master
transfer (via toMaster.sh) to put these synchronization files
on the server.
- Master-to-cached: Use this when the master copy on the server
is the most up-to-date and you want your cached directory to receive
all its changes. Run fromMaster.sh. If you want to be safe,
first delete all the contents of cachebackup. After running
the script, examine the new contents to make sure nothing got
removed/updated that shouldn't have.
- Cached-to-master: Use this when you have made local changes
that you now want propagated back to the master. Run toMaster.sh.
Be forewarned that when transferring from the cache to the master, no
backups are made. If you're unsure whether this will wipe out anything
important, see the "Merge changes" scenario.
- Merge changes: Follow this scenario if you have modified some
files on the server and some files in the cache and you want both the
cache and the server to be fully synchronized. It's also useful if you
can't remember where the most recent changes have been made. Note that
if you've modified the same file on multiple machines, the most recent
timestamp wins. First, do a master-to-cached transfer, following the
instructions on how to be safe. Now do a cached-to-master transfer.
- From one cache to another: Do a cached-to-master on the first
cached machine. Then do a master-to-cached on the second.
Note that if you can get your hands on a kerberized version of ssh for Cygwin,
this makes things easier if your ssh is authenticated via Kerberos.
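The "Merge changes" scenario, including the "be safe" steps, could be wrapped up roughly like this. It is only a sketch: safe_merge is a made-up helper name, and it assumes both scripts live in the cache directory and are run from there with bash.

```shell
#!/bin/bash
# safe_merge CACHEDIR BACKUPDIR
safe_merge() {
  local cachedir=$1 backupdir=$2 answer

  # Empty the backup directory first, so anything found in it
  # afterward was put there by this run.
  find "$backupdir" -mindepth 1 -delete

  # Step 1: master-to-cached.
  (cd "$cachedir" && bash ./fromMaster.sh) || return 1

  # Show what got overwritten or deleted, and let the user bail out.
  echo "Files backed up by this run:"
  find "$backupdir" -mindepth 1 -type f
  read -r -p "Push the cache back to the master? [y/N] " answer
  if [ "$answer" != "y" ]; then
    return 0
  fi

  # Step 2: cached-to-master (remember: no backups are made here).
  (cd "$cachedir" && bash ./toMaster.sh)
}
```

Answering anything other than y stops after the first transfer, which gives you a chance to move files back out of the backup directory before anything is sent to the server.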
Alternatives
- Direct-Mounted Synchronizers: There are various
alternatives that you can use if you actually do mount the
filesystems at the time of synchronization. You could hunt
one of them down for yourself if you'd like. Since this doesn't
match my situation, I don't remember what the programs are on
Windows. On Linux, rsync is probably still the easiest.
- Unison: Another program that does synchronizing. For
me it was more difficult to use and didn't give me the exact feature
set that I wanted. Other people love it.