Syncing with Rsync

Motivation

At school, I use various network-mounted filesystems. At times I work from home or on a laptop and want access to these files. It is not always possible, convenient, or efficient to mount the filesystems from these machines directly. These systems also sometimes give me technical trouble, so I have created a set of scripts that let me synchronize my directories.

For a synchronization system, I wanted the following features:

  • Removes extra files: If I delete a file or move a directory, I don't want it to reappear the next time I synchronize.
  • Maintains timestamps: The two main reasons for wanting timestamps to be preserved are: (1) if I goof up the synchronization, I still want to be able to sort out where the newest version of a file is, and (2) I want only the changes to be transferred. The directories I'm syncing are gigabytes in size and I really don't want to make a full copy every time I sync.
  • Preserves newer files: Call me crazy, but I semi-frequently end up modifying files on several computers before resyncing. For example, I might have a great idea at home and write it up there, then later work on a laptop, all while a job is running and writing out results at school. At some point I'll want all three machines to have the most recent version of every file.
  • Backup of overwritten files: In the past, I've made a few nasty mistakes and wiped out files that I really didn't want to lose. Backups are good.
  • Recursive: I want to sync a whole directory tree all at once.

Necessary Software

Although there are different ways of doing this, I'll explain my preferred method...

"Server" Configuration

Choose where the master copy of your directory tree will be. I'll assume that you always have a Linux box that can access this copy directly (or that you know enough to configure ssh and rsync servers on Windows). For this tutorial, we'll say that the master directory is:
/filesystem/mastersyncdir
If you are using a Debian system, make sure you have an ssh server set up and then install rsync via:
sudo apt-get install rsync

Client Configuration

If you are using Linux on the client, install ssh and rsync. If you are using Windows, you'll need a copy of rsync and a command-line ssh. The easiest method is to install Cygwin. When you install Cygwin, make sure that you select the openssh, rsync, and bash packages. Once you have installed the necessary software, create a directory that will hold a cached copy of the master directory. For this tutorial, we'll assume the following Windows directory:
"C:\Documents and Settings\%USERNAME%\My Documents\cachedir\"
where %USERNAME% is your Windows username. To handle the backups, also create a backup directory, e.g.:
"C:\Documents and Settings\%USERNAME%\My Documents\cachebackup\"
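
Before going further, it can help to confirm from a Cygwin bash prompt that the tools are present and to create both directories in one step. This is a minimal sketch, not part of the original instructions: have_tool and make_cache_dirs are hypothetical helper names, and the /cygdrive/c path in the final comment assumes Cygwin's default mapping of the C: drive.

```shell
#!/bin/bash
# Sketch: confirm the client tools are installed and create the two directories.
# have_tool and make_cache_dirs are hypothetical helpers, not part of Cygwin or rsync.

have_tool() {
    # Succeeds if the named command can be found on the PATH.
    command -v "$1" >/dev/null 2>&1
}

make_cache_dirs() {
    # $1 = parent folder that should hold cachedir and cachebackup
    mkdir -p "$1/cachedir" "$1/cachebackup"
}

# Report on each package named in the installation step above.
for tool in ssh rsync bash; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: MISSING"
    fi
done

# Uncomment after checking the path (assumes Cygwin's default /cygdrive prefix):
# make_cache_dirs "/cygdrive/c/Documents and Settings/$USERNAME/My Documents"
```

The directory creation is left commented out so that nothing is created until the path has been checked against your own machine.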

Exclusions File

Before creating the actual synchronization scripts, we'll create an (optional) exclusions file. This file tells rsync to not synchronize certain file types. For example, I sometimes like to synchronize C++ projects. When I do this, I don't want to transfer all of the object files and temporary files. If you have certain files or extensions you'd like to exclude, create the following file:
"C:\Documents and Settings\%USERNAME%\My Documents\cachedir\exclude.txt"
My exclusions file has the following contents:
*.obj
*.pch
*.o
*.suo
*.idb
*.pdb
*.ncb
a.out
*.bsc
*.sbr
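
To see what an exclusions file does before pointing it at real data, one option is to try it on two scratch directories on the same machine. The sketch below assumes rsync is installed; try_excludes is a hypothetical helper name and the example paths are placeholders.

```shell
#!/bin/bash
# Sketch: copy one local directory to another while honoring an exclusions file.
# try_excludes is a hypothetical helper for experimenting on scratch directories.

try_excludes() {
    # $1 = source dir, $2 = destination dir, $3 = exclusions file
    # --verbose lists each file that is transferred.
    rsync --recursive --times --verbose --exclude-from="$3" "$1/" "$2/"
}

# Example with placeholder paths:
# try_excludes ./scratch-src ./scratch-dest ./exclude.txt
```

Files whose names match a pattern in the exclusions file should be absent from the destination afterwards.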

Master-to-Cache Script

Now we need to write a script that makes the cache consistent with the master copy. Run this script any time you want your local files to receive all of the updates from the master copy. Save the script (see fromMaster_sh.html if it does not show up in your browser) as
"C:\Documents and Settings\%USERNAME%\My Documents\cachedir\fromMaster.sh"

You'll want to modify the WINDIR and REMOTEDIR variables to match the locations you're actually using. For Windows directories with spaces, put a triple backslash before each space. If you don't understand why, just do it and don't worry about it (the extra escaping is needed because the command is passed to bash). For REMOTEDIR, replace [linuxuser] with your Linux username and [linuxserver] with the hostname of the Linux server.

If you're curious about what all the rsync options mean, see its manpage. If you didn't create an exclusions file, get rid of the --exclude-from=exclude.txt option.
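
The original script is not reproduced in this copy of the page. As a rough guide, here is a minimal sketch of what a master-to-cache script could look like, assembled from the feature list under Motivation. It is not the author's script. It quotes each variable instead of using the triple-backslash escaping described above, writes the Windows directories as Cygwin /cygdrive paths, and gives the exclusions file a full path. Those choices, the BACKUPDIR variable, and the from_master helper are all assumptions.

```shell
#!/bin/bash
# Hypothetical sketch of fromMaster.sh; NOT the author's original script.
# Assumed Cygwin-style paths: edit all three variables for your machine.
WINDIR="/cygdrive/c/Documents and Settings/$USERNAME/My Documents/cachedir"
BACKUPDIR="/cygdrive/c/Documents and Settings/$USERNAME/My Documents/cachebackup"
REMOTEDIR="[linuxuser]@[linuxserver]:/filesystem/mastersyncdir"

from_master() {
    runner=${1:-echo}   # "echo" only prints the command; "rsync" transfers
    set --                                           # start with no options
    set -- "$@" --recursive                          # whole directory tree
    set -- "$@" --times                              # preserve timestamps
    set -- "$@" --update                             # keep newer local files
    set -- "$@" --delete                             # drop files gone from the master
    set -- "$@" --backup --backup-dir="$BACKUPDIR"   # save replaced or deleted files
    set -- "$@" --exclude-from="$WINDIR/exclude.txt" # remove if you have no exclusions file
    set -- "$@" --rsh=ssh                            # connect over ssh
    # Trailing slashes make rsync copy the directory contents, not the directory itself.
    "$runner" "$@" "$REMOTEDIR/" "$WINDIR/"
}

# With no argument this is a preview; pass "rsync" to run the transfer.
from_master "${1:-echo}"
```

Run with no argument, the sketch only prints the rsync options and paths it would use. Run with rsync as its argument, it performs the transfer.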

Cache-to-Master Script

This script transfers local changes back to the master copy. Save the script (see toMaster_sh.html if it does not show up in your browser) as
"C:\Documents and Settings\%USERNAME%\My Documents\cachedir\toMaster.sh"
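
The original script is not reproduced in this copy of the page either. A minimal sketch of the reverse direction follows, under stated assumptions: the Cygwin /cygdrive path, the quoting style, the full path to the exclusions file, and the to_master helper are all hypothetical, and whether the author's script passes --delete is a guess based on the "Removes extra files" requirement. No backup options are used, which matches the warning under Usage that this direction makes no backups.

```shell
#!/bin/bash
# Hypothetical sketch of toMaster.sh; NOT the author's original script.
# Assumed Cygwin-style path: edit both variables for your machine.
WINDIR="/cygdrive/c/Documents and Settings/$USERNAME/My Documents/cachedir"
REMOTEDIR="[linuxuser]@[linuxserver]:/filesystem/mastersyncdir"

to_master() {
    runner=${1:-echo}   # "echo" only prints the command; "rsync" transfers
    set --                                           # start with no options
    set -- "$@" --recursive                          # whole directory tree
    set -- "$@" --times                              # preserve timestamps
    set -- "$@" --update                             # keep newer files on the master
    set -- "$@" --delete                             # assumption: drop files gone from the cache
    set -- "$@" --exclude-from="$WINDIR/exclude.txt" # remove if you have no exclusions file
    set -- "$@" --rsh=ssh                            # connect over ssh
    # Source and destination are swapped relative to the master-to-cache direction.
    "$runner" "$@" "$WINDIR/" "$REMOTEDIR/"
}

# With no argument this is a preview; pass "rsync" to run the transfer.
to_master "${1:-echo}"
```

Because nothing is backed up in this direction, previewing the command first is a cheap safeguard.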

Usage

Here are the most typical usage scenarios:
  • First time: The first time you transfer from the master to the cached directory (by running fromMaster.sh), rsync will copy over all the files. It will also see the script files you just saved, notice that they don't exist on the server, and delete them. If this happens to you, go to the cachebackup directory and move the files back into the cachedir directory. Then do a cached-to-master transfer (via toMaster.sh) to put these synchronization files on the server.
  • Master-to-cached: Use this when the master copy on the server is the most up-to-date and you want your cached directory to receive all its changes. Run fromMaster.sh. If you want to be safe, first delete all the contents of cachebackup. After running the script, examine the new contents to make sure nothing got removed/updated that shouldn't have.
  • Cached-to-master: Use this when you have made local changes that you now want propagated back to the master. Run toMaster.sh. Be forewarned that when transferring from the cache to the master, no backups are made. If you're unsure whether this will wipe out anything important, see the "Merge changes" scenario.
  • Merge changes: Follow this scenario if you have modified some files on the server and some files in the cache and you want both the cache and the server to be fully synchronized. It's also useful if you can't remember where the most recent changes were made. Note that if you've modified the same file on multiple machines, the most recent timestamp wins. First, do a master-to-cached transfer, following the instructions on how to be safe. Then do a cached-to-master transfer.
  • From one cache to another: Do a cached-to-master on the first cached machine. Then do a master-to-cached on the second.
Note that if your ssh is authenticated via Kerberos, getting your hands on a Kerberized version of ssh for Cygwin makes things easier.
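
The "Merge changes" scenario above can be sketched as a small wrapper. This is only an illustration: merge_changes is a hypothetical name, it assumes fromMaster.sh and toMaster.sh both live in the cache directory, and stopping before the push whenever the backup directory is non-empty is a design choice of this sketch rather than something the original scripts do.

```shell
#!/bin/bash
# Sketch of the "Merge changes" scenario. merge_changes is a hypothetical
# wrapper; it assumes fromMaster.sh and toMaster.sh are in the cache directory.

merge_changes() {
    cache=$1     # cached copy, e.g. .../My Documents/cachedir
    backup=$2    # backup directory, e.g. .../My Documents/cachebackup

    # Be safe: empty the backup directory first, so anything found in it
    # afterwards was put there by this run.
    find "$backup" -mindepth 1 -delete || return 1

    # Master-to-cached transfer.
    (cd "$cache" && bash ./fromMaster.sh) || return 1

    # If anything was replaced or removed locally, list it and stop
    # so it can be reviewed before pushing.
    if [ -n "$(find "$backup" -mindepth 1 -print)" ]; then
        echo "Review these backed-up files, then run toMaster.sh by hand:"
        find "$backup" -mindepth 1 -print
        return 2
    fi

    # Cached-to-master transfer.
    (cd "$cache" && bash ./toMaster.sh)
}

# Example with placeholder paths:
# merge_changes "/path/to/cachedir" "/path/to/cachebackup"
```

Stopping before the push gives you a chance to move anything you still need out of cachebackup, since the cached-to-master direction makes no backups.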

Alternatives

  • Direct-Mounted Synchronizers: There are various alternatives that you can use if you actually do mount the filesystems at the time of synchronization. You could hunt one of them down for yourself if you'd like. Since this doesn't match my situation, I don't remember what the programs are on Windows. On Linux, rsync is probably still the easiest.
  • Unison: Another program that does synchronizing. For me it was more difficult to use and didn't give me the exact feature set that I wanted. Other people love it.

 

Last updated: 2008-01-25 09:23:57 -0500