Redirecting Old Websites

Kimo Johnson

16 May 2009

I have been gone from Dartmouth for over a year, and my website there has remained active but somewhat broken due to neglect. It turns out that some of the blogs on that site are on the first page of Google search results for related keywords. When my Dartmouth site finally goes away, it would be a shame if users encountered 404 errors when trying to access those pages. I did some quick searching for the best way to redirect a page to a new location, and a 301 (permanent) redirect seems to be a good option. I wrote a Python script to redirect all the HTML pages on my site to their new URLs, and this script might be useful for other people facing the same problem.

Script

The script walks the full site directory looking for files with specific extensions, in this case .html. For each file it finds, it performs a simple find and replace on the file path to produce a redirect rule from the old path to the new URL.

You will need to change some of the variables to reflect your old and new sites. These variables can be found under Script settings below.
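To make the mapping concrete, here is a minimal sketch of the rewrite for a single file. It uses the same placeholder settings as the script; the helper name redirect_line and the file path are made up for illustration and are not part of make301.py.

```python
# Placeholder settings, mirroring the script's defaults
olddir  = 'public_html'
oldbase = '/~name'
newsite = 'http://mynewdomain.com'

def redirect_line(path):
    """Build one redirect rule for a file path that starts with olddir.

    Hypothetical helper for illustration only.
    """
    page = path.replace(olddir, oldbase, 1)   # path as served by the old site
    url  = path.replace(olddir, newsite, 1)   # address on the new site
    return "redirect 301 %s %s" % (page, url)

# 'public_html/blog/index.html' is a hypothetical file
print(redirect_line('public_html/blog/index.html'))
# prints: redirect 301 /~name/blog/index.html http://mynewdomain.com/blog/index.html
```

Each matching file on the old site ends up as one such line in the output file.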

# Filename:      make301.py
# Last Modified: 5/16/2009
import sys
import os


# Script settings
extensions = ['.html']
olddir     = 'public_html'
oldbase    = '/~name'
newsite    = 'http://mynewdomain.com'


#
# Process command line arguments
#
def processArgs(argv):
        argc = len(argv)
        if argc < 3:
                print("Usage: make301.py <directory> <output file>")
                sys.exit(1)

        # Use only the first two arguments; any extras are ignored
        args = [s.strip() for s in argv[1:3]]
        
        # Make sure directory exists
        if not os.path.isdir(args[0]):
                print('Directory "%s" does not exist.' % args[0])
                sys.exit(1)
        
        return tuple(args)

#
# Collect every file under source_dir whose extension is in
# the list of extensions specified above.
#
def getAllFiles(source_dir):
        files = []
        # os.walk is used because os.path.walk was removed in Python 3
        for (root, dirs, names) in os.walk(source_dir):
                for name in names:
                        (base, ext) = os.path.splitext(name)
                        if ext.lower() in extensions:
                                files.append(os.path.join(root,name))
        return files

#
# Write one redirect rule per matching file
#
def main(argv):
        (source_dir, htfile) = processArgs(argv)

        files = getAllFiles(source_dir)

        # Write redirect rules to file
        fd = open(htfile,'w')

        for f in files:
                # Replace only the first occurrence of the directory name
                page = f.replace(olddir,oldbase,1)
                url  = f.replace(olddir,newsite,1)
                
                line = "redirect 301 %s %s\n" % (page,url)
                fd.write(line)

        fd.close()
        
if __name__ == "__main__":
        main(sys.argv)

Usage

To run the script, go to the directory above your web directory (in my case public_html) and type the following:

% python make301.py public_html .htaccess
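
Before uploading the generated file, it may be worth a quick sanity check. The script does not quote its output, so a file name containing spaces would produce a rule with too many whitespace-separated fields. The sketch below is a hypothetical helper (check_rules is not part of make301.py) that flags any line not shaped like "redirect 301 <path> <url>"; the sample lines are invented for illustration.

```python
def check_rules(lines):
    """Return the lines that do not look like 'redirect 301 <path> <url>'."""
    bad = []
    for line in lines:
        parts = line.split()
        ok = (len(parts) == 4
              and parts[0].lower() == 'redirect'
              and parts[1] == '301'
              and parts[2].startswith('/')
              and parts[3].startswith('http'))
        if not ok:
            bad.append(line)
    return bad

# Invented sample: the second rule is broken by a space in the file name
sample = [
    "redirect 301 /~name/index.html http://mynewdomain.com/index.html",
    "redirect 301 /~name/my page.html http://mynewdomain.com/my page.html",
]
for line in check_rules(sample):
    print("Suspicious rule: %s" % line)
```

To check a real output file, pass the open file object instead of the sample list, for example check_rules(open('.htaccess')).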