Some software I've written:

OpenFST Phonetic Transliteration

This is a demonstration of how to use OpenFST to manipulate weighted finite-state transducers in a C++ program. It reads files formatted like the CMU pronunciation dictionary and learns a noisy-channel transliteration model. The results aren't great -- individual phoneme-to-letter emissions are probably not the right model for phonetic transcription -- but it should be helpful if you want to learn how to use OpenFST. Requires the Boost C++ libraries.

FSTPronouncer.tgz [March 22, 2009]


Bayesian Unsupervised Topic Segmentation

This code for doing linear topic segmentation on text can be found on its own page.

Dirichlet Process Mixture Models in Matlab

Dirichlet Process Mixture Models -- also called Infinite Mixture Models -- are a cool way to do clustering when you don't know how many clusters you want. This is a Matlab implementation of Dirichlet Process Mixture Models with multivariate Gaussian observations. This is the "collapsed" version, meaning that the parameters of the Gaussians are marginalized out. Some of the code is based on Michael Mandel's earlier implementation -- all of it is GPL'd. To run this code, you'll need to have two other things installed: the BNT and lightspeed. Annoyingly, it seems to be important that you add lightspeed to the Matlab path after you add the BNT -- they have slightly different parameterizations of some sampling function, I think, but I haven't figured out exactly which one.
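To give a feel for what "collapsed" means, here is a minimal sketch of the step that resamples one point's cluster assignment. It is written in Python rather than Matlab and is not the code distributed here. It assumes a simplified 1-D model with known observation variance and a normal prior on each cluster mean, and the function and parameter names (assignment_probs, alpha, mu0, tau2, sigma2) are made up for the example. Each existing cluster is weighted by its size times the predictive density of the point with the cluster mean integrated out, and a new cluster is weighted by alpha times the prior predictive density.

```python
import math


def normal_pdf(x, mean, var):
    """Density of a 1-D normal distribution with the given mean and variance."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def assignment_probs(x, clusters, alpha=1.0, mu0=0.0, tau2=10.0, sigma2=1.0):
    """Probabilities of assigning x to each existing cluster or to a new one.

    clusters: list of (count, sum) pairs computed from the points other than x.
    alpha:    concentration parameter of the Dirichlet process (assumed value).
    mu0,tau2: mean and variance of the normal prior on each cluster mean.
    sigma2:   known observation variance (a simplifying assumption).
    The last entry of the returned list is the probability of a new cluster.
    """
    weights = []
    for n, s in clusters:
        # Posterior over the cluster mean given its current members.
        post_var = 1.0 / (1.0 / tau2 + n / sigma2)
        post_mean = post_var * (mu0 / tau2 + s / sigma2)
        # Predictive density of x with the cluster mean integrated out.
        weights.append(n * normal_pdf(x, post_mean, post_var + sigma2))
    # A brand-new cluster uses the prior predictive density.
    weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))
    total = sum(weights)
    return [w / total for w in weights]
```

A full sampler would draw an assignment from these probabilities for every point in turn and update the counts and sums as it goes. The multivariate case follows the same pattern with a different predictive density.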

Finally, please post any questions to the blog so that others can see them.


RISO LBFGS Wrapper for Weka

Weka is a machine-learning package. It has its own Limited-Memory BFGS optimization code, but I found that it was very slow when applied to my own custom conditional model. Then I found the RISO code for LBFGS, which looked great, but was less friendly to integrate than the Weka optimization package. My wrapper tries to provide the same sort of interface as the Weka optimization package.

An aside: I also tried the Mallet optimization package, but occasionally got strange exceptions that I did not understand. This could be because I am working with objectives that are not guaranteed to be convex.


CondensationTracker

A few people were interested and had questions about this code; I moved the discussion over to my blog, so that questions could be asked and answered in a more public forum.

SPAM

Simple Painless Annotation of Movies

Note: I haven't worked on this package in a long time, and I no longer have time to support it. You'd likely be better served looking for a comparable tool that is actively maintained. However, the jar file is provided here for the curious. (August 31, 2008)

Get it here.

You will also need:

SPAM allows you to annotate QuickTime-playable movies or audio files. The primary design goal of SPAM is to make annotation fast, using keyboard shortcuts wherever possible. It's not supported and there's little help or documentation - sorry.

SPAM is (c) MIT 2005. It's free for academic purposes.

Other comparable tools:

Anvil
Anvil's got lots of features and is probably more stable than SPAM. I found the UI to be a little clunky.
IBM Multimodal Annotation Tool
I've never tried this.

SmartSweeper

It's an engine for writing simple pattern-matching rulesets to play the game of Minesweeper. I'll try to post the code soon. SmartSweeper is (c) MIT 2005.

TableRex

This is a genetic algorithms toolkit for the game of Robocode. I'm not 100% sure about the state of this code, but if you're interested in Robocode you can try it out. Also, I see that Robocode is now open-source -- I have no idea what that means for compatibility with this code, which was written in early 2003.

There are three parts:

SmallBrain.java
This is the actual TableRex interpreter. It extends robocode.AdvancedRobot.
BrainWorld.java
This is the external thing that controls the genetic algorithm. This has the main method that you run.
GeneticAlgorithm.java
This is the genetic algorithm implementation. Keep in mind that it makes little sense to try to optimize this code for speed, since all the time is spent evaluating the robots in Robocode.
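For anyone new to genetic algorithms, the general shape of such a loop can be sketched as below. This is a generic Python illustration and not the TableRex code: the bit-string genome, truncation selection, one-point crossover, elitism, and every name and default value are assumptions made for the example. The fitness call is the only expensive step, which is where a Robocode battle would go.

```python
import random


def evolve(fitness, genome_len=8, pop_size=20, generations=30,
           mutation_rate=0.05, seed=0):
    """Evolve a bit-string genome that scores well under `fitness`.

    fitness: function mapping a list of 0/1 genes to a number (higher is better).
    All other parameters are illustrative defaults, not tuned values.
    """
    rng = random.Random(seed)
    # Start from a random population of bit strings.
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    for _ in range(generations):
        # Evaluating fitness is where nearly all the time goes in practice.
        scored = sorted(pop, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]   # truncation selection
        children = [scored[0][:]]           # elitism: keep the best unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, genome_len)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Flip each gene independently with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        pop = children
    return max(pop, key=fitness)
```

For example, evolve(sum) searches for a genome with as many 1s as possible. Swapping in a fitness function that runs a simulated battle changes nothing else in the loop.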

Here's a video of a robot that learned a very specialized dodging pattern to beat squigbot. That's what happens when you train against only a single adversary.

If you use this code, please cite the following:

J. Eisenstein. Evolving Robocode Tank Fighters. MIT AI Lab Memo AIM-2003-023.

One last thing -- some other guys took this idea much further than I did. I think they used some of this code, although I'm not totally sure what went into their final version. Check out their paper.

All code is (c) MIT 2005.


ARFF2SVML

Here's a little Perl script that converts Weka-friendly ARFF format to the format that SVM-light appreciates:

arff2svml.pl
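To show what the conversion involves, here is a minimal Python sketch. It is not a port of arff2svml.pl and its behavior may differ from the script. It assumes a dense ARFF file in which every attribute is numeric and the last attribute is the target, and the function name arff_to_svmlight is made up for the example. Each data row becomes the target followed by index:value pairs, with feature indices starting at 1 and zero-valued features left out.

```python
def arff_to_svmlight(arff_text):
    """Convert dense, all-numeric ARFF text into a list of SVM-light lines.

    Assumes the last attribute is the target and is already numeric
    (for example 1 and -1 for classification).
    """
    out = []
    in_data = False
    for raw in arff_text.splitlines():
        line = raw.strip()
        # Skip blank lines and ARFF comments.
        if not line or line.startswith("%"):
            continue
        if not in_data:
            # Everything before @data is header (relation and attributes).
            if line.lower().startswith("@data"):
                in_data = True
            continue
        values = [v.strip() for v in line.split(",")]
        target = values[-1]
        # SVM-light is sparse: number features from 1 and omit zeros.
        feats = [
            f"{i}:{v}"
            for i, v in enumerate(values[:-1], start=1)
            if float(v) != 0.0
        ]
        out.append(" ".join([target] + feats))
    return out
```

Nominal attributes, missing values, and sparse ARFF rows would need extra handling that this sketch leaves out.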