Python Machine Learning Packages

This is a brief overview of Python machine learning toolkits as of June 7, 2008. I was looking for something like Weka for Python. I settled on Orange because it seemed to have the largest feature set and was the only one with a GUI. I’ve used it for about a week and it seems pretty nice, although I haven’t tried the GUI yet.

Package	Last release	Classifiers	Clustering?	Cross-validation?	GUI?	Native to Python?	Sparse data sets?	Integrates with Matplotlib?	Notes
Orange	05/2008	10+ (rules, SVM, clustering, trees)	Yes	Yes	Yes	Wraps C++, but designed for Python	Yes	No
PyML	05/2008	3	No	Yes	No	Native	Yes	Yes
Shogun	05/2008	5 (with SVM craziness)	No	Yes	No	Wraps C++	Yes	Yes	Long page of citations. Interfaces to R, Octave, and Matlab as well as Python.
MDP	05/2008	10+ nodes, some of which are classifiers	No	No	No	Native	No	No	More than just a classifier suite. Users build networks of operations, where each node is a classifier or some other processing step.
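
To make the MDP note a bit more concrete, here is a rough sketch of the "network of nodes" idea in plain Python. This is not MDP's actual API: the class names (`Node`, `Center`, `NearestMean`, `Flow`) and the toy data are made up for illustration. The only point is that a preprocessing step and a classifier share one interface and can be chained.

```python
# Illustrative only: these classes are hypothetical and are NOT MDP's API.

class Node:
    """One step in a pipeline: optionally learns from data, then transforms it."""

    def train(self, rows, labels):
        pass  # stateless nodes have nothing to learn

    def execute(self, rows):
        raise NotImplementedError


class Center(Node):
    """Preprocessing node: subtracts the per-column mean seen during training."""

    def train(self, rows, labels):
        n = len(rows)
        self.means = [sum(col) / n for col in zip(*rows)]

    def execute(self, rows):
        return [[x - m for x, m in zip(row, self.means)] for row in rows]


class NearestMean(Node):
    """Classifier node: predicts the class whose mean point is closest."""

    def train(self, rows, labels):
        grouped = {}
        for row, label in zip(rows, labels):
            grouped.setdefault(label, []).append(row)
        self.centroids = {
            label: [sum(col) / len(members) for col in zip(*members)]
            for label, members in grouped.items()
        }

    def execute(self, rows):
        def dist2(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        return [
            min(self.centroids, key=lambda c: dist2(row, self.centroids[c]))
            for row in rows
        ]


class Flow:
    """Chains nodes: each node is trained on the output of the nodes before it."""

    def __init__(self, nodes):
        self.nodes = nodes

    def train(self, rows, labels):
        for node in self.nodes:
            node.train(rows, labels)
            rows = node.execute(rows)

    def execute(self, rows):
        for node in self.nodes:
            rows = node.execute(rows)
        return rows


# Toy data: two well-separated groups of 2-D points.
flow = Flow([Center(), NearestMean()])
flow.train(
    [[0.0, 0.0], [1.0, 1.0], [9.0, 9.0], [10.0, 10.0]],
    ["low", "low", "high", "high"],
)
predictions = flow.execute([[0.5, 0.2], [9.5, 9.9]])
print(predictions)
```

In a design like this, swapping the classifier or adding another preprocessing step only means changing the list passed to `Flow`, which is roughly what sets a node-based toolkit apart from a plain classifier suite.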