This is a brief overview of Python machine learning toolkits as of June 7, 2008. I was looking for something like Weka for Python and settled on Orange, since it seemed to have the largest feature set and was the only one with a GUI. I've used it for about a week and it seems pretty nice, although I haven't tried the GUI yet.
Package | Last release | Classifiers | Clustering? | Cross-validation? | GUI? | Native to Python? | Sparse data sets? | Integrates with Matplotlib? | Notes |
---|---|---|---|---|---|---|---|---|---|
Orange | 05/2008 | 10+ (rules, SVM, clustering, trees) | Yes | Yes | Yes | No (wraps C++, but designed for Python) | Yes | No | |
PyML | 05/2008 | 3 | No | Yes | No | Yes | Yes | Yes | |
Shogun | 05/2008 | 5 (with SVM craziness) | No | Yes | No | No (wraps C++) | Yes | Yes | Long page of citations. Interfaces to R, Octave, and Matlab as well as Python. |
MDP | 05/2008 | 10+ nodes, some of which are classifiers | No | No | No | Yes | No | No | More complicated than just a classifier suite. Users construct networks of operations, each node of which is a classifier or some other operation. |
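
To make the MDP note a bit more concrete, here is a rough sketch of what "a network of operations" can look like in the simplest case, a linear chain. This is plain Python and not MDP's actual API: `ScaleNode`, `ThresholdClassifierNode`, and `Flow` are hypothetical names I made up for illustration, and the toy data is invented.

```python
# Hypothetical sketch of a node pipeline. These classes are NOT MDP's API;
# they only illustrate chaining operations where one node is a classifier.

class ScaleNode:
    """Rescales each feature to the [0, 1] range seen during training."""

    def train(self, rows):
        cols = list(zip(*rows))
        self.lo = [min(c) for c in cols]
        self.hi = [max(c) for c in cols]

    def execute(self, rows):
        return [
            [(x - lo) / (hi - lo) if hi > lo else 0.0
             for x, lo, hi in zip(row, self.lo, self.hi)]
            for row in rows
        ]


class ThresholdClassifierNode:
    """Toy classifier: labels a row 1 if its mean feature value exceeds 0.5."""

    def train(self, rows):
        pass  # nothing to learn in this toy example

    def execute(self, rows):
        return [1 if sum(row) / len(row) > 0.5 else 0 for row in rows]


class Flow:
    """Runs nodes in order, feeding each node's output to the next one."""

    def __init__(self, nodes):
        self.nodes = nodes

    def train(self, rows):
        for node in self.nodes:
            node.train(rows)
            rows = node.execute(rows)

    def execute(self, rows):
        for node in self.nodes:
            rows = node.execute(rows)
        return rows


flow = Flow([ScaleNode(), ThresholdClassifierNode()])
flow.train([[0.0, 10.0], [5.0, 20.0], [10.0, 30.0]])
predictions = flow.execute([[1.0, 12.0], [9.0, 28.0]])
print(predictions)
```

The point is only that preprocessing steps and classifiers share one interface, so they can be swapped or reordered freely. A real toolkit would add much more on top of this.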