#include <k-means-clusterer.h>
This implementation may not always return K clusters. There are two cases where we will return fewer than K clusters:
Both of these situations are generally unlikely, but you should be careful about the assumptions your code makes about the number of returned clusters.
Public Member Functions | |
KMeansClusterer (int num_clusters, int max_iters, const DistanceComputer &distance_computer) | |
void | Cluster (const vector< PointRef > &data) |
Performs the clustering and stores the result internally. | |
PointSet | GetClusterCenters () const |
Return the cluster centers. | |
int | GetNumCenters () const |
Return the number of cluster centers. | |
vector< int > | GetMembership () const |
Return the membership table. | |
void | WriteToStream (ostream &output_stream) const |
Write the clustering data to a stream. | |
void | WriteToFile (const char *output_filename) const |
Write the clustering data to a file. | |
void | ReadFromStream (istream &input_stream) |
Read clustering data from a stream. | |
void | ReadFromFile (const char *input_filename) |
Read clustering data from a file. | |
Protected Member Functions | |
virtual void | DoClustering (const vector< PointRef > &data) |
Perform K-means. | |
Protected Attributes | |
auto_ptr< PointSet > | cluster_centers_ |
vector< int > | membership_ |
bool | done_ |
KMeansClusterer | ( | int num_clusters, int max_iters, const DistanceComputer & distance_computer | ) |
void DoClustering | ( | const vector< PointRef > & | data | ) | [protected, virtual] |
Perform K-means.
Uses the DistanceComputer it was constructed with to fill up cluster_centers_ with K Features representing the K-means cluster centers. K is assigned by the constructor of KMeansClusterer. If there are fewer data points than K, then the total number of clusters returned is simply the total number of data points (not K).
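The fewer-points-than-K behavior described above can be illustrated with a minimal, self-contained sketch. This is not the library's implementation: `KMeansResult` and `SketchKMeans` are hypothetical names, the points are plain 1-D doubles rather than PointRefs, the first points are used as seeds, and absolute difference stands in for the DistanceComputer.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a clustering result; not a library type.
struct KMeansResult {
  std::vector<double> centers;  // one entry per cluster actually produced
  std::vector<int> membership;  // membership[i] = cluster index of data[i]
};

// Illustrative Lloyd-style K-means on 1-D points (assumption: absolute
// difference replaces the DistanceComputer used by the real class).
KMeansResult SketchKMeans(const std::vector<double>& data, int num_clusters,
                          int max_iters) {
  assert(num_clusters > 0 && max_iters >= 0);
  KMeansResult result;
  // With fewer points than K, produce one cluster per point instead of K.
  const std::size_t k =
      std::min(data.size(), static_cast<std::size_t>(num_clusters));
  // Seed the centers with the first k points.
  result.centers.assign(data.begin(),
                        data.begin() + static_cast<std::ptrdiff_t>(k));
  result.membership.assign(data.size(), 0);

  for (int iter = 0; iter < max_iters; ++iter) {
    // Assignment step: attach each point to its nearest center.
    for (std::size_t i = 0; i < data.size(); ++i) {
      std::size_t best = 0;
      for (std::size_t c = 1; c < k; ++c) {
        if (std::fabs(data[i] - result.centers[c]) <
            std::fabs(data[i] - result.centers[best])) {
          best = c;
        }
      }
      result.membership[i] = static_cast<int>(best);
    }
    // Update step: move each non-empty center to the mean of its points.
    std::vector<double> sum(k, 0.0);
    std::vector<int> count(k, 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
      const std::size_t c = static_cast<std::size_t>(result.membership[i]);
      sum[c] += data[i];
      ++count[c];
    }
    for (std::size_t c = 0; c < k; ++c) {
      if (count[c] > 0) result.centers[c] = sum[c] / count[c];
    }
  }
  return result;
}
```

Calling `SketchKMeans` with two points and `num_clusters = 4` yields two centers, which is why callers should size any per-cluster storage from the reported number of centers rather than from K.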
Implements Clusterer.
void Cluster | ( | const vector< PointRef > & | data | ) | [inherited] |
Performs the clustering and stores the result internally.
To avoid potential memory problems, Clusterers do not operate on PointSetLists or PointSets directly. Rather, they simply shuffle PointRefs around.
PointSet GetClusterCenters | ( | ) | const [inherited] |
Return the cluster centers.
This requires Cluster() or ReadFromStream() to have been called first. It returns a PointSet where each Feature in it is one of the cluster centers.
int GetNumCenters | ( | ) | const [inherited] |
Return the number of cluster centers.
This requires Cluster() or ReadFromStream() to have been called first. It returns the number of cluster centers.
vector< int > GetMembership | ( | ) | const [inherited] |
Return the membership table.
Let n be the number of points that were clustered. The returned vector also has n elements, and each element tells you which cluster the corresponding point was placed in.
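A membership table of this shape can be inverted into per-cluster lists of point indices. The sketch below assumes that cluster indices are zero-based and smaller than the value reported by GetNumCenters(); the documentation does not state this, so treat it as an assumption. `GroupByCluster` is a hypothetical helper, not part of the library.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: turns a membership table into one list of point
// indices per cluster. Assumes zero-based cluster indices in
// [0, num_centers).
std::vector<std::vector<std::size_t>> GroupByCluster(
    const std::vector<int>& membership, int num_centers) {
  assert(num_centers >= 0);
  std::vector<std::vector<std::size_t>> groups(
      static_cast<std::size_t>(num_centers));
  for (std::size_t i = 0; i < membership.size(); ++i) {
    const int cluster = membership[i];
    // Guard against indices outside the reported number of centers.
    assert(cluster >= 0 && cluster < num_centers);
    groups[static_cast<std::size_t>(cluster)].push_back(i);
  }
  return groups;
}
```

In real use the two arguments would come from GetMembership() and GetNumCenters(), so the outer vector is sized by the number of clusters actually returned rather than by K.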
void WriteToStream | ( | ostream & | output_stream | ) | const [inherited] |
Write the clustering data to a stream.
Requires Cluster() or ReadFromStream() to have been called first. Output format:
The clustered points themselves are not written to the stream, only the centers and membership data. It is assumed that the caller of Cluster() already has access to those points anyway. This function aborts if the stream is bad.
void WriteToFile | ( | const char * | output_filename | ) | const [inherited] |
Write the clustering data to a file.
void ReadFromStream | ( | istream & | input_stream | ) | [inherited] |
Read clustering data from a stream.
Can be called in lieu of Cluster(). If this is called after Cluster(), all of the previous data is cleared before reading the new data. For the file format, see WriteToStream. This function aborts if the stream is bad.
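Because the stream functions abort on a bad stream, a caller may prefer to check the stream state before handing it over. The following is a minimal sketch using only the standard library; `StreamIsUsable` and `CanReadClusteringFile` are hypothetical helpers, not part of this class.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>

// Hypothetical helper: true if no error or end-of-file flag is set.
// Works for input and output streams alike.
bool StreamIsUsable(const std::ios& stream) {
  return stream.good();
}

// Hypothetical helper: true if the file could be opened for reading,
// so a later read from it would not start from a failed stream.
bool CanReadClusteringFile(const char* filename) {
  assert(filename != NULL);
  std::ifstream in(filename);  // a failed open sets failbit
  return StreamIsUsable(in);
}
```

A caller could run such a check and report the problem itself instead of letting the process abort.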
void ReadFromFile | ( | const char * | input_filename | ) | [inherited] |
Read clustering data from a file.
auto_ptr<PointSet> cluster_centers_ [protected, inherited] |
vector<int> membership_ [protected, inherited] |
bool done_ [protected, inherited] |