KMeansClusterer Class Reference

#include <k-means-clusterer.h>

Inherits Clusterer.

Detailed Description

Implements K-Means clustering.

This implementation may not always return K clusters. There are two cases where we will return fewer than K clusters:

  1. If the number of points provided (N) is less than K, then KMeansClusterer will return N clusters, where each cluster center is one of the points.
  2. If the data contains duplicate points, and there are fewer than K unique points, then KMeansClusterer will return M clusters, where M is the number of unique points in the data.

Both of these situations are generally unlikely, but you should be careful about the assumptions your code makes about the number of returned clusters.
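The two cases above amount to an upper bound of min(K, number of unique points) on the number of returned clusters. The following is a minimal sketch of that bound, not part of libpmk: the Point typedef and the MaxPossibleClusters helper are hypothetical names, and points are compared by exact coordinate equality, which may differ from how the library decides that two points are duplicates.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical stand-in for a data point; libpmk's own point types are not used.
typedef std::vector<double> Point;

// Upper bound on the number of clusters K-means can return for this data:
// min(k, number of distinct points). Covers both cases (N < K and duplicates).
std::size_t MaxPossibleClusters(const std::vector<Point>& points, std::size_t k) {
  assert(k > 0);  // K must be positive for clustering to make sense.
  // std::set keeps one copy of each distinct point (exact equality).
  std::set<Point> unique_points(points.begin(), points.end());
  return std::min(k, unique_points.size());
}
```

Calling code can size its own buffers from GetNumCenters() after clustering rather than from K; the helper above is only useful for predicting the bound beforehand.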


Public Member Functions

 KMeansClusterer (int num_clusters, int max_iters, const DistanceComputer &distance_computer)
void Cluster (const vector< PointRef > &data)
 Performs the clustering and stores the result internally.
PointSet GetClusterCenters () const
 Return the cluster centers.
int GetNumCenters () const
 Return the number of cluster centers.
vector< int > GetMembership () const
 Return the membership table.
void WriteToStream (ostream &output_stream) const
 Write the clustering data to a stream.
void WriteToFile (const char *output_filename) const
 Write the clustering data to a file.
void ReadFromStream (istream &input_stream)
 Read clustering data from a stream.
void ReadFromFile (const char *input_filename)
 Read clustering data from a file.

Protected Member Functions

virtual void DoClustering (const vector< PointRef > &data)
 Perform K-means.

Protected Attributes

auto_ptr< PointSet > cluster_centers_
vector< int > membership_
bool done_


Constructor & Destructor Documentation

KMeansClusterer ( int  num_clusters,
int  max_iters,
const DistanceComputer &  distance_computer 
)


Member Function Documentation

void DoClustering ( const vector< PointRef > &  data  )  [protected, virtual]

Perform K-means.

Uses the DistanceComputer it was constructed with to fill up cluster_centers_ with K Features representing the K-means cluster centers. K is assigned by the constructor of KMeansClusterer. If there are fewer data points than K, then the total number of clusters returned is simply the total number of data points (not K).

Implements Clusterer.
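For readers unfamiliar with the algorithm, here is a generic sketch of K-means (Lloyd's iteration), not libpmk's actual implementation. Everything in it is an assumption made for illustration: the Point typedef, the SimpleKMeans and NearestCenter names, squared Euclidean distance standing in for the DistanceComputer, and initialization from the first K points. It reproduces the "fewer points than K" behavior described above but does not handle duplicate points specially.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

typedef std::vector<double> Point;  // Hypothetical point type.

// Squared Euclidean distance; an assumed stand-in for DistanceComputer.
double SquaredDistance(const Point& a, const Point& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d) {
    const double diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// Index of the center closest to p (the first one wins on ties).
std::size_t NearestCenter(const Point& p, const std::vector<Point>& centers) {
  std::size_t best = 0;
  double best_dist = std::numeric_limits<double>::max();
  for (std::size_t c = 0; c < centers.size(); ++c) {
    const double dist = SquaredDistance(p, centers[c]);
    if (dist < best_dist) {
      best_dist = dist;
      best = c;
    }
  }
  return best;
}

// Runs at most max_iters assign/update rounds. Returns the membership table
// and writes the final centers to *centers.
std::vector<int> SimpleKMeans(const std::vector<Point>& data, std::size_t k,
                              int max_iters, std::vector<Point>* centers) {
  assert(centers != nullptr && max_iters > 0);
  // With fewer points than K, every point becomes its own center.
  const std::size_t num_centers = data.size() < k ? data.size() : k;
  // Naive initialization: use the first num_centers points as centers.
  centers->assign(data.begin(),
                  data.begin() + static_cast<std::ptrdiff_t>(num_centers));
  std::vector<int> membership(data.size(), -1);
  if (num_centers == 0) return membership;

  for (int iter = 0; iter < max_iters; ++iter) {
    // Assignment step: attach each point to its nearest center.
    bool changed = false;
    for (std::size_t i = 0; i < data.size(); ++i) {
      const int nearest = static_cast<int>(NearestCenter(data[i], *centers));
      if (nearest != membership[i]) {
        membership[i] = nearest;
        changed = true;
      }
    }
    if (!changed) break;  // Converged: no point switched clusters.

    // Update step: move each center to the mean of its members.
    std::vector<Point> sums(num_centers, Point(data[0].size(), 0.0));
    std::vector<int> counts(num_centers, 0);
    for (std::size_t i = 0; i < data.size(); ++i) {
      const std::size_t c = static_cast<std::size_t>(membership[i]);
      for (std::size_t d = 0; d < data[i].size(); ++d) sums[c][d] += data[i][d];
      ++counts[c];
    }
    for (std::size_t c = 0; c < num_centers; ++c) {
      if (counts[c] == 0) continue;  // Empty cluster: keep the old center.
      for (std::size_t d = 0; d < sums[c].size(); ++d) {
        (*centers)[c][d] = sums[c][d] / counts[c];
      }
    }
  }
  return membership;
}
```

The max_iters cap plays the same role as the max_iters constructor argument: it bounds the work when the assignments keep changing.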

void Cluster ( const vector< PointRef > &  data  )  [inherited]

Performs the clustering and stores the result internally.

To avoid potential memory problems, Clusterers do not operate on PointSetLists or PointSets directly. Rather, they simply shuffle PointRefs around.

See also:
PointSetList::GetPointRefs()

PointSet GetClusterCenters (  )  const [inherited]

Return the cluster centers.

This requires Cluster() or ReadFromStream() to have been called first. It returns a PointSet where each Feature in it is one of the cluster centers.

int GetNumCenters (  )  const [inherited]

Return the number of cluster centers.

This requires Cluster() or ReadFromStream() to have been called first. It returns the number of cluster centers.

vector< int > GetMembership (  )  const [inherited]

Return the membership table.

Let n be the number of points that were clustered. The returned vector also has size n; its i-th element identifies the cluster that the i-th point was placed in.
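A membership table of this shape is often easier to consume once inverted into per-cluster lists of point indices. The sketch below shows one way to do that. GroupByCluster is a hypothetical helper, not a libpmk function, and it assumes that the membership values are zero-based cluster indices in the range [0, number of centers), which this page does not state explicitly.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Inverts a membership table: groups[c] lists the indices of the points
// assigned to cluster c.
// Assumption: membership values are zero-based indices in [0, num_centers).
std::vector<std::vector<std::size_t> > GroupByCluster(
    const std::vector<int>& membership, int num_centers) {
  assert(num_centers >= 0);
  std::vector<std::vector<std::size_t> > groups(
      static_cast<std::size_t>(num_centers));
  for (std::size_t i = 0; i < membership.size(); ++i) {
    const int c = membership[i];
    assert(c >= 0 && c < num_centers);  // Guard against out-of-range labels.
    groups[static_cast<std::size_t>(c)].push_back(i);
  }
  return groups;
}
```

In real use the two arguments would presumably come from GetMembership() and GetNumCenters() on the same clusterer, so that the table and the center count agree.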

void WriteToStream ( ostream &  output_stream  )  const [inherited]

Write the clustering data to a stream.

Requires Cluster() or ReadFromStream() to have been called first. Output format:

void WriteToFile ( const char *  output_filename  )  const [inherited]

Write the clustering data to a file.

void ReadFromStream ( istream &  input_stream  )  [inherited]

Read clustering data from a stream.

Can be called in lieu of Cluster(). If this is called after Cluster(), all of the previous data is cleared before reading the new data. For the file format, see WriteToStream. This function aborts if the stream is bad.

See also:
WriteToStream.

void ReadFromFile ( const char *  input_filename  )  [inherited]

Read clustering data from a file.


Member Data Documentation

auto_ptr<PointSet> cluster_centers_ [protected, inherited]

vector<int> membership_ [protected, inherited]

bool done_ [protected, inherited]


The documentation for this class was generated from the following files:
Generated on Wed May 2 11:17:13 2007 for libpmk by  doxygen 1.5.1