KMeansClusterer Class Reference

#include <k-means-clusterer.h>

KMeansClusterer inherits from Clusterer.

Detailed Description

Implements K-Means clustering.

This implementation may not always return K clusters. There are two cases where we will return fewer than K clusters:

  1. If the number of points provided (N) is less than K, then KMeansClusterer will return N clusters, where each cluster center is one of the points.
  2. If the data contains duplicate points, and there are fewer than K unique points, then KMeansClusterer will return M clusters, where M is the number of unique points in the data.

Both of these situations are uncommon in practice, but your code should not assume that the number of returned clusters equals K. Use centers_size() to get the actual count.
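As a rough illustration of the two cases above, the hypothetical helper below (ExpectedNumClusters, not part of libpmk) computes the number of clusters to expect as the smaller of K and the number of unique points. It assumes points are plain std::vector<double> coordinates rather than libpmk's Point type.

```cpp
#include <algorithm>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical helper, not part of libpmk: the number of clusters the
// documentation says to expect when asking for k clusters over `points`.
// Points are plain coordinate vectors here instead of libpmk's Point type.
std::size_t ExpectedNumClusters(const std::vector<std::vector<double> >& points,
                                std::size_t k) {
  // Duplicate points collapse into a single set entry, leaving M unique points.
  std::set<std::vector<double> > unique_points(points.begin(), points.end());
  // Since M <= N, this also covers case 1 (fewer than K points in total).
  return std::min(k, unique_points.size());
}
```

In client code, the safer habit is to loop up to centers_size() rather than K when reading centers().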


Public Member Functions

 KMeansClusterer (int num_clusters, int max_iters, const DistanceComputer &distance_computer)
void Cluster (const vector< PointRef > &data)
 Performs the clustering and stores the result internally.
const PointSet & centers () const
 Output the cluster centers.
int centers_size () const
 Return the number of cluster centers.
int membership (int index) const
 Return the membership of the <index>th point.
int membership_size () const
 Return the number of members. Equivalent to the number of points that were clustered.
void WriteToStream (ostream &output_stream) const
 Write the clustering data to a stream.
void WriteToFile (const char *output_filename) const
 Write the clustering data to a file.
void ReadFromStream (istream &input_stream)
 Read clustering data from a stream.
void ReadFromFile (const char *input_filename)
 Read clustering data from a file.

Protected Member Functions

virtual void DoClustering (const vector< PointRef > &data)
 Perform K-means.

Protected Attributes

auto_ptr< PointSet > cluster_centers_
vector< int > membership_
bool done_


Constructor & Destructor Documentation

KMeansClusterer ( int  num_clusters,
int  max_iters,
const DistanceComputer & distance_computer
)


Member Function Documentation

void DoClustering ( const vector< PointRef > &  data  )  [protected, virtual]

Perform K-means.

Uses the DistanceComputer it was constructed with to fill cluster_centers_ with K Points representing the K-means cluster centers. K is set by the num_clusters argument of the KMeansClusterer constructor. If there are fewer data points than K, the number of clusters returned is the number of data points rather than K.

Implements Clusterer.
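For readers unfamiliar with the algorithm, here is a minimal sketch of Lloyd's-style K-means under stated assumptions; it is not libpmk's DoClustering implementation. Points are std::vector<double>, squared Euclidean distance stands in for the DistanceComputer, and the seeds are simply the first K distinct points, which is one simple way to reproduce the documented fewer-than-K behavior. KMeansSketch and SquaredDistance are hypothetical names.

```cpp
#include <cstddef>
#include <limits>
#include <set>
#include <vector>

typedef std::vector<double> Vec;

// Squared Euclidean distance; stands in for libpmk's DistanceComputer.
double SquaredDistance(const Vec& a, const Vec& b) {
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d) {
    double diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// Minimal Lloyd's-style K-means sketch (not libpmk's code); assumes k >= 1.
// Seeds with the first k distinct points, so it returns min(k, #unique points)
// centers. Fills `membership` with an index into the returned centers.
std::vector<Vec> KMeansSketch(const std::vector<Vec>& data, std::size_t k,
                              int max_iters, std::vector<int>* membership) {
  std::vector<Vec> centers;
  std::set<Vec> seen;
  for (std::size_t i = 0; i < data.size() && centers.size() < k; ++i) {
    if (seen.insert(data[i]).second) centers.push_back(data[i]);
  }
  membership->assign(data.size(), 0);
  for (int iter = 0; iter < max_iters; ++iter) {
    // Assignment step: attach each point to its nearest center.
    bool changed = false;
    for (std::size_t i = 0; i < data.size(); ++i) {
      int best = 0;
      double best_dist = std::numeric_limits<double>::max();
      for (std::size_t c = 0; c < centers.size(); ++c) {
        double dist = SquaredDistance(data[i], centers[c]);
        if (dist < best_dist) {
          best_dist = dist;
          best = static_cast<int>(c);
        }
      }
      if ((*membership)[i] != best) {
        (*membership)[i] = best;
        changed = true;
      }
    }
    // Centers are already the means of these assignments: converged.
    if (!changed && iter > 0) break;
    // Update step: move each center to the mean of its members.
    for (std::size_t c = 0; c < centers.size(); ++c) {
      Vec sum(centers[c].size(), 0.0);
      int count = 0;
      for (std::size_t i = 0; i < data.size(); ++i) {
        if ((*membership)[i] != static_cast<int>(c)) continue;
        for (std::size_t d = 0; d < sum.size(); ++d) sum[d] += data[i][d];
        ++count;
      }
      if (count == 0) continue;  // keep an empty cluster's previous center
      for (std::size_t d = 0; d < sum.size(); ++d) sum[d] /= count;
      centers[c] = sum;
    }
  }
  return centers;
}
```

Seeding from distinct points means duplicates can never yield two identical starting centers; the seeding strategy libpmk actually uses is not documented on this page.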

void Cluster ( const vector< PointRef > &  data  )  [inherited]

Performs the clustering and stores the result internally.

To avoid potential memory problems, Clusterers do not operate on PointSetLists or PointSets directly. Rather, they simply shuffle PointRefs around.

See also:
PointSetList::GetPointRefs()
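The idea of shuffling references instead of points can be sketched as follows. PointHandle is a hypothetical stand-in for PointRef (whose real definition is not shown here): it stores a pointer to a point held elsewhere, so copying or sorting handles moves no coordinate data. MakeHandles and SortByFirstCoordinate are likewise hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

typedef std::vector<double> Vec;

// Hypothetical stand-in for libpmk's PointRef: a lightweight handle to a point
// that lives in some larger container. Copying it copies one pointer only.
struct PointHandle {
  const Vec* point;
};

// Build handles to every point in `store` without copying any coordinates.
std::vector<PointHandle> MakeHandles(const std::vector<Vec>& store) {
  std::vector<PointHandle> handles;
  handles.reserve(store.size());
  for (std::size_t i = 0; i < store.size(); ++i) {
    PointHandle h = {&store[i]};
    handles.push_back(h);
  }
  return handles;
}

// Clustering-style code can reorder handles freely (here: by first
// coordinate) while the underlying storage is left untouched.
void SortByFirstCoordinate(std::vector<PointHandle>* handles) {
  std::sort(handles->begin(), handles->end(),
            [](const PointHandle& a, const PointHandle& b) {
              return (*a.point)[0] < (*b.point)[0];
            });
}
```

One caveat of pointer-style handles in this sketch: they become invalid if the underlying container reallocates or is destroyed, so the storage must outlive the handles.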

const PointSet & centers (  )  const [inherited]

Output the cluster centers.

This requires Cluster() or ReadFromStream() to have been called first. It returns the set of all cluster centers as Points.

int centers_size (  )  const [inherited]

Return the number of cluster centers.

This requires Cluster() or ReadFromStream() to have been called first. It returns the number of cluster centers.
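One plausible way to enforce the "call Cluster() or ReadFromStream() first" requirement is a flag like the protected done_ attribute. The class below (ResultHolder, hypothetical) only sketches that pattern; this page does not document how libpmk actually uses done_ or what happens when the precondition is violated, and aborting is an assumption made here.

```cpp
#include <cstdio>
#include <cstdlib>
#include <vector>

// Hypothetical sketch, not libpmk's code: accessors abort if no clustering
// result has been stored yet, mimicking a done_-style precondition check.
class ResultHolder {
 public:
  ResultHolder() : num_centers_(0), done_(false) {}

  // Stands in for Cluster()/ReadFromStream(): store a result, mark it ready.
  void SetResult(const std::vector<int>& membership, int num_centers) {
    membership_ = membership;
    num_centers_ = num_centers;
    done_ = true;
  }

  int centers_size() const {
    RequireDone();
    return num_centers_;
  }

  int membership_size() const {
    RequireDone();
    return static_cast<int>(membership_.size());
  }

  bool done() const { return done_; }

 private:
  // Abort with a message if no result is available yet.
  void RequireDone() const {
    if (!done_) {
      std::fprintf(stderr, "Cluster() or ReadFromStream() must be called first\n");
      std::abort();
    }
  }

  std::vector<int> membership_;
  int num_centers_;
  bool done_;
};
```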

int membership ( int  index  )  const [inherited]

Return the membership of the <index>th point.

The return value gives the ID of the cluster that this point belongs to. "ID" in this sense means an index into the PointSet returned by centers().
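Because each membership value is an index into centers(), grouping points by cluster takes a single pass. GroupByCluster below is a hypothetical helper that takes the IDs as a std::vector<int> (as if collected from membership(i) for every i below membership_size()) together with the value of centers_size().

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of libpmk: list the point indices belonging
// to each cluster. Assumes every ID lies in [0, num_centers).
std::vector<std::vector<int> > GroupByCluster(const std::vector<int>& membership,
                                              int num_centers) {
  std::vector<std::vector<int> > groups(num_centers);
  for (std::size_t i = 0; i < membership.size(); ++i) {
    // The cluster ID indexes the center set, so it selects the group directly.
    groups[membership[i]].push_back(static_cast<int>(i));
  }
  return groups;
}
```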

int membership_size (  )  const [inherited]

Return the number of members. Equivalent to the number of points that were clustered.

void WriteToStream ( ostream &  output_stream  )  const [inherited]

Write the clustering data to a stream.

Requires Cluster() or ReadFromStream() to have been called first. Output format:

void WriteToFile ( const char *  output_filename  )  const [inherited]

Write the clustering data to a file.

void ReadFromStream ( istream &  input_stream  )  [inherited]

Read clustering data from a stream.

Can be called in lieu of Cluster(). If this is called after Cluster(), all of the previous data is cleared before reading the new data. For the file format, see WriteToStream. This function aborts if the stream is bad.

See also:
WriteToStream.

void ReadFromFile ( const char *  input_filename  )  [inherited]

Read clustering data from a file.


Member Data Documentation

auto_ptr<PointSet> cluster_centers_ [protected, inherited]

vector<int> membership_ [protected, inherited]

bool done_ [protected, inherited]


Generated on Fri Sep 21 11:39:05 2007 for libpmk2 by doxygen 1.5.1