HierarchicalClusterer Class Reference

#include <hierarchical-clusterer.h>



Detailed Description

Hierarchical K-Means clusterer.

This clusterer does not use the Clusterer interface because its output involves multiple levels. Like Clusterer, however, HierarchicalClusterer can obtain its data either by running hierarchical K-means or by reading already-processed data. The difference is that the returned cluster centers are a PointSetList, where each element in the list gives the centers for a particular level.


Public Member Functions

 HierarchicalClusterer ()
const PointSetList & GetClusterCenters () const
 Get the cluster centers.
const vector< LargeIndex > & GetMemberships () const
 Get the membership data.
void Cluster (int num_levels, int branch_factor, const vector< PointRef > &points, const DistanceComputer &distance_computer)
 Performs the actual clustering and stores the result internally.
pair< int, int > GetChildRange (int level, int index) const
 Get the start and end indices of children at the next level.
int GetParentIndex (int level, int index) const
 Get the index of the parent of a node.
void WriteToStream (ostream &output_stream) const
 Write the clustered data to a stream.
void WriteToFile (const char *output_filename) const
 Write the clustered data to a file.
void ReadFromStream (istream &input_stream)
 Read clustered data from a stream.
void ReadFromFile (const char *input_filename)
 Read the clustered data from a file.

Static Public Attributes

static const int MAX_ITERATIONS
 Default 100.


Constructor & Destructor Documentation

HierarchicalClusterer (  ) 


Member Function Documentation

const PointSetList & GetClusterCenters (  )  const

Get the cluster centers.

Must be called after Cluster() or ReadFromStream(). Returns an ordered list of point sets. The ith element in this list corresponds to the centers at the ith level, where i=0 is the root (which is one giant cluster).

Note that this function does NOT tell you the hierarchical relationships. You will not know which clusters are subclusters of which ones. GetChildRange() and GetParentIndex() provide this functionality.

const vector< LargeIndex > & GetMemberships (  )  const

Get the membership data.

Must be called after Cluster() or ReadFromStream(). The size of the returned vector is the total number of points that were clustered. Each element is a LargeIndex describing a path down the tree; a point's membership is a LargeIndex rather than the single int used by flat (non-hierarchical) clustering methods. The first element of every LargeIndex is 0, since the top level is one giant cluster. Each subsequent element is the index of the point's cluster center at the corresponding level of the PointSetList returned by GetClusterCenters().

For example, let C be the PointSetList returned by GetClusterCenters(). Suppose we have a LargeIndex [0 3 9]. C[0][0] is the root cluster. Then C[1][3] is the center that this point belonged to at the first level, and C[2][9] is the center that this point belonged to at the bottom level.
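
The lookup described above can be sketched as follows. This is a minimal illustration, not libpmk code: the type aliases below are hypothetical stand-ins (nested vectors in place of PointSetList, a vector of int in place of LargeIndex) so that the sketch compiles on its own, and CentersAlongPath is a helper name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the libpmk types, used only so that this
// sketch is self-contained. They are assumptions, not the real classes.
using LargeIndex = std::vector<int>;                    // path down the tree
using Center = std::vector<double>;                     // one cluster center
using CenterLevels = std::vector<std::vector<Center>>;  // centers[level][index]

// Returns the centers visited along one membership path, root first.
// For a path [0 3 9] this collects centers[0][0], centers[1][3] and
// centers[2][9], mirroring the C[level][index] lookups in the text.
std::vector<Center> CentersAlongPath(const CenterLevels& centers,
                                     const LargeIndex& path) {
  assert(path.size() <= centers.size());
  std::vector<Center> visited;
  for (std::size_t level = 0; level < path.size(); ++level) {
    const int index = path[level];
    assert(index >= 0 &&
           static_cast<std::size_t>(index) < centers[level].size());
    visited.push_back(centers[level][index]);
  }
  return visited;
}
```

With the real classes, the same loop would presumably index the result of GetClusterCenters() with each element of one entry of GetMemberships().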

Note that this function does NOT tell you the hierarchical relationships. You will not know which clusters are subclusters of which ones. GetChildRange() and GetParentIndex() provide this functionality.

void Cluster ( int  num_levels,
int  branch_factor,
const vector< PointRef > &  points,
const DistanceComputer &  distance_computer 
)

Performs the actual clustering and stores the result internally.

pair< int, int > GetChildRange ( int  level,
int  index 
) const

Get the start and end indices of children at the next level.

Requires Cluster() or ReadFromStream() to be called first. The <level> and <index> parameters specify a cluster center. The two returned integers are the start and end indices into the cluster centers (returned by GetClusterCenters()) of that cluster center's children at the (<level>+1)st level. The returned pair is (-1, -1) if the cluster center has no children.
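
A returned range can be expanded into a list of child indices along these lines. This is a sketch under stated assumptions: ChildIndices is a hypothetical helper, not part of libpmk, and the text above does not say whether the end index is inclusive, so the helper takes that as an explicit parameter instead of guessing.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical helper (not part of libpmk): expands a (start, end) pair,
// such as the one returned by GetChildRange(), into explicit child indices.
// Whether `end` is inclusive is NOT stated in the documentation, so the
// caller must pass that assumption in.
std::vector<int> ChildIndices(const std::pair<int, int>& range,
                              bool end_is_inclusive) {
  std::vector<int> children;
  // (-1, -1) is the documented "no children" marker.
  if (range.first == -1 && range.second == -1) {
    return children;
  }
  assert(range.first >= 0 && range.second >= range.first);
  const int stop = end_is_inclusive ? range.second + 1 : range.second;
  for (int i = range.first; i < stop; ++i) {
    children.push_back(i);
  }
  return children;
}
```

Each resulting index would then refer to a center at the (<level>+1)st level of the list returned by GetClusterCenters().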

int GetParentIndex ( int  level,
int  index 
) const

Get the index of the parent of a node.

Requires Cluster() or ReadFromStream() to be called first. The <level> and <index> parameters specify a cluster center. The returned integer is the index into the cluster centers (returned by GetClusterCenters()) of that cluster center's parent at the (<level>-1)st level. Returns -1 for the root node (the only one without a parent).
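
Repeated parent lookups recover the full path from the root to a given node, which has the same shape as a membership LargeIndex. The sketch below is illustrative only: ParentTable is a hypothetical stand-in in which parents[level][index] plays the role of GetParentIndex(level, index), and PathFromRoot is a name invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in: parents[level][index] plays the role of
// GetParentIndex(level, index). The root entry is -1, as documented.
using ParentTable = std::vector<std::vector<int>>;

// Walks upward from (level, index) to the root and returns the indices
// along the way, ordered root first (e.g. [0 1 3]).
std::vector<int> PathFromRoot(const ParentTable& parents, int level,
                              int index) {
  assert(level >= 0 && static_cast<std::size_t>(level) < parents.size());
  std::vector<int> path(level + 1);
  for (int l = level; l >= 0; --l) {
    assert(index >= 0 &&
           static_cast<std::size_t>(index) < parents[l].size());
    path[l] = index;
    index = parents[l][index];  // step up one level
  }
  assert(index == -1);  // the root is the only node without a parent
  return path;
}
```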

void WriteToStream ( ostream &  output_stream  )  const

Write the clustered data to a stream.

Must be called after Cluster() or ReadFromStream(). File format:

This function will abort if the stream is bad.

void WriteToFile ( const char *  output_filename  )  const

Write the clustered data to a file.

void ReadFromStream ( istream &  input_stream  ) 

Read clustered data from a stream.

Can be called in lieu of Cluster(). See WriteToStream() for the format. This function will abort if the stream is bad.

See also:
WriteToStream()

void ReadFromFile ( const char *  input_filename  ) 

Read the clustered data from a file.


Member Data Documentation

const int MAX_ITERATIONS [static]

Default 100.


The documentation for this class was generated from the following files:
Generated on Wed May 2 11:17:13 2007 for libpmk by  doxygen 1.5.1