#include <hierarchical-clusterer.h>
HierarchicalClusterer does not implement the Clusterer interface because its output spans multiple levels. Like Clusterer, however, it can obtain its data either by running hierarchical K-means or by reading previously processed data. The difference is that the cluster centers are returned as a PointSetList, in which each element holds the centers for one level.
Public Member Functions | |
HierarchicalClusterer () | |
const PointSetList & | GetClusterCenters () const |
Get the cluster centers. | |
const vector< LargeIndex > & | GetMemberships () const |
Get the membership data. | |
void | Cluster (int num_levels, int branch_factor, const vector< PointRef > &points, const DistanceComputer &distance_computer) |
Performs the actual clustering and stores the result internally. | |
pair< int, int > | GetChildRange (int level, int index) const |
Get the start and end indices of children at the next level. | |
int | GetParentIndex (int level, int index) const |
Get the index of the parent of a node. | |
void | WriteToStream (ostream &output_stream) const |
Write the clustered data to a stream. | |
void | WriteToFile (const char *output_filename) const |
Write the clustered data to a file. | |
void | ReadFromStream (istream &input_stream) |
Read clustered data from a stream. | |
void | ReadFromFile (const char *input_filename) |
Read the clustered data from a file. | |
Static Public Attributes | |
static const int | MAX_ITERATIONS |
Default 100. |
const PointSetList & GetClusterCenters | ( | ) | const |
Get the cluster centers.
Must be called after Cluster() or ReadFromStream(). Returns an ordered list of point sets. The ith element in this list corresponds to the centers at the ith level, where i=0 is the root (which is one giant cluster).
Note that this function does NOT tell you the hierarchical relationships. You will not know which clusters are subclusters of which ones. GetChildRange() and GetParentIndex() provide this functionality.
const vector< LargeIndex > & GetMemberships | ( | ) | const |
Get the membership data.
Must be called after Cluster() or ReadFromStream(). The returned vector has one element per clustered point. Each element is a LargeIndex describing a path down the tree; flat (non-hierarchical) clustering methods use a single int for membership instead. The first element of every LargeIndex is 0, since the top level is one giant cluster. Each subsequent element is the index of the point's cluster center at the next level down, which can be used to look up that center in the list returned by GetClusterCenters().
For example, let C be the PointSetList returned by GetClusterCenters(). Suppose we have a LargeIndex [0 3 9]. C[0][0] is the root cluster. Then C[1][3] is the center that this point belonged to at the first level, and C[2][9] is the center that this point belonged to at the bottom level.
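The lookup described above can be sketched with stand-in types. This is only an illustration: `Center`, `CentersByLevel`, `Path`, and `CentersAlongPath` are hypothetical names, and plain `std::vector`s are used in place of the real PointSetList and LargeIndex classes, whose accessors are not shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in types for illustration only; they are NOT the library's classes.
using Center = std::vector<double>;                        // one cluster center
using CentersByLevel = std::vector<std::vector<Center> >;  // C[level][index]
using Path = std::vector<int>;                             // models a LargeIndex

// Returns the centers a point passes through, root first.
// path[level] is the index of the point's cluster at that level.
std::vector<Center> CentersAlongPath(const CentersByLevel& centers,
                                     const Path& path) {
  assert(path.size() <= centers.size());
  std::vector<Center> result;
  for (std::size_t level = 0; level < path.size(); ++level) {
    const int index = path[level];
    assert(index >= 0 &&
           static_cast<std::size_t>(index) < centers[level].size());
    result.push_back(centers[level][index]);
  }
  return result;
}
```

With the path [0 3 9], the function returns C[0][0], C[1][3], and C[2][9], in that order.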
Note that this function does NOT tell you the hierarchical relationships. You will not know which clusters are subclusters of which ones. GetChildRange() and GetParentIndex() provide this functionality.
void Cluster | ( | int | num_levels, | |
int | branch_factor, | |||
const vector< PointRef > & | points, | |||
const DistanceComputer & | distance_computer | |||
) |
Performs the actual clustering and stores the result internally.
pair< int, int > GetChildRange | ( | int | level, | |
int | index | |||
) | const |
Get the start and end indices of children at the next level.
Requires Cluster() or ReadFromStream() to be called first. The <level> and <index> parameters specify a cluster center. The two returned integers are the start and end indices, within the centers at the (<level>+1)st level (as returned by GetClusterCenters()), of that cluster center's children. The returned pair is (-1, -1) if the cluster center has no children.
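To make the return convention concrete, here is a minimal sketch on a toy model of one tree level. It is not the library's implementation: `ChildRange`, `ChildCount`, and the `parent_of_next` array are hypothetical, siblings are assumed to be stored contiguously, and the end index is assumed to be inclusive, which the description above does not state.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical model: parent_of_next[j] is the parent index (at `level`) of
// node j at level + 1, with siblings stored contiguously.
// Returns (first, last) child indices, with last INCLUSIVE (an assumption of
// this sketch), or (-1, -1) when the node has no children.
std::pair<int, int> ChildRange(const std::vector<int>& parent_of_next,
                               int index) {
  assert(index >= 0);
  int first = -1;
  int last = -1;
  for (int j = 0; j < static_cast<int>(parent_of_next.size()); ++j) {
    if (parent_of_next[j] == index) {
      if (first == -1) first = j;  // first child seen
      last = j;                    // keeps advancing to the last child
    }
  }
  return std::make_pair(first, last);
}

// Number of children implied by a range under the inclusive-end assumption.
int ChildCount(const std::pair<int, int>& range) {
  return range.first == -1 ? 0 : range.second - range.first + 1;
}
```

Callers should test for (-1, -1) before iterating, since a leaf yields no valid range.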
int GetParentIndex | ( | int | level, | |
int | index | |||
) | const |
Get the index of the parent of a node.
Requires Cluster() or ReadFromStream() to be called first. The <level> and <index> parameters specify a cluster center. The returned integer is the index, within the centers at the (<level>-1)st level (as returned by GetClusterCenters()), of that cluster center's parent. Returns -1 for the root node (the only one without a parent).
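Repeated parent lookups recover the full root-to-node path, which has the same shape as the membership paths returned by GetMemberships(). The sketch below shows the idea on a toy table; `ParentTable` and `PathFromRoot` are hypothetical stand-ins, and the table lookup plays the role that GetParentIndex() would play against a real clusterer.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical model: parents[level][index] is the parent's index at
// level - 1, and parents[0][0] == -1 for the root.
typedef std::vector<std::vector<int> > ParentTable;

// Returns the indices visited from the root down to (level, index).
std::vector<int> PathFromRoot(const ParentTable& parents, int level,
                              int index) {
  assert(level >= 0 && level < static_cast<int>(parents.size()));
  std::vector<int> path;
  while (level >= 0) {
    path.push_back(index);
    index = parents[level][index];  // step up one level
    --level;
  }
  std::reverse(path.begin(), path.end());  // collected leaf-first; flip it
  return path;
}
```

For a node at level 2 with index 9 whose parent is index 3 at level 1, the result is [0 3 9].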
void WriteToStream | ( | ostream & | output_stream | ) | const |
Write the clustered data to a stream.
Must be called after Cluster() or ReadFromStream(). File format:
This function will abort if the stream is bad.
void WriteToFile | ( | const char * | output_filename | ) | const |
Write the clustered data to a file.
void ReadFromStream | ( | istream & | input_stream | ) |
Read clustered data from a stream.
Can be called in lieu of Cluster(). See WriteToStream() for the format. This function will abort if the stream is bad.
void ReadFromFile | ( | const char * | input_filename | ) |
Read the clustered data from a file.
const int MAX_ITERATIONS [static] |
Default 100.