Features and functionality described on this page are available with Prism Enterprise.
The Frey index is a clustering validation metric proposed by Frey and Van Groenewoud in 1972, when they introduced their k-method of clustering. It was originally designed for hierarchical clustering and estimates the optimal number of clusters from the ratio of differences between successive levels in the clustering hierarchy. It can also be applied to K-means clustering by iteratively comparing solutions with k and k+1 clusters.
The Frey index is based on the principle that the optimal number of clusters corresponds to a natural breaking point in the clustering process, identified by examining changes in distance relationships between different cluster solutions.
The Frey index is calculated as follows:
$$\text{Frey}_j = \frac{\bar{S}_{b,j+1} - \bar{S}_{b,j}}{\bar{S}_{w,j+1} - \bar{S}_{w,j}}$$
where:
•S̄b,j is the mean between-cluster distance for j clusters
•S̄w,j is the mean within-cluster distance for j clusters
•j represents the number of clusters in the current solution
The mean distances are calculated as:
$$\bar{S}_b = \frac{S_b}{N_b}, \qquad \bar{S}_w = \frac{S_w}{N_w}$$
where:
•Sb is the sum of between-cluster distances
•Sw is the sum of within-cluster distances
•Nb is the number of between-cluster pairs
•Nw is the number of within-cluster pairs
In hierarchical clustering, the index compares two consecutive levels in the clustering hierarchy:
•Level j: Current clustering solution with j clusters
•Level j+1: Next clustering solution with j+1 clusters
The ratio measures how the change in between-cluster distances compares to the change in within-cluster distances when moving between these levels.
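As a rough illustration of the quantities defined above, the following Python sketch computes the mean between-cluster and within-cluster distances for two consecutive levels and then forms their ratio of differences. It is not Prism's implementation: the function names `mean_distances` and `frey_ratio` are hypothetical, and the sketch assumes Euclidean distance and counts each unordered pair of points once.

```python
import math
from itertools import combinations


def mean_distances(points, labels):
    """Return (mean between-cluster, mean within-cluster) pairwise distance.

    Assumption: Euclidean distance, each unordered pair counted once.
    """
    s_b = s_w = 0.0  # Sb and Sw: sums of between/within distances
    n_b = n_w = 0    # Nb and Nw: numbers of between/within pairs
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            s_w += d
            n_w += 1
        else:
            s_b += d
            n_b += 1
    if n_b == 0 or n_w == 0:
        raise ValueError("need at least one between- and one within-cluster pair")
    return s_b / n_b, s_w / n_w


def frey_ratio(points, labels_j, labels_j1):
    """Ratio of the change in mean between-cluster distance to the change
    in mean within-cluster distance when moving from level j to level j+1."""
    b_j, w_j = mean_distances(points, labels_j)
    b_j1, w_j1 = mean_distances(points, labels_j1)
    return (b_j1 - b_j) / (w_j1 - w_j)


# Four points on a line; level j has 2 clusters, level j+1 has 3 clusters.
points = [(0.0,), (1.0,), (10.0,), (13.0,)]
level_2 = [0, 0, 1, 1]
level_3 = [0, 0, 1, 2]

print(mean_distances(points, level_2))                 # (11.0, 2.0)
print(mean_distances(points, level_3))                 # (9.4, 1.0)
print(round(frey_ratio(points, level_2, level_3), 6))  # 1.6
```

In this toy example the mean between-cluster distance changes by 1.6 units while the mean within-cluster distance changes by 1.0, giving a ratio of 1.6. The ratio is undefined when the mean within-cluster distance does not change between the two levels.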
The Frey index can also be applied to K-means clustering by:
•Starting with k=1 (or k=2) clusters
•Iteratively increasing the number of clusters
•Computing the Frey index for each comparison between k and k+1 cluster solutions
Important caveat for K-means: Unlike hierarchical clustering, there is no assurance that two points grouped together in the k-cluster solution will remain in the same cluster in the k+1-cluster solution. This is because K-means uses a partitional approach where cluster assignments can change completely between different k values. Despite this limitation, the Frey index can still provide useful information for determining optimal cluster numbers in K-means, and this approach is implemented in packages such as NbClust.
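The K-means procedure above can be sketched in plain Python as follows. This is only a minimal illustration under stated assumptions, not the method used by Prism or NbClust: `kmeans_labels` is a small hypothetical Lloyd's-algorithm helper with deterministic farthest-first seeding, distances are Euclidean, and the sweep starts at k=2 because a single cluster has no between-cluster pairs.

```python
import math
from itertools import combinations


def mean_distances(points, labels):
    """Return (mean between-cluster, mean within-cluster) pairwise distance."""
    s_b = s_w = 0.0
    n_b = n_w = 0
    for i, j in combinations(range(len(points)), 2):
        d = math.dist(points[i], points[j])
        if labels[i] == labels[j]:
            s_w += d
            n_w += 1
        else:
            s_b += d
            n_b += 1
    if n_b == 0 or n_w == 0:
        raise ValueError("need at least one between- and one within-cluster pair")
    return s_b / n_b, s_w / n_w


def kmeans_labels(points, k, max_iter=100):
    """Tiny Lloyd's algorithm (hypothetical helper, farthest-first seeding)."""
    centers = [points[0]]
    while len(centers) < k:
        # Next seed: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                updated.append(centers[c])  # keep the center of an empty cluster
        if updated == centers:
            break
        centers = updated
    return labels


def frey_sweep(points, k_min=2, k_max=6):
    """Frey ratio for each comparison between the k and k+1 solutions."""
    means = {k: mean_distances(points, kmeans_labels(points, k))
             for k in range(k_min, k_max + 1)}
    ratios = {}
    for k in range(k_min, k_max):
        (b_k, w_k), (b_next, w_next) = means[k], means[k + 1]
        denom = w_next - w_k
        # The ratio is undefined if the mean within-cluster distance is unchanged.
        ratios[k] = (b_next - b_k) / denom if denom != 0 else math.nan
    return ratios


# Three well-separated groups of three points each.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
for k, ratio in frey_sweep(points, k_min=2, k_max=5).items():
    print(f"k={k} vs k={k + 1}: Frey ratio = {ratio:.3f}")
```

Each K-means solution is fitted independently, so the partitions compared at k and k+1 are not nested, which is exactly the caveat described above.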
The value of the Frey ratio reflects the relative changes in between-cluster and within-cluster distances:
•Frey ratio ≈ 1.00: Indicates a potential optimal clustering level
•Frey ratio > 1.00: Suggests that between-cluster distances are changing more rapidly than within-cluster distances
•Frey ratio < 1.00: Indicates that within-cluster distances are changing more rapidly
The original decision rule proposed by Frey and Van Groenewoud:
1. Continue clustering until the ratio falls below 1.00
2. The optimal clustering level is the one before the ratio drops below 1.00
3. If the ratio never falls below 1.00, assume a single cluster solution
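One possible reading of this decision rule is sketched below in Python. The function name `frey_select` and the `(level, ratio)` input format are assumptions made for illustration. The rule as stated does not say what to do when the very first ratio is already below 1.00, so the sketch returns `None` in that case rather than guessing.

```python
def frey_select(ratios, threshold=1.0):
    """Apply the stopping rule to (level, ratio) pairs in clustering order.

    Returns the level just before the ratio first drops below the threshold,
    1 (a single cluster) if it never does, and None if the very first ratio
    is already below the threshold (an assumption of this sketch).
    """
    previous = None
    for level, ratio in ratios:
        if ratio < threshold:
            return previous
        previous = level
    return 1


print(frey_select([(2, 1.8), (3, 1.3), (4, 0.7), (5, 1.1)]))  # 3
print(frey_select([(2, 1.5), (3, 1.2)]))                      # 1
```

In the first call the ratio drops below 1.00 at level 4, so level 3 is selected. In the second call the ratio never drops below 1.00, so a single-cluster solution is assumed.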
The Frey index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.