GraphPad Prism 10 Statistics Guide

Frey Index

Features and functionality described on this page are available with Prism Enterprise.

The Frey index is a clustering validation metric proposed by Frey and Van Groenewoud in 1972, when they introduced their k-method of clustering. It was originally designed for hierarchical clustering and estimates the optimal number of clusters from the ratio of difference scores between successive levels of the clustering hierarchy. It can also be applied to K-means clustering by comparing solutions with k and k+1 clusters over a range of k values.

The Frey index is based on the principle that the optimal number of clusters corresponds to a natural breaking point in the clustering process, identified by examining changes in distance relationships between different cluster solutions.

Mathematical calculation

The Frey index is calculated as follows:

$$\text{Frey}_j = \frac{\bar{S}_{b,j+1} - \bar{S}_{b,j}}{\bar{S}_{w,j+1} - \bar{S}_{w,j}}$$

where:

•S̄b,j is the mean between-cluster distance for j clusters

•S̄w,j is the mean within-cluster distance for j clusters

•j represents the number of clusters in the current solution
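
As a worked illustration of the formula, the short Python sketch below computes the ratio from four mean distances. The function name frey_index and the input values are hypothetical; they are not part of Prism.

```python
def frey_index(mean_between_j, mean_between_j1, mean_within_j, mean_within_j1):
    """Ratio of the change in mean between-cluster distance to the change in
    mean within-cluster distance when moving from j to j+1 clusters."""
    numerator = mean_between_j1 - mean_between_j
    denominator = mean_within_j1 - mean_within_j
    if denominator == 0:
        # The ratio is undefined when the mean within-cluster distance is unchanged.
        raise ValueError("mean within-cluster distance did not change")
    return numerator / denominator


# Hypothetical values: both means shrink when one more cluster is formed.
print(frey_index(6.0, 5.5, 2.0, 1.75))  # 2.0
```

Here the mean between-cluster distance falls by 0.5 while the mean within-cluster distance falls by 0.25, so the ratio is 2.0.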

Mean distance components

The mean distances are calculated as:

$$\bar{S}_b = \frac{S_b}{N_b}, \qquad \bar{S}_w = \frac{S_w}{N_w}$$

where:

•Sb is the sum of between-cluster distances

•Sw is the sum of within-cluster distances

•Nb is the number of between-cluster pairs

•Nw is the number of within-cluster pairs
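
A minimal sketch of these components, assuming Euclidean distance and treating every unordered pair of observations as either a within-cluster pair (same label) or a between-cluster pair (different labels). The function name mean_distances is hypothetical, and Prism may use a different distance metric.

```python
from itertools import combinations
from math import dist


def mean_distances(points, labels):
    """Return (mean within-cluster distance, mean between-cluster distance)."""
    s_w = s_b = 0.0  # S_w and S_b: sums of pairwise distances
    n_w = n_b = 0    # N_w and N_b: numbers of pairs
    for (p, label_p), (q, label_q) in combinations(zip(points, labels), 2):
        d = dist(p, q)  # Euclidean distance (an assumption for this sketch)
        if label_p == label_q:
            s_w += d
            n_w += 1
        else:
            s_b += d
            n_b += 1
    if n_w == 0 or n_b == 0:
        raise ValueError("need at least one within-cluster and one between-cluster pair")
    return s_w / n_w, s_b / n_b


points = [(0.0,), (1.0,), (10.0,), (11.0,)]
print(mean_distances(points, [0, 0, 1, 1]))  # (1.0, 10.0)
```

In the example, the two within-cluster pairs are each 1 apart, and the four between-cluster pairs are 10, 11, 9 and 10 apart, giving a mean of 10.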

Application to clustering methods

Hierarchical clustering

In hierarchical clustering, the index compares two consecutive levels in the clustering hierarchy:

•Level j: Current clustering solution with j clusters

•Level j+1: Next clustering solution with j+1 clusters

The ratio compares the change in the mean between-cluster distance with the change in the mean within-cluster distance when moving from level j to level j+1.
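
To show the comparison of consecutive hierarchy levels, the sketch below builds a naive single-linkage agglomerative hierarchy in pure Python and computes the ratio for each pair of levels j and j+1. Single linkage, Euclidean distance, and the helper names (single_linkage_levels, frey_by_level) are assumptions chosen for illustration, not Prism's implementation.

```python
import math
from itertools import combinations
from math import dist


def mean_distances(points, labels):
    # Mean within- and between-cluster Euclidean distance over all point pairs.
    sums, counts = [0.0, 0.0], [0, 0]  # index 0: within, index 1: between
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        kind = 0 if lp == lq else 1
        sums[kind] += dist(p, q)
        counts[kind] += 1
    return sums[0] / counts[0], sums[1] / counts[1]


def single_linkage_levels(points):
    """Return {number of clusters: labels} for every level of a naive
    single-linkage agglomerative hierarchy (O(n^3) distance evaluations)."""
    clusters = [[i] for i in range(len(points))]
    levels = {}
    while True:
        labels = [0] * len(points)
        for c, members in enumerate(clusters):
            for i in members:
                labels[i] = c
        levels[len(clusters)] = labels
        if len(clusters) == 1:
            return levels

        def linkage(pair):
            # Single linkage: distance between the closest members of two clusters.
            a, b = pair
            return min(dist(points[i], points[j]) for i in clusters[a] for j in clusters[b])

        a, b = min(combinations(range(len(clusters)), 2), key=linkage)
        clusters[a].extend(clusters[b])  # merge cluster b into cluster a (a < b)
        del clusters[b]


def frey_by_level(points):
    """Frey ratio for each pair of consecutive levels j and j+1."""
    levels = single_linkage_levels(points)
    ratios = {}
    # j = 1 has no between-cluster pairs and j = n has no within-cluster
    # pairs, so the sweep covers j = 2 .. n-2.
    for j in range(2, len(points) - 1):
        w_j, b_j = mean_distances(points, levels[j])
        w_j1, b_j1 = mean_distances(points, levels[j + 1])
        denom = w_j1 - w_j
        ratios[j] = (b_j1 - b_j) / denom if denom != 0 else math.nan
    return ratios


points = [(0, 0), (0, 1), (5, 5), (5, 7), (10, 0), (10, 3)]
ratios = frey_by_level(points)
print(ratios)
```

Because every level is produced by merging two clusters of the level above it, the solutions at j and j+1 are always nested, which is the property the original index was designed around.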

K-means clustering

The Frey index can also be applied to K-means clustering by:

•Starting with k=1 (or k=2) clusters

•Iteratively increasing the number of clusters

•Computing the Frey index for each comparison between k and k+1 cluster solutions
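
The steps above can be sketched in Python as a sweep over k, fitting K-means independently for each k and comparing each solution with the next. The plain Lloyd's algorithm, the random initialisation from data points, Euclidean distance, and the names kmeans and frey_kmeans are assumptions for this illustration; Prism's own K-means settings may differ.

```python
import math
import random
from itertools import combinations
from math import dist


def mean_distances(points, labels):
    # Mean within- and between-cluster Euclidean distance over all point pairs.
    sums, counts = [0.0, 0.0], [0, 0]  # index 0: within, index 1: between
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        kind = 0 if lp == lq else 1
        sums[kind] += dist(p, q)
        counts[kind] += 1
    return sums[0] / counts[0], sums[1] / counts[1]


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm, starting from k randomly chosen data points."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(n_iter):
        # Assign each point to its nearest centre.
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Move the centre to the mean of its members; keep it if the cluster is empty.
            new_centers.append(
                tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def frey_kmeans(points, k_min=2, k_max=5, seed=0):
    """Frey ratio comparing independent K-means fits with k and k+1 clusters."""
    fits = {k: kmeans(points, k, seed=seed) for k in range(k_min, k_max + 2)}
    ratios = {}
    for k in range(k_min, k_max + 1):
        w_k, b_k = mean_distances(points, fits[k])
        w_k1, b_k1 = mean_distances(points, fits[k + 1])
        denom = w_k1 - w_k
        ratios[k] = (b_k1 - b_k) / denom if denom != 0 else math.nan
    return ratios


points = [(0, 0), (0.5, 1.2), (1.1, 0.3), (0.9, 0.8),
          (8, 8.2), (8.4, 9.1), (9.3, 7.9), (8.8, 8.6),
          (0.2, 8.1), (0.7, 9.4), (1.5, 8.3), (0.1, 9.0)]
ratios = frey_kmeans(points, 2, 5)
print(ratios)
```

Starting the sweep at k = 2 keeps the mean between-cluster distance defined, because a single-cluster solution has no between-cluster pairs.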

Important caveat for K-means: Unlike hierarchical clustering, K-means gives no assurance that two points grouped together in the k-cluster solution will remain in the same cluster in the (k+1)-cluster solution. K-means is a partitional method in which each value of k is fit separately, so cluster assignments can change substantially from one k to the next. Despite this limitation, the Frey index can still provide useful information for choosing the number of clusters in K-means, and this approach is implemented in packages such as NbClust.
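
One way to see this caveat in practice is to check whether a (k+1)-cluster solution is a refinement of the k-cluster solution, that is, whether each of its clusters lies entirely inside one k-cluster. Consecutive hierarchical levels always pass this check; independent K-means fits may not. The function name is_refinement is hypothetical.

```python
def is_refinement(labels_k, labels_k1):
    """Return True if every cluster of the (k+1)-cluster solution lies
    entirely inside a single cluster of the k-cluster solution."""
    parent = {}  # (k+1)-cluster label -> the k-cluster label it sits inside
    for coarse, fine in zip(labels_k, labels_k1):
        if parent.setdefault(fine, coarse) != coarse:
            return False  # this (k+1)-cluster straddles two k-clusters
    return True


print(is_refinement([0, 0, 1, 1], [0, 2, 1, 1]))  # True: cluster 0 was split
print(is_refinement([0, 0, 1, 1], [0, 1, 1, 2]))  # False: points were regrouped
```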

Interpretation

The Frey index evaluates the relative changes in distance relationships:

•Frey ratio ≈ 1.00: Indicates a potential optimal clustering level

•Frey ratio > 1.00: Suggests that the mean between-cluster distance is changing more rapidly than the mean within-cluster distance

•Frey ratio < 1.00: Indicates that the mean within-cluster distance is changing more rapidly than the mean between-cluster distance

Decision rule

The original decision rule proposed by Frey and Van Groenewoud:

1. Continue clustering until the ratio falls below 1.00

2. The optimal clustering level is the one before the ratio drops below 1.00

3. If the ratio never falls below 1.00, assume a single cluster solution
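
The decision rule can be sketched as a scan over the ratios in the order the levels are examined. This sketch reads "the level before the ratio drops below 1.00" as the level of the entry immediately preceding the first ratio below 1.00; that reading, the input format of (level, ratio) pairs, and the function name frey_decision are assumptions, and other readings of the rule are possible.

```python
def frey_decision(ratios):
    """Apply the stopping rule to (level, ratio) pairs listed in scan order.

    Returns the level of the entry just before the first ratio below 1.00,
    1 if no ratio falls below 1.00, or None if the very first ratio is
    already below 1.00 (no earlier level exists under this reading).
    NaN ratios never compare below 1.00, so they are passed over.
    """
    for i, (_, ratio) in enumerate(ratios):
        if ratio < 1.0:
            return ratios[i - 1][0] if i > 0 else None
    return 1  # rule 3: assume a single cluster


print(frey_decision([(2, 1.40), (3, 1.10), (4, 0.80), (5, 1.20)]))  # 3
print(frey_decision([(2, 1.20), (3, 1.50)]))  # 1
```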

The Frey index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.
