Features and functionality described on this page are available with Prism Enterprise.

The McClain-Rao index is a clustering validation metric proposed by McClain and Rao in 1975 that evaluates clustering quality by comparing the average within-cluster distance to the average between-cluster distance. This index provides a direct ratio-based measure of how well separated the clusters are relative to their internal cohesion.

The McClain-Rao index is based on the fundamental principle that good clustering should have small within-cluster distances (indicating compact clusters) and large between-cluster distances (indicating well-separated clusters).

Mathematical calculation

The McClain-Rao index is calculated as follows:

$$ \mathrm{McClain} = \frac{S_w / N_w}{S_b / N_b} $$

where:

Sw is the sum of within-cluster distances

Sb is the sum of between-cluster distances

Nw is the total number of within-cluster pairs

Nb is the total number of between-cluster pairs

Within-cluster component

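The within-cluster sum of distances and number of pairs, with each unordered pair of points counted once:

$$ S_w = \sum_{i=1}^{k} \sum_{\substack{x, y \in C_i \\ x < y}} d(x, y), \qquad N_w = \sum_{i=1}^{k} \frac{n_i (n_i - 1)}{2} $$

where: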

k is the number of clusters

Ci is the i-th cluster

ni is the number of points in cluster i

d(x,y) is the distance between points x and y

Between-cluster component

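The between-cluster sum of distances and number of pairs, taken over all pairs of points that belong to different clusters:

$$ S_b = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \sum_{x \in C_i} \sum_{y \in C_j} d(x, y), \qquad N_b = \sum_{i=1}^{k-1} \sum_{j=i+1}^{k} n_i n_j $$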

Average distances

The McClain-Rao index represents the ratio of:

w = Sw/Nw: Average within-cluster distance

b = Sb/Nb: Average between-cluster distance
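
The ratio of the two averages can be sketched in a few lines of Python. This is a minimal illustration, not Prism's implementation: the function name mcclain_rao, the sample points, and the use of Euclidean distance are all assumptions made for the example.

```python
from itertools import combinations
from math import dist


def mcclain_rao(points, labels):
    """Return (average within-cluster distance) / (average between-cluster distance)."""
    s_w = s_b = 0.0  # Sw and Sb: sums of within- and between-cluster distances
    n_w = n_b = 0    # Nw and Nb: numbers of within- and between-cluster pairs
    # Visit every unordered pair of points exactly once.
    for (p, lp), (q, lq) in combinations(list(zip(points, labels)), 2):
        d = dist(p, q)  # Euclidean distance (an assumption of this sketch)
        if lp == lq:
            s_w += d
            n_w += 1
        else:
            s_b += d
            n_b += 1
    if n_w == 0 or n_b == 0 or s_b == 0:
        raise ValueError("the index is undefined for this partition")
    return (s_w / n_w) / (s_b / n_b)


# Hypothetical data: two tight pairs of points, five units apart.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
index = mcclain_rao(points, labels)
print(round(index, 4))
```

The guard reflects that the index is undefined when there are no within-cluster pairs (every point in its own cluster) or no between-cluster pairs (a single cluster).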

Interpretation

The McClain-Rao index measures the relative magnitudes of within-cluster and between-cluster distances:

Lower McClain values: Indicate better clustering with small within-cluster distances relative to between-cluster distances

Higher McClain values: Suggest poor clustering where within-cluster distances are large compared to between-cluster distances

The optimal number of clusters corresponds to the minimum value of the McClain-Rao index. This occurs when:

Within-cluster distances are minimized (clusters are compact)

Between-cluster distances are maximized (clusters are well-separated)

The ratio between them is smallest
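
As a small illustration of the minimum rule, the sketch below scores three candidate partitions of the same data and keeps the one with the smallest index. The data, the candidate labelings, and the helper mcclain_rao are hypothetical and use Euclidean distance; in practice the labelings would come from running a clustering algorithm at each candidate number of clusters.

```python
from itertools import combinations
from math import dist


def mcclain_rao(points, labels):
    """Average within-cluster distance divided by average between-cluster distance."""
    within, between = [], []
    for (p, lp), (q, lq) in combinations(list(zip(points, labels)), 2):
        (within if lp == lq else between).append(dist(p, q))
    return (sum(within) / len(within)) / (sum(between) / len(between))


# Hypothetical one-dimensional data with three obvious groups.
points = [(0.0,), (1.0,), (10.0,), (11.0,), (20.0,), (21.0,)]

# Hypothetical labelings for k = 2, 3, and 4 clusters.
candidates = {
    2: [0, 0, 0, 0, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
    4: [0, 0, 1, 1, 2, 3],
}

scores = {k: mcclain_rao(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)  # the smallest index wins
print(scores, best_k)
```

For this data the three-cluster labeling gives the smallest index, matching the three visible groups.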

Advantages and considerations

The McClain-Rao index offers several advantages:

It provides an intuitive ratio-based interpretation

It directly compares the two key aspects of clustering quality

It doesn't make assumptions about cluster shape or distribution

The calculation is relatively straightforward

However, there are some limitations:

It requires calculation of all pairwise distances, which can be computationally expensive

It may be sensitive to outliers that create large distances

It doesn't account for cluster size differences

Performance can vary with the choice of distance metric
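
The dependence on the distance metric can be seen by scoring the same partition twice. The sketch below is illustrative only: the helper names, the sample points, and the choice of Euclidean versus Manhattan distance are assumptions, not a description of the metrics Prism offers.

```python
from itertools import combinations


def euclidean(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def mcclain_rao(points, labels, metric):
    """Average within-cluster distance divided by average between-cluster distance."""
    within, between = [], []
    # All N(N-1)/2 pairs are visited, so the cost grows quadratically with N.
    for (p, lp), (q, lq) in combinations(list(zip(points, labels)), 2):
        (within if lp == lq else between).append(metric(p, q))
    return (sum(within) / len(within)) / (sum(between) / len(between))


# Hypothetical data: the same four points and labels scored under two metrics.
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
euclidean_index = mcclain_rao(points, labels, euclidean)
manhattan_index = mcclain_rao(points, labels, manhattan)
print(euclidean_index, manhattan_index)
```

The two values differ even though the partition is identical, so index values are only comparable when they are computed with the same metric.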

The McClain-Rao index is particularly useful when:

You want a direct comparison of internal cohesion vs. external separation

The clusters are expected to be relatively compact

You need an easily interpretable clustering validation measure

You are working with datasets where the distance metric has a clear meaning

The index works well across different clustering algorithms and is especially effective when combined with other validation methods to provide a comprehensive assessment of clustering quality.

The McClain-Rao index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.

© 1995-2019 GraphPad Software, LLC. All rights reserved.