Features and functionality described on this page are available with Prism Enterprise.

The Gamma index is a clustering validation metric adapted from Goodman and Kruskal's Gamma statistic. Developed by Baker and Hubert in 1975, it measures the agreement between within-cluster and between-cluster dissimilarities to assess clustering quality.

The Gamma index is based on the principle that in good clustering, distances within clusters should be consistently smaller than distances between clusters. It evaluates this by comparing all within-cluster dissimilarities against all between-cluster dissimilarities.

Mathematical calculation

The Gamma index is calculated as follows:

$$\Gamma = \frac{s(+) - s(-)}{s(+) + s(-)}$$

where:

s(+) is the number of concordant comparisons

s(-) is the number of discordant comparisons

Concordant and discordant comparisons

The comparisons are made between all within-cluster dissimilarities and all between-cluster dissimilarities:

Concordant comparison s(+): A comparison where a within-cluster dissimilarity is strictly less than a between-cluster dissimilarity

Discordant comparison s(-): A comparison where a within-cluster dissimilarity is strictly greater than a between-cluster dissimilarity

Note that equal dissimilarities between the two sets are disregarded in the calculation of the index.
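The definitions above can be illustrated with a minimal sketch. The function name count_comparisons and the toy values are assumptions made for this example only; they are not part of Prism.

```python
def count_comparisons(within, between):
    """Count concordant, discordant, and tied comparisons.

    `within` and `between` are plain lists of dissimilarities.
    Both names are illustrative, not taken from Prism.
    """
    s_plus = s_minus = ties = 0
    for w in within:
        for b in between:
            if w < b:
                s_plus += 1    # concordant: within < between
            elif w > b:
                s_minus += 1   # discordant: within > between
            else:
                ties += 1      # equal values are disregarded
    return s_plus, s_minus, ties


# Toy values chosen only for illustration.
within = [1.0, 2.0, 5.0]
between = [2.0, 4.0, 6.0]
s_plus, s_minus, ties = count_comparisons(within, between)
gamma = (s_plus - s_minus) / (s_plus + s_minus)
print(s_plus, s_minus, ties, gamma)  # 6 2 1 0.5
```

Of the nine comparisons, six are concordant, two are discordant, and one is a tie (2.0 against 2.0) that does not enter the index, giving (6 - 2) / (6 + 2) = 0.5.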

Computational procedure

1. Calculate all pairwise distances within each cluster (within-cluster dissimilarities)

2. Calculate all pairwise distances between points in different clusters (between-cluster dissimilarities)

3. For each within-cluster distance, compare it with each between-cluster distance:

If within-cluster distance < between-cluster distance → increment s(+)

If within-cluster distance > between-cluster distance → increment s(-)

If distances are equal → ignore this comparison

4. Calculate the Gamma index using the formula above
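The four steps above can be sketched as a brute-force routine. This is an illustrative implementation, not Prism's code: the function name gamma_index, the use of Euclidean distance, and the toy points are all assumptions.

```python
from itertools import combinations
from math import dist


def gamma_index(points, labels):
    """Brute-force Gamma index for a labelled set of points.

    `points` is a sequence of coordinate tuples and `labels` gives the
    cluster of each point. Euclidean distance is an assumption here.
    """
    within, between = [], []
    # Steps 1 and 2: split all pairwise distances by cluster membership.
    for i, j in combinations(range(len(points)), 2):
        d = dist(points[i], points[j])
        (within if labels[i] == labels[j] else between).append(d)

    # Step 3: compare every within-cluster distance with every
    # between-cluster distance; equal distances are skipped.
    s_plus = sum(1 for w in within for b in between if w < b)
    s_minus = sum(1 for w in within for b in between if w > b)

    # Step 4: the index is undefined when no untied comparison exists.
    if s_plus + s_minus == 0:
        raise ValueError("Gamma is undefined: no untied comparisons")
    return (s_plus - s_minus) / (s_plus + s_minus)


# Two compact, well-separated groups (toy data).
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = gamma_index(points, [0, 0, 0, 1, 1, 1])  # labels follow the groups
poor = gamma_index(points, [0, 1, 0, 1, 0, 1])  # labels cut across the groups
```

With labels that follow the two groups, every within-cluster distance is smaller than every between-cluster distance, so good is 1.0. The labelling that cuts across the groups gives a lower value.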

Interpretation

The Gamma index ranges from -1 to +1 and measures the quality of clustering:

Higher Gamma values (closer to +1): Indicate better clustering where within-cluster distances are consistently smaller than between-cluster distances

Lower Gamma values (closer to -1): Suggest poor clustering where within-cluster distances are often larger than between-cluster distances

Gamma values near 0: Indicate that concordant and discordant comparisons occur about equally often, so within-cluster distances are not consistently smaller than between-cluster distances

The optimal number of clusters corresponds to the maximum value of the Gamma index.
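As a rough sketch of this selection rule, the example below computes the index for several candidate partitions of the same data and keeps the one with the largest value. The helper gamma_1d, the toy data, and the hand-written candidate labelings (standing in for the output of a clustering algorithm) are assumptions for illustration. The helper does not guard against the case where every comparison is tied.

```python
from itertools import combinations


def gamma_1d(values, labels):
    """Brute-force Gamma index for one-dimensional data (illustrative helper)."""
    within, between = [], []
    for i, j in combinations(range(len(values)), 2):
        d = abs(values[i] - values[j])
        (within if labels[i] == labels[j] else between).append(d)
    s_plus = sum(1 for w in within for b in between if w < b)
    s_minus = sum(1 for w in within for b in between if w > b)
    return (s_plus - s_minus) / (s_plus + s_minus)


# Toy data with three visible groups.
values = [1, 2, 3, 11, 12, 13, 21, 22, 23]

# Hypothetical partitions for k = 2, 3, and 4 clusters.
candidates = {
    2: [0, 0, 0, 0, 0, 0, 1, 1, 1],
    3: [0, 0, 0, 1, 1, 1, 2, 2, 2],
    4: [0, 0, 0, 1, 1, 1, 2, 2, 3],
}

gamma_by_k = {k: gamma_1d(values, labels) for k, labels in candidates.items()}
best_k = max(gamma_by_k, key=gamma_by_k.get)  # k with the largest Gamma
```

Here the three-cluster partition reaches a Gamma of 1.0, while merging two groups (k = 2) or splitting one (k = 4) introduces discordant comparisons, so best_k is 3.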

Advantages and considerations

The Gamma index offers several advantages:

It provides a comprehensive comparison of all distance relationships

It doesn't make assumptions about cluster shape or distribution

The interpretation is intuitive: the value reflects how often within-cluster distances are smaller than between-cluster distances

It's based on established statistical theory (Goodman-Kruskal Gamma)

However, there are significant limitations:

Computationally very expensive: Requires comparing every within-cluster distance with every between-cluster distance

High computational demand: With n points there are n(n-1)/2 pairwise distances, so the number of comparisons can grow on the order of n^4

Time complexity: Can become prohibitive for large datasets

Memory requirements: Storing all pairwise distances requires memory that grows quadratically with the number of points
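To get a feel for the cost, the number of comparisons can be computed from the cluster sizes alone, without calculating any distances. The sketch below assumes the brute-force procedure (every within-cluster distance compared with every between-cluster distance). The function name comparison_count is hypothetical.

```python
from collections import Counter


def comparison_count(labels):
    """Return (n_within, n_between, n_comparisons) for a labelling."""
    n = len(labels)
    total_pairs = n * (n - 1) // 2
    # Pairs of points that share a cluster.
    n_within = sum(m * (m - 1) // 2 for m in Counter(labels).values())
    # All remaining pairs span two different clusters.
    n_between = total_pairs - n_within
    return n_within, n_between, n_within * n_between


# Three equal-sized clusters at two dataset sizes.
small = comparison_count([i % 3 for i in range(300)])
large = comparison_count([i % 3 for i in range(3000)])
print(small)  # (14850, 30000, 445500000)
print(large)  # (1498500, 3000000, 4495500000000)
```

In this setting, a tenfold increase in the number of points raises the number of comparisons roughly ten-thousand-fold.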

Due to its high computational cost, the Gamma index is typically calculated only when specifically requested (e.g., using index = "gamma" or index = "alllong" in the NbClust package). It may not be suitable for large datasets without sufficient computational resources.

The Gamma index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.

© 1995-2019 GraphPad Software, LLC. All rights reserved.