GraphPad Prism 10 Statistics Guide

Ball-Hall Index

Features and functionality described on this page are available with Prism Enterprise.

The Ball-Hall index is a clustering validation metric proposed by Ball and Hall in 1965 that evaluates clustering quality by measuring the average within-cluster sum of squares per cluster. This index provides a normalized measure of cluster compactness that accounts for the number of clusters in the solution.

The Ball-Hall index is based on the principle that good clustering should have compact clusters with small within-cluster variance. By averaging the within-cluster sum of squares across all clusters, it provides a per-cluster measure of compactness.

Mathematical calculation

The Ball-Hall index is calculated as follows:

$$\mathrm{Ball}(k) = \frac{\mathrm{WCSS}_k}{k}$$

where:

•$\mathrm{WCSS}_k$ is the total within-cluster sum of squares for k clusters

•$k$ is the number of clusters

Within-cluster sum of squares

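The total within-cluster sum of squares is defined as:

$$\mathrm{WCSS}_k = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$

where:

•$C_i$ is the i-th cluster

•$c_i$ is the centroid of cluster i

•$\lVert x - c_i \rVert^2$ is the squared Euclidean distance from point x to centroid $c_i$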
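
The definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Prism's implementation: the function name `total_wcss` and the toy data are hypothetical, and points are represented as plain tuples of coordinates.

```python
from collections import defaultdict

def total_wcss(points, labels):
    """Total within-cluster sum of squares (squared Euclidean distances to centroids)."""
    # Group the points by their cluster label
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        # Centroid c_i: coordinate-wise mean of the points in cluster C_i
        centroid = [sum(coord) / len(members) for coord in zip(*members)]
        # Add ||x - c_i||^2 for every point x in C_i
        for x in members:
            total += sum((a - b) ** 2 for a, b in zip(x, centroid))
    return total

# Hypothetical toy data: two clusters of two points each
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (6.0, 0.0)]
labels = [0, 0, 1, 1]
wcss = total_wcss(points, labels)
print(wcss)  # 4.0
```

In this toy example every point lies at distance 1 from its cluster centroid, so each of the four points contributes 1 to the total.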

Average interpretation

The Ball-Hall index represents the average within-cluster sum of squares per cluster:

$$\mathrm{Ball}(k) = \frac{1}{k} \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$

This gives the mean compactness across all clusters in the solution.
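
To make the "average per cluster" reading concrete, the following sketch first computes one sum of squares per cluster and then averages them. The helper names `cluster_sums_of_squares` and `ball_hall` and the data are hypothetical, chosen for illustration only.

```python
from collections import defaultdict

def cluster_sums_of_squares(points, labels):
    """Return {label: sum of squared distances from the cluster's points to its centroid}."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    sums = {}
    for label, members in clusters.items():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        sums[label] = sum(
            (a - b) ** 2 for x in members for a, b in zip(x, centroid)
        )
    return sums

def ball_hall(points, labels):
    """Ball-Hall index: total within-cluster sum of squares divided by the number of clusters."""
    sums = cluster_sums_of_squares(points, labels)
    return sum(sums.values()) / len(sums)

# Hypothetical toy data: two pairs of points and one singleton cluster
points = [(0.0, 0.0), (0.0, 2.0), (4.0, 0.0), (6.0, 0.0), (10.0, 10.0)]
labels = [0, 0, 1, 1, 2]
per_cluster = cluster_sums_of_squares(points, labels)  # {0: 2.0, 1: 2.0, 2: 0.0}
index = ball_hall(points, labels)                      # (2 + 2 + 0) / 3
```

Note that the singleton cluster contributes a sum of squares of zero, so it lowers the average without saying anything about how well the remaining points are grouped.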

Interpretation

The Ball-Hall index measures the average cluster compactness:

•Lower Ball-Hall values: Indicate better clustering with more compact clusters on average

•Higher Ball-Hall values: Suggest less compact clusters with larger average within-cluster variance

The optimal number of clusters is determined by finding the largest difference between Ball-Hall index values at consecutive numbers of clusters (levels). This corresponds to the point where adding or removing a cluster produces the largest change in average cluster compactness.

Decision approach

The evaluation process involves:

1. Calculate the Ball-Hall index for different numbers of clusters (k = 2, 3, 4, ...)

2. Compute differences between consecutive values: Δ(k) = |Ball(k) - Ball(k+1)|

3. Select the k corresponding to max(Δ(k))

This identifies the number of clusters where the change in average compactness is most pronounced.
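
The three steps above can be sketched as follows. This is a minimal illustration that follows the steps literally, reporting the smaller k of the pair with the largest difference; the function name `select_k_by_largest_difference` and the index values are hypothetical, and a particular software implementation may resolve ties or report the selected level differently.

```python
def select_k_by_largest_difference(ball_by_k):
    """Pick the k maximizing delta(k) = |Ball(k) - Ball(k+1)|.

    ball_by_k maps a number of clusters k to its Ball-Hall index value.
    """
    deltas = {}
    for k in sorted(ball_by_k):
        # A difference is only defined where the next k was also evaluated
        if k + 1 in ball_by_k:
            deltas[k] = abs(ball_by_k[k] - ball_by_k[k + 1])
    if not deltas:
        raise ValueError("need Ball-Hall values for at least two consecutive k")
    # max() returns the first k reaching the maximum, so ties go to the smaller k
    best_k = max(deltas, key=deltas.get)
    return best_k, deltas

# Hypothetical Ball-Hall values for k = 2..5
ball_by_k = {2: 50.0, 3: 42.0, 4: 15.0, 5: 12.0}
best_k, deltas = select_k_by_largest_difference(ball_by_k)
print(best_k, deltas)  # 3 {2: 8.0, 3: 27.0, 4: 3.0}
```

Here the largest change occurs between k = 3 and k = 4, so the rule as written selects k = 3.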

Advantages and considerations

The Ball-Hall index offers several advantages:

•It provides a normalized measure that accounts for the number of clusters

•The interpretation is intuitive (average cluster compactness)

•It's computationally efficient to calculate

•It directly relates to the objective function of many clustering algorithms

However, there are some limitations:

•It may favor solutions with fewer clusters, because the division by k makes differences between consecutive values tend to be largest at small k

•It assumes that all clusters should have similar compactness

•It may not work well with clusters of very different sizes or shapes

•It doesn't directly consider between-cluster separation
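
The tendency toward fewer clusters can be illustrated with a short derivation. It rests on one assumption that is not stated in the text above: that WCSS does not increase when a cluster is added, which holds when each solution is obtained by splitting a cluster of the previous one (as in hierarchical clustering) but is not guaranteed for every run of every algorithm. Under that assumption:

$$\Delta(k) = \frac{\mathrm{WCSS}_k}{k} - \frac{\mathrm{WCSS}_{k+1}}{k+1} \ge \frac{\mathrm{WCSS}_k}{k} - \frac{\mathrm{WCSS}_k}{k+1} = \frac{\mathrm{WCSS}_k}{k(k+1)}$$

So even when an extra cluster does not reduce WCSS at all, the difference stays positive, and this lower bound is largest at small k. For example, with a constant WCSS of 60, the bound is 60/6 = 10 at k = 2 but only 60/30 = 2 at k = 5.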

Relationship to other indices

The Ball-Hall index is closely related to:

•TraceW index: Ball = TraceW / k = WCSS / k

•Within-Cluster Sum of Squares (WCSS): Ball provides a normalized version

•Average within-cluster variance: It measures the same concept across clusters
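
The first relationship can be checked numerically. The sketch below assumes that TraceW denotes the trace of the pooled within-cluster scatter matrix W (the sum over clusters of the outer products of deviations from the centroid); the function name `within_scatter_matrix` and the data are hypothetical.

```python
from collections import defaultdict

def within_scatter_matrix(points, labels):
    """Pooled within-cluster scatter matrix W and the number of clusters k.

    W = sum over clusters i, sum over x in C_i, of (x - c_i)(x - c_i)^T
    """
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)
    dim = len(points[0])
    W = [[0.0] * dim for _ in range(dim)]
    for members in clusters.values():
        centroid = [sum(c) / len(members) for c in zip(*members)]
        for x in members:
            d = [a - b for a, b in zip(x, centroid)]  # deviation from centroid
            for r in range(dim):
                for c in range(dim):
                    W[r][c] += d[r] * d[c]
    return W, len(clusters)

# Hypothetical toy data: two clusters in two dimensions
points = [(0.0, 0.0), (2.0, 2.0), (4.0, 0.0), (6.0, 0.0)]
labels = [0, 0, 1, 1]
W, k = within_scatter_matrix(points, labels)
trace_w = sum(W[j][j] for j in range(len(W)))  # diagonal sum = WCSS
ball = trace_w / k
print(W, trace_w, ball)  # [[4.0, 2.0], [2.0, 2.0]] 6.0 3.0
```

The diagonal of W holds the per-coordinate sums of squared deviations, so its trace equals the total within-cluster sum of squares (here 2 + 2 + 1 + 1 = 6), and dividing by k = 2 gives the Ball-Hall value.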

Usage recommendations

The Ball-Hall index is particularly effective when:

•You want a normalized measure of cluster compactness

•Clusters are expected to have similar sizes and shapes

•You are using k-means or other centroid-based clustering methods

•You need a simple, interpretable validation measure

The index is especially useful when combined with other validation methods that consider between-cluster separation, as the Ball-Hall index focuses primarily on within-cluster compactness.

It's also valuable for comparing clustering solutions with different numbers of clusters, as the normalization by k makes the values more directly comparable across different cluster configurations.

The Ball-Hall index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.