
Features and functionality described on this page are available with Prism Enterprise.

The TraceW index is one of the most popular criteria suggested for validating clusterings, as noted by Milligan and Cooper (1985). It is also discussed by Edwards and Cavalli-Sforza (1965), Friedman and Rubin (1967), Orloci (1967), and Fukunaga and Koontz (1970). The index measures clustering quality through the within-cluster sum of squares (WCSS).

The TraceW index is based on the fundamental principle that good clustering should minimize the within-cluster variance. It directly measures the total within-cluster sum of squares across all variables and clusters.

Mathematical calculation

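The TraceW index is calculated as follows:

$$\mathrm{TraceW}(k) = \operatorname{tr}(W_k) = \mathrm{WCSS}_k$$

where $W_k$ is the pooled within-cluster scatter matrix for a partition into k clusters, $W_k = \sum_{i=1}^{k} \sum_{x \in C_i} (x - c_i)(x - c_i)^\top$ (with $C_i$ and $c_i$ as defined below), and $\mathrm{WCSS}_k$ is the within-cluster sum of squares for k clusters. Taking the trace (the sum of the diagonal entries) adds up the squared deviations along every variable, which is why the two quantities are equal.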

Within-cluster sum of squares

The within-cluster sum of squares is defined as:

$$\mathrm{WCSS}_k = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$

where:

$k$ is the number of clusters

$C_i$ is the $i$-th cluster

$c_i$ is the centroid of cluster $i$

$x$ is a data point in cluster $i$

$\lVert x - c_i \rVert^2$ is the squared Euclidean distance from point $x$ to centroid $c_i$

This represents the sum of squared distances from each point to its cluster centroid, across all clusters.
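As an illustration, the definition can be computed directly. The following is a minimal pure-Python sketch, assuming each data point is an equal-length tuple of numbers and cluster membership is given as a parallel list of labels; the function names `centroids` and `wcss` are hypothetical and not part of Prism. Each centroid is taken to be the coordinate-wise mean of its cluster's points.

```python
def centroids(points, labels):
    """Map each cluster label to its centroid (the coordinate-wise mean)."""
    members = {}
    for x, lab in zip(points, labels):
        members.setdefault(lab, []).append(x)
    return {
        lab: [sum(col) / len(pts) for col in zip(*pts)]
        for lab, pts in members.items()
    }


def wcss(points, labels):
    """Within-cluster sum of squares: the sum, over all points, of the squared
    Euclidean distance from each point to the centroid of its own cluster."""
    cents = centroids(points, labels)
    return sum(
        sum((xj - cj) ** 2 for xj, cj in zip(x, cents[lab]))
        for x, lab in zip(points, labels)
    )


pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(wcss(pts, [0, 0, 1, 1]))  # 4.0
print(wcss(pts, [0, 0, 0, 0]))  # 104.0
```

Grouping the first two and last two points gives centroids (0, 1) and (10, 1); every point lies at squared distance 1 from its centroid, so the WCSS is 4.0. Putting all four points in a single cluster gives 104.0, which is also the total sum of squares of the data.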

Interpretation

The TraceW index directly measures the total within-cluster variance. For a fixed number of clusters:

Lower TraceW values indicate better clustering with more compact clusters (smaller within-cluster variance)

Higher TraceW values suggest less compact clusters with larger within-cluster variance

Because the TraceW criterion increases monotonically as the number of clusters decreases, its raw value cannot simply be minimized: doing so would always favor the largest number of clusters. Instead, the optimal number of clusters is taken as the maximum of the second differences between consecutive TraceW values. This identifies the point where the rate of improvement in within-cluster variance begins to diminish sharply.

Second differences approach

The decision rule involves:

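1. Calculate TraceW for k = 2, 3, 4, ... clusters

2. Compute first differences: $\Delta_1(k) = \mathrm{TraceW}(k-1) - \mathrm{TraceW}(k)$

3. Compute second differences: $\Delta_2(k) = \Delta_1(k) - \Delta_1(k+1) = \mathrm{TraceW}(k-1) - 2\,\mathrm{TraceW}(k) + \mathrm{TraceW}(k+1)$

4. Select the k corresponding to $\max_k \Delta_2(k)$

Because $\Delta_2(k)$ uses TraceW at k-1, k, and k+1, it is defined only where both neighboring values are available. Evaluating it at k = 2 also requires TraceW(1), which is simply the total sum of squares of the data.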
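The decision rule can be sketched in Python as follows. This is a hypothetical helper (`select_k_by_second_difference` is not a Prism function), assuming the TraceW values have already been computed and are supplied as a mapping from k to TraceW(k). It evaluates the second difference only where TraceW is available at both k-1 and k+1.

```python
def select_k_by_second_difference(tracew):
    """Pick the k maximising Delta2(k) = Delta1(k) - Delta1(k + 1),
    where Delta1(k) = TraceW(k - 1) - TraceW(k).

    tracew: dict mapping number of clusters k -> TraceW(k).
    Returns (best_k, dict of Delta2 values).
    """
    d2 = {}
    for k in sorted(tracew):
        # Delta2(k) needs TraceW at k - 1, k and k + 1
        if k - 1 in tracew and k + 1 in tracew:
            d1_k = tracew[k - 1] - tracew[k]
            d1_next = tracew[k] - tracew[k + 1]
            d2[k] = d1_k - d1_next
    if not d2:
        raise ValueError("need TraceW for at least three consecutive values of k")
    # d2 is filled in ascending k, so max() returns the smallest k on ties
    best_k = max(d2, key=d2.get)
    return best_k, d2


# Made-up TraceW values, for illustration only
tracew = {1: 1000.0, 2: 600.0, 3: 250.0, 4: 200.0, 5: 170.0, 6: 150.0}
best_k, d2 = select_k_by_second_difference(tracew)
print(best_k)  # 3
```

With these illustrative values the second differences are 50, 300, 20, and 10 for k = 2 to 5, so the sketch selects k = 3. Breaking ties toward the smallest k is a choice of this sketch, not necessarily Prism's behavior.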

Advantages and considerations

The TraceW index offers several advantages:

It's computationally simple and efficient

It directly measures the primary objective of many clustering algorithms

It's identical to the Within-Cluster Sum of Squares (WCSS) used in the elbow method

The interpretation is straightforward (total within-cluster variance)

However, there are some limitations:

It always decreases as the number of clusters increases, so raw values cannot be compared across different numbers of clusters and second differences are required for interpretation

It may favor spherical clusters due to its reliance on centroids

It doesn't directly account for between-cluster separation

The second differences approach can be sensitive to local fluctuations
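The point about separation can be made concrete with the standard decomposition of the total sum of squares, T = W + B. Here W is the within-cluster sum of squares (TraceW), and B is the between-cluster sum of squares, the size-weighted squared distances from each cluster centroid to the overall mean. The minimal sketch below assumes the same tuple-and-label input format as a plain illustration, and `sum_squares_decomposition` is a hypothetical name; TraceW corresponds only to the `within` value it returns.

```python
def sum_squares_decomposition(points, labels):
    """Return (total, within, between) sums of squares for labelled data.

    total   T: squared distances of all points to the overall mean
    within  W: squared distances of points to their own cluster centroid (TraceW)
    between B: cluster size times squared distance of its centroid to the overall mean
    """
    def sq(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    grand = [sum(col) / len(points) for col in zip(*points)]
    groups = {}
    for x, lab in zip(points, labels):
        groups.setdefault(lab, []).append(x)

    total = sum(sq(x, grand) for x in points)
    within = 0.0
    between = 0.0
    for members in groups.values():
        c = [sum(col) / len(members) for col in zip(*members)]
        within += sum(sq(x, c) for x in members)
        between += len(members) * sq(c, grand)
    return total, within, between


pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
print(sum_squares_decomposition(pts, [0, 0, 1, 1]))  # (104.0, 4.0, 100.0)
print(sum_squares_decomposition(pts, [0, 1, 0, 1]))  # (104.0, 100.0, 4.0)
```

Because T is fixed for a given data set, any drop in W is matched by an equal rise in B. TraceW itself, however, reports only W and does not weigh separation against the number of clusters.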

Relationship to other methods

The TraceW index is equivalent to:

Within-Cluster Sum of Squares (WCSS) in the elbow method

SSW (Sum of Squares Within) in ANOVA-based clustering evaluation

The objective function minimized by k-means clustering

This makes it particularly relevant when using k-means or other centroid-based clustering algorithms.
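To illustrate the link with k-means, here is a minimal sketch of Lloyd's algorithm, the classic k-means heuristic, that records WCSS at every iteration. It is not Prism's implementation: the naive initialization (the first k points as starting centroids), the fixed iteration count, and the names `lloyd_kmeans` and `sq_dist` are assumptions made for illustration. The assignment step and the centroid-update step each can only lower or keep WCSS, so the recorded values never increase.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def lloyd_kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm. Returns (labels, wcss_history)."""
    # Naive initialisation (an assumption): use the first k points as centroids
    cents = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    history = []
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid
        labels = [min(range(k), key=lambda i: sq_dist(x, cents[i])) for x in points]
        history.append(sum(sq_dist(x, cents[lab]) for x, lab in zip(points, labels)))
        # Update step: move each centroid to the mean of its members
        for i in range(k):
            members = [x for x, lab in zip(points, labels) if lab == i]
            if members:  # keep the old centroid if a cluster becomes empty
                cents[i] = [sum(col) / len(members) for col in zip(*members)]
    return labels, history


labels, history = lloyd_kmeans([(0, 0), (10, 0), (0, 1), (10, 1)], 2)
print(labels)       # [0, 1, 0, 1]
print(history[:3])  # [2.0, 1.0, 1.0]
```

Here the WCSS starts at 2.0 and then stays at 1.0 once the centroids settle at (0, 0.5) and (10, 0.5). Lloyd's algorithm only reaches a local minimum of WCSS, so practical implementations usually try several starting configurations.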

Usage recommendations

The TraceW index is particularly effective when:

Using k-means or other centroid-based clustering methods

Clusters are expected to be roughly spherical and similar in size

You want a simple, computationally efficient validation measure

You plan to use the second differences approach to identify the optimal number of clusters

The index provides a direct measure of the clustering objective that many algorithms optimize, making it a natural choice for validation in those contexts.

The TraceW index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.

© 1995-2019 GraphPad Software, LLC. All rights reserved.