Features and functionality described on this page are available with Prism Enterprise.
The TraceW index is one of the most widely used cluster validation metrics, as noted by Milligan and Cooper (1985). It is also referenced by Edwards and Cavalli-Sforza (1965), Friedman and Rubin (1967), Orloci (1967), and Fukunaga and Koontz (1970). The index measures clustering quality through the within-cluster sum of squares (WCSS).
The TraceW index is based on the fundamental principle that good clustering should minimize the within-cluster variance. It directly measures the total within-cluster sum of squares across all variables and clusters.
The TraceW index is calculated as follows:
$$\mathrm{TraceW}(k) = \mathrm{WCSS}_k$$
where $\mathrm{WCSS}_k$ is the within-cluster sum of squares for k clusters.
The within-cluster sum of squares is defined as:
$$\mathrm{WCSS}_k = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$
where:
•$k$ is the number of clusters
•$C_i$ is the i-th cluster
•$c_i$ is the centroid of cluster i
•$x$ is a data point in cluster i
•$\lVert x - c_i \rVert^2$ is the squared Euclidean distance from point $x$ to centroid $c_i$
This represents the sum of squared distances from each point to its cluster centroid, across all clusters.
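As an illustration of the definition above, here is a minimal NumPy sketch. The function name `trace_w`, the toy data, and the label array are assumptions made for this example; it is not Prism's implementation.

```python
import numpy as np

def trace_w(X, labels):
    """Total within-cluster sum of squares for one cluster assignment."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    total = 0.0
    for cluster_id in np.unique(labels):
        members = X[labels == cluster_id]            # points x in cluster C_i
        centroid = members.mean(axis=0)              # centroid c_i
        total += np.sum((members - centroid) ** 2)   # sum of ||x - c_i||^2
    return float(total)

# Toy data (hypothetical): two groups of two points each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]])
labels = np.array([0, 0, 1, 1])

print(trace_w(X, labels))            # 10.0  (2 from cluster 0, 8 from cluster 1)
print(trace_w(X, np.zeros(4, int)))  # 111.0 (all points in a single cluster)
```

Splitting the four points into two clusters reduces the value from 111 to 10, which is the kind of drop the index is designed to capture.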
The TraceW index directly measures the total within-cluster variance:
•Lower TraceW values: Indicate better clustering with more compact clusters (smaller within-cluster variance)
•Higher TraceW values: Suggest less compact clusters with larger within-cluster variance
Because TraceW decreases monotonically as the number of clusters increases, its smallest value always falls at the largest k tested and cannot be used on its own to choose k. Instead, the optimal number of clusters is taken at the maximum of the second differences between consecutive TraceW values, which marks the point where the improvement in within-cluster variance begins to diminish significantly.
The decision rule involves:
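1. Calculate TraceW for a range of consecutive cluster counts (k = 2, 3, 4, ...)
2. Compute first differences: Δ1(k) = TraceW(k-1) - TraceW(k)
3. Compute second differences: Δ2(k) = Δ1(k) - Δ1(k+1) = TraceW(k-1) - 2·TraceW(k) + TraceW(k+1)
4. Select the k corresponding to max(Δ2(k)); Δ2(k) is defined only where TraceW is available at both k-1 and k+1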
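The decision rule above can be sketched in a few lines of Python. The function name `select_k_by_second_difference`, the dictionary `trace_w_by_k`, and the TraceW values in it are hypothetical and chosen only for illustration. The sketch scores only those k for which TraceW is available at both k-1 and k+1, since the second difference needs both neighbors.

```python
def select_k_by_second_difference(trace_w_by_k):
    """Return (k, D2) where D2(k) = D1(k) - D1(k+1) is largest.

    trace_w_by_k maps each number of clusters k to its TraceW value.
    """
    best_k, best_d2 = None, float("-inf")
    for k in sorted(trace_w_by_k):
        # The second difference at k needs TraceW at k-1 and k+1.
        if k - 1 not in trace_w_by_k or k + 1 not in trace_w_by_k:
            continue
        d1_k = trace_w_by_k[k - 1] - trace_w_by_k[k]       # D1(k)
        d1_next = trace_w_by_k[k] - trace_w_by_k[k + 1]    # D1(k+1)
        d2 = d1_k - d1_next                                # D2(k)
        if d2 > best_d2:
            best_k, best_d2 = k, d2
    return best_k, best_d2

# Hypothetical TraceW values for k = 1..6 (illustrative numbers only)
trace_w_by_k = {1: 1000.0, 2: 600.0, 3: 200.0, 4: 150.0, 5: 120.0, 6: 100.0}
best_k, best_d2 = select_k_by_second_difference(trace_w_by_k)
print(best_k, best_d2)  # 3 350.0
```

With these numbers the improvement drops from 400 (going from 2 to 3 clusters) to 50 (going from 3 to 4), so the second difference peaks at k = 3.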
The TraceW index offers several advantages:
•It's computationally simple and efficient
•It directly measures the primary objective of many clustering algorithms
•It's identical to the Within-Cluster Sum of Squares (WCSS) used in the elbow method
•The interpretation is straightforward (total within-cluster variance)
However, there are some limitations:
•It always decreases as the number of clusters increases, requiring second differences for interpretation
•It may favor spherical clusters due to its reliance on centroids
•It doesn't directly account for between-cluster separation
•The second differences approach can be sensitive to local fluctuations
The TraceW index is equivalent to:
•Within-Cluster Sum of Squares (WCSS) in the elbow method
•SSW (Sum of Squares Within) in ANOVA-based clustering evaluation
•The objective function minimized by k-means clustering
This makes it particularly relevant when using k-means or other centroid-based clustering algorithms.
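The name of the index reflects that the same quantity is the trace of the pooled within-cluster scatter matrix W, which sums the within-cluster sum of squares over all variables. The sketch below checks this numerically on toy data; the helper name `within_scatter_matrix` and the data are assumptions for illustration and do not describe how Prism computes the index.

```python
import numpy as np

def within_scatter_matrix(X, labels):
    """Pooled within-cluster scatter matrix W (variables x variables)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    W = np.zeros((X.shape[1], X.shape[1]))
    for cluster_id in np.unique(labels):
        members = X[labels == cluster_id]
        centered = members - members.mean(axis=0)  # deviations from centroid
        W += centered.T @ centered                 # scatter of this cluster
    return W

# Toy data (hypothetical): two groups of two points each
X = np.array([[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 4.0]])
labels = np.array([0, 0, 1, 1])

W = within_scatter_matrix(X, labels)

# Direct WCSS: squared distance of every point to its own cluster centroid
wcss = sum(
    np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
    for c in np.unique(labels)
)

print(np.trace(W), wcss)  # 10.0 10.0
```

The diagonal of W holds the within-cluster sum of squares for each variable separately, so its trace adds them up into the single WCSS value.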
The TraceW index is particularly effective when:
•Using k-means or other centroid-based clustering methods
•Clusters are expected to be roughly spherical and similar in size
•You want a simple, computationally efficient validation measure
•Working with the second differences approach to identify optimal cluster numbers
The index provides a direct measure of the clustering objective that many algorithms optimize, making it a natural choice for validation in those contexts.
The TraceW index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.