
Features and functionality described on this page are available with Prism Enterprise.

The Krzanowski-Lai (KL) index is a clustering validation metric proposed by Krzanowski and Lai in 1988 that evaluates the optimal number of clusters by comparing the relative improvement in clustering quality across different numbers of clusters. This index is designed to identify significant changes in within-cluster sum of squares when the number of clusters varies.

The KL index is based on the principle that the optimal number of clusters corresponds to the point where there is a significant change in the rate of improvement of the clustering objective function.

Mathematical calculation

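The Krzanowski-Lai index for k clusters is calculated as follows:

$$\mathrm{KL}(k) = \left| \frac{\mathrm{DIFF}_k}{\mathrm{DIFF}_{k+1}} \right|$$

where $\mathrm{DIFF}_k$ is defined as:

$$\mathrm{DIFF}_k = (k-1)^{2/p}\,\mathrm{WCSS}_{k-1} - k^{2/p}\,\mathrm{WCSS}_k$$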

where:

k is the number of clusters

p is the number of variables (dimensions)

$\mathrm{WCSS}_k$ is the within-cluster sum of squares for k clusters
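The calculation can be sketched in a few lines of Python. This is a minimal illustration, not Prism's implementation. It assumes the standard definition KL(k) = |DIFF_k / DIFF_(k+1)| with DIFF_k = (k-1)^(2/p) WCSS_(k-1) - k^(2/p) WCSS_k. The function names, the WCSS values, and the handling of a zero denominator are assumptions made for the example.

```python
def diff(k, p, wcss_prev, wcss_k):
    """DIFF_k = (k-1)^(2/p) * WCSS_(k-1) - k^(2/p) * WCSS_k."""
    return (k - 1) ** (2 / p) * wcss_prev - k ** (2 / p) * wcss_k


def kl_index(k, p, wcss_prev, wcss_k, wcss_next):
    """KL(k) = |DIFF_k / DIFF_(k+1)|; needs WCSS for k-1, k, and k+1 clusters."""
    denominator = diff(k + 1, p, wcss_k, wcss_next)
    if denominator == 0:
        # Assumption for this sketch: report a zero denominator as infinity.
        return float("inf")
    return abs(diff(k, p, wcss_prev, wcss_k) / denominator)


# Hypothetical WCSS values for 2, 3, and 4 clusters on two-dimensional data (p = 2).
# DIFF_3 = 2 * 50 - 3 * 20 = 40 and DIFF_4 = 3 * 20 - 4 * 14 = 4.
print(kl_index(k=3, p=2, wcss_prev=50.0, wcss_k=20.0, wcss_next=14.0))  # 10.0
```

Because KL(k) needs the solutions for k-1, k, and k+1 clusters, three clustering runs are required to evaluate the index at a single value of k.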

Within-cluster sum of squares

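The within-cluster sum of squares measures the total within-cluster variation. It sums the squared Euclidean distance between each observation x and the centroid of the cluster to which it is assigned:

$$\mathrm{WCSS}_k = \sum_{i=1}^{k} \sum_{x \in C_i} \lVert x - c_i \rVert^2$$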

where:

$C_i$ is the i-th cluster

$c_i$ is the centroid of cluster i
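As an illustration, the within-cluster sum of squares can be computed directly from a set of points and their cluster assignments. The sketch below uses only the Python standard library. The function name and the four data points are hypothetical, and the centroid is assumed to be the coordinate-wise mean of the points in each cluster.

```python
from collections import defaultdict


def within_cluster_sum_of_squares(points, labels):
    """Sum of squared Euclidean distances from each point to its cluster centroid."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        # Centroid c_i: coordinate-wise mean of the points in cluster C_i.
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        for point in members:
            total += sum((x - c) ** 2 for x, c in zip(point, centroid))
    return total


# Hypothetical data: two tight pairs of points in two dimensions.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
print(within_cluster_sum_of_squares(points, [0, 0, 1, 1]))  # two clusters: 4.0
print(within_cluster_sum_of_squares(points, [0, 0, 0, 0]))  # one cluster: 104.0
```

Splitting the points into the two obvious groups reduces the sum from 104 to 4, which is the kind of drop the KL index is designed to detect.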

Scaling factor

The terms $(k-1)^{2/p}$ and $k^{2/p}$ are dimension-adjusted scaling factors that account for:

The number of clusters

The dimensionality of the data space

This scaling helps normalize the comparison across different numbers of clusters and dimensions.
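To get a feel for the size of these factors, the short sketch below evaluates k^(2/p) for a few hypothetical combinations of k and p. The function name is an assumption made for the example.

```python
def scaling_factor(k, p):
    """Dimension-adjusted weight k^(2/p) applied to the WCSS for k clusters."""
    return k ** (2 / p)


# Print the factor for k = 2, 4, 8 at several dimensionalities.
for p in (2, 10, 100):
    factors = [round(scaling_factor(k, p), 3) for k in (2, 4, 8)]
    print(f"p = {p}: {factors}")
```

For a fixed number of variables, the factor grows with k. For a fixed k, it shrinks toward 1 as the number of variables increases, so the adjustment is strongest in low-dimensional data.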

Interpretation

The KL index measures the relative change in clustering improvement:

Higher KL values: Indicate that moving to k clusters improves the clustering much more than the further step to k+1 clusters, that is, a significant change in the rate of improvement at k

Lower KL values: Suggest that the improvement rate is similar between consecutive cluster numbers

The optimal number of clusters corresponds to the maximum value of the KL index. This occurs when there is the largest relative change in the improvement rate, suggesting a natural breaking point in the clustering structure.
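The selection rule can be sketched as a scan over a range of candidate cluster numbers. This is an illustration under stated assumptions, not Prism's implementation: the WCSS values are hypothetical, the function name is made up for the example, and a zero denominator is reported as infinity.

```python
def kl_curve(wcss, p):
    """Return {k: KL(k)} for every k whose neighbours k-1 and k+1 are in `wcss`."""

    def diff(k):
        # DIFF_k = (k-1)^(2/p) * WCSS_(k-1) - k^(2/p) * WCSS_k
        return (k - 1) ** (2 / p) * wcss[k - 1] - k ** (2 / p) * wcss[k]

    curve = {}
    for k in sorted(wcss):
        if k - 1 in wcss and k + 1 in wcss:
            denominator = diff(k + 1)
            # Assumption for this sketch: a zero denominator gives an infinite index.
            curve[k] = float("inf") if denominator == 0 else abs(diff(k) / denominator)
    return curve


# Hypothetical WCSS values for k = 1..6 on two-dimensional data (p = 2).
wcss = {1: 200.0, 2: 90.0, 3: 20.0, 4: 14.0, 5: 11.0, 6: 9.0}
curve = kl_curve(wcss, p=2)
best_k = max(curve, key=curve.get)
print(curve)
print("suggested number of clusters:", best_k)  # 3
```

With WCSS available for 1 to 6 clusters, the index can only be evaluated for k = 2 to 5, because each value needs the solutions on both sides. In this example the maximum is KL(3) = 30, so three clusters would be suggested.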

Advantages and considerations

The Krzanowski-Lai index offers several advantages:

It accounts for the dimensionality of the data through the $k^{2/p}$ scaling

It provides a relative measure of improvement rather than absolute values

It's designed to detect significant changes in clustering quality

It works well with methods that optimize within-cluster sum of squares

However, there are some limitations:

It requires calculation of clustering solutions for multiple values of k

The interpretation can be less intuitive than some other indices

It may be sensitive to the choice of clustering algorithm

Performance can vary depending on the underlying cluster structure

The KL index is particularly effective when:

The data has a clear hierarchical or nested cluster structure

You want to detect significant changes in clustering improvement

You are working with high-dimensional data, where the scaling factor becomes important

You are using clustering methods that minimize within-cluster variance (such as k-means)

The index is most reliable when used in conjunction with other validation methods to confirm the optimal number of clusters.

The Krzanowski-Lai index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.

© 1995-2019 GraphPad Software, LLC. All rights reserved.