Features and functionality described on this page are available with Prism Enterprise.
The Dunn index is a clustering validation metric that evaluates clustering quality by measuring the ratio between the minimal inter-cluster distance and the maximal intra-cluster distance. Developed by Dunn in 1974, this index is designed to identify clusters that are compact and well-separated.
The fundamental principle behind the Dunn index is that good clustering should have large distances between clusters and small distances within clusters. The index captures this by taking the ratio of the smallest distance between different clusters to the largest distance within any cluster.
The Dunn index is calculated as follows:
$$D = \frac{\min_{1 \le i < j \le k} d(C_i, C_j)}{\max_{1 \le l \le k} \Delta(C_l)}$$
where:
•k is the number of clusters
•d(Ci, Cj) is the distance between clusters i and j
•Δ(Cl) is the diameter of cluster l
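The distance between two clusters Ci and Cj is defined as the minimum distance between a point in Ci and a point in Cj:
$$d(C_i, C_j) = \min_{x \in C_i,\, y \in C_j} \operatorname{dist}(x, y)$$
Here dist(x, y) is the distance between the individual points x and y. The cluster distance is therefore the distance between the closest pair of points drawn from the two clusters.
The diameter of a cluster C is defined as the maximum distance between any two points within the cluster:
$$\Delta(C) = \max_{x,\, y \in C} \operatorname{dist}(x, y)$$
This represents the largest distance between any pair of points within the cluster, measuring the cluster's internal spread.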
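The definitions above can be sketched in a few lines of Python. This is a minimal illustration, not Prism's implementation: the function names, the toy data, and the choice of Euclidean distance (via `math.dist`) are all assumptions made for the example.

```python
from itertools import combinations
from math import dist  # Euclidean distance; an assumption, any metric could be swapped in


def cluster_distance(ci, cj):
    """d(Ci, Cj): smallest distance between a point in ci and a point in cj."""
    return min(dist(x, y) for x in ci for y in cj)


def cluster_diameter(c):
    """Delta(C): largest distance between two points in c (0 for a single point)."""
    return max((dist(x, y) for x, y in combinations(c, 2)), default=0.0)


def dunn_index(clusters):
    """Minimal inter-cluster distance divided by maximal cluster diameter."""
    min_separation = min(
        cluster_distance(ci, cj) for ci, cj in combinations(clusters, 2)
    )
    max_diameter = max(cluster_diameter(c) for c in clusters)
    # Note: raises ZeroDivisionError if every cluster is a single point.
    return min_separation / max_diameter


# Hypothetical toy data: two tight, well-separated clusters in the plane.
clusters = [
    [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
    [(5.0, 0.0), (5.0, 1.0), (6.0, 0.0)],
]
print(dunn_index(clusters))
```

For this toy data the closest pair of points across clusters is 4 units apart and each cluster has a diameter of √2, so the sketch prints a value of about 2.83.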
The Dunn index evaluates clustering quality through the relationship between inter-cluster separation and intra-cluster compactness:
•Higher Dunn values: Indicate better clustering with compact clusters that are well-separated from each other
•Lower Dunn values: Suggest that clusters are either internally spread out or too close to neighboring clusters
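A small worked example may help make this concrete. The one-dimensional data below are hypothetical and chosen only so the arithmetic is easy to follow; D denotes the Dunn index.

```latex
% Well-separated case: C_1 = \{0, 1\}, \; C_2 = \{5, 6\}
d(C_1, C_2) = |5 - 1| = 4, \qquad
\Delta(C_1) = \Delta(C_2) = 1, \qquad
D = \frac{4}{1} = 4

% Poorly separated case: C_1 = \{0, 1\}, \; C_2 = \{2, 3\}
d(C_1, C_2) = |2 - 1| = 1, \qquad
\Delta(C_1) = \Delta(C_2) = 1, \qquad
D = \frac{1}{1} = 1
```

The clusters are equally compact in both cases, so the drop from 4 to 1 reflects only the loss of separation.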
When clusterings with different numbers of clusters are compared, the optimal number of clusters is the one that gives the maximum value of the Dunn index. This maximum occurs when:
•The minimum inter-cluster distance is large (clusters are well-separated)
•The maximum intra-cluster distance is small (clusters are compact)
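As a rough sketch of this selection rule, the example below scores a few candidate partitions of the same points and keeps the number of clusters with the highest Dunn index. The candidate partitions are hypothetical and hard-coded; in practice they would come from a clustering algorithm run once per candidate k. Euclidean distance is assumed, and the helper is an illustration rather than Prism's implementation.

```python
from itertools import combinations
from math import dist  # Euclidean distance (an assumption)


def dunn_index(clusters):
    # Minimal distance between points in different clusters...
    separation = min(
        dist(x, y)
        for ci, cj in combinations(clusters, 2)
        for x in ci
        for y in cj
    )
    # ...divided by the largest distance between points in the same cluster.
    diameter = max(dist(x, y) for c in clusters for x, y in combinations(c, 2))
    return separation / diameter


# Hypothetical candidate partitions of the same six 1-D points, keyed by k.
candidates = {
    2: [[(0.0,), (1.0,), (4.0,), (5.0,)], [(10.0,), (11.0,)]],
    3: [[(0.0,), (1.0,)], [(4.0,), (5.0,)], [(10.0,), (11.0,)]],
    4: [[(0.0,), (1.0,)], [(4.0,)], [(5.0,)], [(10.0,), (11.0,)]],
}

scores = {k: dunn_index(clusters) for k, clusters in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Here k = 3 scores 3.0 while k = 2 and k = 4 both score 1.0: merging two groups inflates the largest diameter, and splitting a natural group shrinks the smallest inter-cluster distance.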
The Dunn index offers several advantages:
•It has a clear geometric interpretation
•It doesn't make assumptions about cluster shape or size
•It can be applied to clusters of different shapes, because it depends only on pairwise distances between points
•The calculation is relatively straightforward
However, there are some limitations:
•It can be computationally expensive for large datasets, because the number of pairwise distance calculations grows quadratically with the number of points
•It's sensitive to outliers, as a single outlier can dramatically affect the cluster diameter
•It may not perform well when clusters have very different densities
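•The index is determined entirely by two worst-case distances (the minimum between-cluster distance and the maximum within-cluster diameter), so a single poorly separated pair of clusters or a single spread-out cluster sets the value regardless of how good the rest of the clustering is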
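The sensitivity to outliers can be illustrated with a short experiment. The sketch below assumes Euclidean distance and uses hypothetical toy data; it compares the Dunn index of two clean clusters with the same clusters after one stray point is assigned to the first cluster.

```python
from itertools import combinations
from math import dist  # Euclidean distance (an assumption)


def dunn_index(clusters):
    # Minimal distance between points in different clusters...
    separation = min(
        dist(x, y)
        for ci, cj in combinations(clusters, 2)
        for x in ci
        for y in cj
    )
    # ...divided by the largest distance between points in the same cluster.
    diameter = max(dist(x, y) for c in clusters for x, y in combinations(c, 2))
    return separation / diameter


# Hypothetical toy data: two compact clusters, 9 units apart at their closest.
clean = [
    [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    [(10.0, 0.0), (11.0, 0.0), (10.0, 1.0)],
]
# The same partition with one outlier added to the first cluster.
with_outlier = [clean[0] + [(5.0, 0.0)], clean[1]]

print(dunn_index(clean))
print(dunn_index(with_outlier))
```

The single extra point both stretches the first cluster's diameter and narrows the gap to the second cluster, so the index falls from about 6.36 to about 0.98 even though the other six points are unchanged.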
The Dunn index is particularly useful when you want to ensure that clusters are both internally cohesive and externally well-separated, making it valuable for applications where clear cluster boundaries are important.
The Dunn index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.