Features and functionality described on this page are available with Prism Enterprise.
The Davies-Bouldin (DB) index is a clustering validation metric that evaluates the quality of clustering by measuring the average similarity between clusters, where similarity is defined as the ratio of within-cluster scatter to between-cluster separation. Developed by Davies and Bouldin in 1979, this index is designed to identify compact, well-separated clusters.
The Davies-Bouldin index is based on the principle that good clustering should minimize within-cluster scatter while maximizing between-cluster separation. Lower DB values indicate better clustering, as they represent clusters that are more compact and better separated from each other.
The Davies-Bouldin index is calculated as follows:
$$ DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\delta_i + \delta_j}{d_{ij}} $$
where:
•k is the number of clusters
•δi is the within-cluster scatter for cluster i
•dij is the distance between cluster centroids i and j
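As a minimal sketch of how these pieces combine, the Python function below assumes the scatter values δi and the centroid distances dij have already been computed. The function name `davies_bouldin_from_parts` and the example numbers are hypothetical illustrations, not part of Prism.

```python
def davies_bouldin_from_parts(scatter, dist):
    """Average, over clusters, of the largest (scatter_i + scatter_j) / dist_ij ratio.

    scatter: list of within-cluster scatter values, one per cluster.
    dist:    square matrix (list of lists) of distances between centroids.
    """
    k = len(scatter)
    if k < 2:
        raise ValueError("the index needs at least two clusters")
    worst_ratios = []
    for i in range(k):
        # Similarity of cluster i to every other cluster j; keep the worst case.
        ratios = [(scatter[i] + scatter[j]) / dist[i][j] for j in range(k) if j != i]
        worst_ratios.append(max(ratios))
    return sum(worst_ratios) / k


# Illustrative values: three clusters, the third lying far from the other two.
scatter = [1.0, 1.0, 2.0]
dist = [
    [0.0, 4.0, 10.0],
    [4.0, 0.0, 10.0],
    [10.0, 10.0, 0.0],
]
db = davies_bouldin_from_parts(scatter, dist)
```

With these numbers the first two clusters each contribute a worst-case ratio of 0.5 (from their mutual comparison) and the third contributes 0.3, so the index is their mean, about 0.43.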
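The within-cluster scatter for cluster i is calculated as:
$$ \delta_i = \left( \frac{1}{n_i} \sum_{x \in C_i} \sum_{j} |x_j - c_{ij}|^{u} \right)^{1/u} $$
where:
•xj is the j-th component of a point x, with the inner sum running over all components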
•ni is the number of points in cluster i
•Ci is the set of points in cluster i
•cij is the j-th component of the centroid of cluster i
•u is a parameter (typically u = 2, which makes δi the root-mean-square distance of the points from their centroid, i.e. a standard deviation)
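The scatter calculation can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Prism's implementation: the function name `within_cluster_scatter` is hypothetical, points are plain tuples of coordinates, and the scatter is taken to be the u-th root of the mean, over the cluster's points, of the summed component-wise deviations |xj - cij| raised to the power u.

```python
def within_cluster_scatter(points, u=2):
    """Scatter of one cluster: u-th root of the mean of sum_j |x_j - c_j|**u."""
    n = len(points)
    # Centroid: component-wise mean of the cluster's points.
    centroid = [sum(coords) / n for coords in zip(*points)]
    total = sum(
        abs(x_j - c_j) ** u
        for x in points
        for x_j, c_j in zip(x, centroid)
    )
    return (total / n) ** (1 / u)


# Four corners of a 2 x 2 square; the centroid is (1, 1).
cluster = [(0.0, 0.0), (2.0, 0.0), (0.0, 2.0), (2.0, 2.0)]
rms_scatter = within_cluster_scatter(cluster)            # u = 2
mean_abs_scatter = within_cluster_scatter(cluster, u=1)  # u = 1
```

Every corner lies at squared distance 2 from the centroid, so the u = 2 scatter is the square root of 2, while u = 1 gives the mean summed absolute deviation, 2.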
The distance between cluster centroids i and j is calculated as:
$$ d_{ij} = \left( \sum_{l} |c_{il} - c_{jl}|^{v} \right)^{1/v} $$
where:
•v is a parameter (typically v = 2, giving Euclidean distance)
•cil and cjl are the l-th components of centroids i and j respectively
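The centroid distance is a Minkowski distance of order v. The short Python sketch below illustrates it; the function name `centroid_distance` is a hypothetical choice and centroids are assumed to be plain tuples of coordinates.

```python
def centroid_distance(c_i, c_j, v=2):
    """Minkowski distance of order v between two centroids."""
    return sum(abs(a - b) ** v for a, b in zip(c_i, c_j)) ** (1 / v)


euclidean = centroid_distance((0.0, 0.0), (3.0, 4.0))        # v = 2
manhattan = centroid_distance((0.0, 0.0), (3.0, 4.0), v=1)   # v = 1
```

For these two centroids, v = 2 gives the Euclidean distance 5 and v = 1 gives the city-block distance 7.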
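For each cluster, the Davies-Bouldin index takes its highest "similarity" to any other cluster, where similarity is the ratio of the two clusters' combined within-cluster scatter to the distance between their centroids, and then averages these worst-case values over all clusters: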
•Lower DB values: Indicate better clustering with compact, well-separated clusters
•Higher DB values: Suggest that clusters are either too spread out internally or too close to each other
The optimal number of clusters corresponds to the minimum value of the Davies-Bouldin index. This occurs when clusters are internally compact (small δi values) and well-separated from each other (large dij values).
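To illustrate selecting the number of clusters by the minimum index, the sketch below computes the index (with u = v = 2) for three candidate labelings of a small 2-D data set and picks the k with the lowest value. The data, the hand-written labelings (which stand in for the output of a clustering algorithm), and the function name `davies_bouldin` are all hypothetical; this is not Prism's implementation.

```python
import math


def davies_bouldin(points, labels):
    """DB index with u = v = 2 (RMS scatter, Euclidean centroid distance)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    centroids, scatter = [], []
    for members in clusters.values():
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        mean_sq = sum(math.dist(p, centroid) ** 2 for p in members) / len(members)
        centroids.append(centroid)
        scatter.append(math.sqrt(mean_sq))
    k = len(centroids)
    # For each cluster keep its worst (largest) ratio, then average over clusters.
    return sum(
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
        for i in range(k)
    ) / k


# Three tight groups of four points each.
group_a = [(0, 0), (0, 1), (1, 0), (1, 1)]
group_b = [(10, 0), (10, 1), (11, 0), (11, 1)]
group_c = [(0, 10), (0, 11), (1, 10), (1, 11)]
points = group_a + group_b + group_c

# Hypothetical labelings for k = 2, 3, 4.
candidates = {
    2: [0] * 4 + [1] * 4 + [0] * 4,      # groups A and C merged
    3: [0] * 4 + [1] * 4 + [2] * 4,      # one cluster per group
    4: [0, 0, 3, 3] + [1] * 4 + [2] * 4, # group A split in two
}
scores = {k: davies_bouldin(points, labels) for k, labels in candidates.items()}
best_k = min(scores, key=scores.get)
```

Here the index is about 0.51 for k = 2, 0.14 for k = 3, and 0.56 for k = 4, so the minimum selects k = 3: merging groups inflates the scatter, while splitting a group places two centroids close together.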
The Davies-Bouldin index offers several advantages:
•It's computationally efficient and doesn't require additional simulations
•The interpretation is straightforward (lower is better)
•It considers both compactness and separation simultaneously
•No assumptions about cluster distribution are required
However, there are some limitations:
•It may favor spherical clusters due to its reliance on centroids
•Performance can be affected by the choice of distance metric
•It may not work well with clusters of very different sizes or densities
•The index can be sensitive to outliers
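The sensitivity to outliers can be seen directly in the scatter term. The sketch below, with made-up points and a hypothetical helper `rms_scatter` (the u = 2 scatter), compares one compact cluster before and after a single distant point is assigned to it.

```python
import math


def rms_scatter(points):
    """Root-mean-square distance of the points from their centroid (u = 2)."""
    n = len(points)
    centroid = [sum(coords) / n for coords in zip(*points)]
    return math.sqrt(sum(math.dist(p, centroid) ** 2 for p in points) / n)


compact = [(0, 0), (0, 2), (2, 0), (2, 2)]
with_outlier = compact + [(11, 1)]  # one distant point joins the cluster

scatter_compact = rms_scatter(compact)
scatter_outlier = rms_scatter(with_outlier)
```

The single added point pulls the centroid from (1, 1) to (3, 1) and roughly triples the scatter (from about 1.41 to about 4.20), which raises every ratio that involves this cluster.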
The Davies-Bouldin index is particularly effective when clusters are roughly spherical and similar in size, making it a popular choice for evaluating k-means clustering results.
The Davies-Bouldin index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.