
Features and functionality described on this page are available with Prism Enterprise.

The Davies-Bouldin (DB) index is a clustering validation metric that evaluates clustering quality by measuring the average similarity between each cluster and the cluster most similar to it. The similarity of two clusters is defined as the ratio of their combined within-cluster scatter to the separation between their centroids. Developed by Davies and Bouldin in 1979, this index is designed to identify compact, well-separated clusters.

The Davies-Bouldin index is based on the principle that good clustering should minimize within-cluster scatter while maximizing between-cluster separation. Lower DB values indicate better clustering, as they represent clusters that are more compact and better separated from each other.

Mathematical calculation

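The Davies-Bouldin index is calculated as follows:

$$
\mathrm{DB} = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \left( \frac{\delta_i + \delta_j}{d_{ij}} \right)
$$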

where:

$k$ is the number of clusters

$\delta_i$ is the within-cluster scatter for cluster $i$

$d_{ij}$ is the distance between the centroids of clusters $i$ and $j$
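The Davies-Bouldin calculation described in this section can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not Prism's implementation: the function names (`centroid`, `scatter`, `centroid_distance`, `davies_bouldin`) are hypothetical, each centroid is assumed to be the component-wise mean of its cluster, the exponents `u` and `v` default to 2 as suggested on this page, and no two centroids are assumed to coincide (so every $d_{ij} > 0$).

```python
from typing import Dict, List, Sequence

Point = Sequence[float]


def centroid(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points (assumed centroid definition)."""
    n = len(points)
    return [sum(x[dim] for x in points) / n for dim in range(len(points[0]))]


def scatter(points: List[Point], c: List[float], u: float = 2.0) -> float:
    """delta_i = ((1/n_i) * sum over x in C_i of sum_j |x_j - c_ij|^u)^(1/u)."""
    total = sum(abs(x[j] - c[j]) ** u for x in points for j in range(len(c)))
    return (total / len(points)) ** (1.0 / u)


def centroid_distance(ci: List[float], cj: List[float], v: float = 2.0) -> float:
    """d_ij = (sum_l |c_il - c_jl|^v)^(1/v); v = 2 gives Euclidean distance."""
    return sum(abs(a - b) ** v for a, b in zip(ci, cj)) ** (1.0 / v)


def davies_bouldin(data: Sequence[Point], labels: Sequence[int],
                   u: float = 2.0, v: float = 2.0) -> float:
    """DB = (1/k) * sum_i max_{j != i} (delta_i + delta_j) / d_ij."""
    # Group the points by cluster label.
    clusters: Dict[int, List[Point]] = {}
    for x, label in zip(data, labels):
        clusters.setdefault(label, []).append(x)
    if len(clusters) < 2:
        raise ValueError("the Davies-Bouldin index needs at least two clusters")

    groups = list(clusters.values())
    cents = [centroid(g) for g in groups]
    deltas = [scatter(g, c, u) for g, c in zip(groups, cents)]
    k = len(groups)

    # For each cluster keep the largest similarity ratio, then average over clusters.
    worst = [
        max((deltas[i] + deltas[j]) / centroid_distance(cents[i], cents[j], v)
            for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


# Two tight, well-separated 2-D clusters versus a labelling that mixes them.
data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11)]
good = [0, 0, 0, 0, 1, 1, 1, 1]
mixed = [0, 1, 0, 1, 0, 1, 0, 1]
print(davies_bouldin(data, good))   # about 0.1
print(davies_bouldin(data, mixed))  # about 14.2, much worse
```

The sensible labelling gives a far lower index than the mixed one, consistent with "lower is better".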

Within-cluster scatter ($\delta_i$)

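The within-cluster scatter for cluster $i$ is calculated as:

$$
\delta_i = \left( \frac{1}{n_i} \sum_{x \in C_i} \sum_{j=1}^{p} \lvert x_j - c_{ij} \rvert^{u} \right)^{1/u}
$$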

where:

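$n_i$ is the number of points in cluster $i$

$C_i$ is the set of points in cluster $i$

$x_j$ is the $j$-th component of a point $x$, and $p$ is the number of variables

$c_{ij}$ is the $j$-th component of the centroid of cluster $i$

$u$ is a parameter (typically $u = 2$, which makes $\delta_i$ the root-mean-square distance of the points from their centroid, a multivariate analogue of the standard deviation)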
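The within-cluster scatter definition can be sketched for a single cluster in plain Python. The function name `within_cluster_scatter` is hypothetical, and the centroid is assumed to be the component-wise mean of the cluster's points (the usual choice, though this page does not state it explicitly). The example compares the default $u = 2$ with $u = 1$ on the same points.

```python
from typing import List, Sequence


def within_cluster_scatter(points: Sequence[Sequence[float]], u: float = 2.0) -> float:
    """delta_i for one cluster: ((1/n_i) * sum_{x in C_i} sum_j |x_j - c_ij|^u)^(1/u)."""
    n_i = len(points)
    p = len(points[0])
    # Centroid c_i (assumed): the component-wise mean of the cluster's points.
    c_i: List[float] = [sum(x[j] for x in points) / n_i for j in range(p)]
    total = sum(abs(x[j] - c_i[j]) ** u for x in points for j in range(p))
    return (total / n_i) ** (1.0 / u)


# Four corners of a 2 x 2 square; the centroid is (1, 1).
square = [(0, 0), (2, 0), (0, 2), (2, 2)]
print(within_cluster_scatter(square))         # u = 2: sqrt(2), about 1.414
print(within_cluster_scatter(square, u=1.0))  # u = 1: 2.0
```

With $u = 2$ each corner contributes a squared distance of 2, so $\delta_i = \sqrt{2}$; with $u = 1$ each corner contributes $|{\pm 1}| + |{\pm 1}| = 2$, so $\delta_i = 2$.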

Between-cluster distance ($d_{ij}$)

The distance between cluster centroids $i$ and $j$ is calculated as:

$$
d_{ij} = \left( \sum_{l=1}^{p} \lvert c_{il} - c_{jl} \rvert^{v} \right)^{1/v}
$$

where:

$v$ is a parameter (typically $v = 2$, giving Euclidean distance)

$c_{il}$ and $c_{jl}$ are the $l$-th components of centroids $i$ and $j$ respectively, with $l$ running over all $p$ variables
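The centroid distance is a Minkowski distance of order $v$ between two centroids. A minimal sketch, using the hypothetical function name `centroid_distance`, showing how the choice of $v$ changes the result for the same pair of centroids:

```python
from typing import Sequence


def centroid_distance(c_i: Sequence[float], c_j: Sequence[float], v: float = 2.0) -> float:
    """d_ij = (sum_l |c_il - c_jl|^v)^(1/v), a Minkowski distance of order v."""
    return sum(abs(a - b) ** v for a, b in zip(c_i, c_j)) ** (1.0 / v)


print(centroid_distance((0, 0), (3, 4)))         # v = 2 (Euclidean): 5.0
print(centroid_distance((0, 0), (3, 4), v=1.0))  # v = 1 (Manhattan): 7.0
```

Because $d_{ij}$ sits in the denominator of the index, a metric that reports larger separations (here $v = 1$) lowers every ratio, which is one reason the index depends on the choice of distance metric.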

Interpretation

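The Davies-Bouldin index measures the average "similarity" between each cluster and the cluster most similar to it, where the similarity of clusters $i$ and $j$ is the ratio of their combined within-cluster scatter to the distance between their centroids, $(\delta_i + \delta_j)/d_{ij}$: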

Lower DB values: Indicate better clustering with compact, well-separated clusters

Higher DB values: Suggest that clusters are either too spread out internally or too close to each other

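Among the candidate numbers of clusters, the optimal one is the number that gives the minimum value of the Davies-Bouldin index (the index is defined only for two or more clusters, because each cluster must be compared with at least one other). This occurs when clusters are internally compact (small $\delta_i$ values) and well-separated from each other (large $d_{ij}$ values).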
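To make this trade-off concrete, here is a small worked calculation with hypothetical values (not taken from any real data set). Write $R_{ij} = (\delta_i + \delta_j)/d_{ij}$. With only two clusters, each cluster's most similar neighbor is the other one, so the index reduces to a single ratio:

$$
\begin{aligned}
k = 2:\quad \mathrm{DB} &= \tfrac{1}{2}\left(R_{12} + R_{21}\right) = R_{12} = \frac{\delta_1 + \delta_2}{d_{12}} \\
\delta_1 = \delta_2 = 1,\; d_{12} = 10:\quad \mathrm{DB} &= \frac{1 + 1}{10} = 0.2 \\
\delta_1 = \delta_2 = 1,\; d_{12} = 2:\quad \mathrm{DB} &= \frac{1 + 1}{2} = 1.0 \\
\delta_1 = \delta_2 = 3,\; d_{12} = 10:\quad \mathrm{DB} &= \frac{3 + 3}{10} = 0.6
\end{aligned}
$$

Moving the same clusters closer together (third line) or spreading them out internally (fourth line) both raise the index relative to the compact, well-separated case.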

Advantages and considerations

The Davies-Bouldin index offers several advantages:

It's computationally efficient and doesn't require additional simulations

The interpretation is straightforward (lower is better)

It considers both compactness and separation simultaneously

It makes no explicit assumptions about the statistical distribution of the points within clusters

However, there are some limitations:

It may favor spherical clusters due to its reliance on centroids

Performance can be affected by the choice of distance metric

It may not work well with clusters of very different sizes or densities

The index can be sensitive to outliers
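To illustrate the outlier sensitivity, the sketch below (plain Python, hypothetical function name `within_cluster_scatter`, centroid assumed to be the component-wise mean) computes the scatter of one cluster with and without a single distant point. With the default $u = 2$, the outlier both drags the centroid toward itself and contributes a very large squared deviation, which inflates $\delta_i$ and, through it, every ratio involving that cluster.

```python
from typing import List, Sequence


def within_cluster_scatter(points: Sequence[Sequence[float]], u: float = 2.0) -> float:
    """delta_i around the cluster's own (assumed) component-wise mean centroid."""
    n, p = len(points), len(points[0])
    c: List[float] = [sum(x[j] for x in points) / n for j in range(p)]
    total = sum(abs(x[j] - c[j]) ** u for x in points for j in range(p))
    return (total / n) ** (1.0 / u)


compact = [(0, 0), (2, 0), (0, 2), (2, 2)]
with_outlier = compact + [(20, 20)]  # one extreme point added to the same cluster

before = within_cluster_scatter(compact)      # about 1.41
after = within_cluster_scatter(with_outlier)  # about 10.82
print(before, after, after / before)          # the scatter grows roughly 7.65-fold
```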

The Davies-Bouldin index is particularly effective when clusters are roughly spherical and similar in size, making it a popular choice for evaluating k-means clustering results.

The Davies-Bouldin index is one of 17 methods used in Prism's consensus approach for determining optimal cluster numbers, as described on the cluster metrics page.

© 1995-2019 GraphPad Software, LLC. All rights reserved.