Features and functionality described on this page are available with Prism Enterprise.

When the "Determine optimal number of clusters" option is selected in the K-means clustering analysis parameters dialog, Prism evaluates multiple clustering solutions using 17 different statistical methods to identify the optimal number of clusters. Each method applies its own metric to assess the quality of clustering solutions, and this tab presents the comprehensive results of these evaluations.

This tab contains a grid showing the metric values calculated by each of the 17 statistical methods for every tested number of clusters. The columns of this table represent the different evaluation methods, while the rows represent each number of clusters that was tested (based on the minimum and maximum values you specified in the analysis options).

Understanding the cluster evaluation methods

The 17 methods used to evaluate clustering quality can be broadly categorized into several types of metrics:

Internal validation metrics assess cluster quality based solely on the data and clustering results, without external reference. These include methods like Calinski-Harabasz (which measures the ratio of between-cluster to within-cluster variance), Davies-Bouldin (which evaluates cluster separation and compactness), and Silhouette (which measures how similar each point is to its own cluster compared to other clusters).
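As a rough illustration of one internal validation metric, the sketch below computes the Calinski-Harabasz index in plain Python using its usual textbook definition. The function names and the data are invented for this example, and Prism's own implementation may differ in detail.

```python
def centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def calinski_harabasz(points, labels):
    """Between-cluster dispersion over within-cluster dispersion,
    each divided by its degrees of freedom (k - 1 and n - k)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    n, k = len(points), len(clusters)
    overall = centroid(points)
    between = within = 0.0
    for members in clusters.values():
        center = centroid(members)
        between += len(members) * sq_dist(center, overall)
        within += sum(sq_dist(p, center) for p in members)
    # Assumes 1 < k < n and within > 0; otherwise the ratio is undefined
    return (between / (k - 1)) / (within / (n - k))

# Invented 2-D data: two tight, well-separated groups of three points
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
good = calinski_harabasz(points, [0, 0, 0, 1, 1, 1])  # labels match the groups
bad = calinski_harabasz(points, [0, 1, 0, 1, 0, 1])   # labels mix the groups
```

Here `good` is far larger than `bad`, which matches the interpretation that larger Calinski-Harabasz values indicate compact, well-separated clusters.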

Gap statistic approaches compare the clustering structure of your data to what would be expected from random data, helping identify meaningful clustering patterns.
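For orientation, the widely used formulation of the gap statistic is shown below. This page does not state which variant Prism computes, so treat the expression as general background rather than as Prism's exact formula. Here $W_k$ is the pooled within-cluster dispersion for $k$ clusters, and $\mathbb{E}^{*}_{n}$ is the expectation under a reference distribution with no cluster structure, usually estimated by averaging over simulated reference data sets of size $n$.

```latex
\mathrm{Gap}_n(k) = \mathbb{E}^{*}_{n}\left[\log W_k\right] - \log W_k
```

A larger gap means the observed data are clustered more tightly than random data would be at the same number of clusters.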

Geometric and distance-based methods evaluate clustering based on the spatial relationships between points and cluster centers, including metrics like Ball-Hall, Dunn, and Trace W.
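As an example of a distance-based metric, the following sketch computes the Dunn index under one common definition: the smallest distance between points in different clusters, divided by the largest distance between points in the same cluster. Several variants of this index exist, and which one Prism uses is not stated here. The function name and data are hypothetical.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+

def dunn_index(points, labels):
    """Minimum between-cluster separation over maximum within-cluster diameter."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    groups = list(clusters.values())
    # Smallest distance between two points that belong to different clusters
    min_separation = min(
        dist(p, q)
        for a, b in combinations(groups, 2)
        for p in a
        for q in b
    )
    # Largest distance between two points in the same cluster (the diameter).
    # Assumes at least one cluster contains two or more points.
    max_diameter = max(
        dist(p, q)
        for g in groups
        for p, q in combinations(g, 2)
    )
    return min_separation / max_diameter

# Invented 2-D data: two tight, well-separated groups of three points
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
separated = dunn_index(points, [0, 0, 0, 1, 1, 1])  # labels match the groups
mixed = dunn_index(points, [0, 1, 0, 1, 0, 1])      # labels mix the groups
```

With this definition, larger values are better: `separated` is well above 1, while `mixed` is close to 0.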

Information-theoretic approaches use statistical measures to assess the information content and structure revealed by different clustering solutions.

Interpreting the results

Each cell in the grid shows the metric value calculated by that method for that specific number of clusters. Methods differ in both scale and optimization direction: for some, larger values indicate a better clustering solution, while for others, smaller values do. Values are therefore comparable down a column (within one method), not across columns.
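The effect of optimization direction can be sketched with a small, made-up excerpt of the grid. All numbers below are invented, and the `higher_is_better` mapping is an assumption based on the usual conventions for these three indices. Some methods may select their optimum by rules other than a simple maximum or minimum, which this sketch does not cover.

```python
# Hypothetical excerpt of the grid: rows are cluster counts, columns are methods.
# All values are invented for illustration.
grid = {
    2: {"Calinski-Harabasz": 120.4, "Davies-Bouldin": 0.91, "Silhouette": 0.48},
    3: {"Calinski-Harabasz": 185.7, "Davies-Bouldin": 0.62, "Silhouette": 0.61},
    4: {"Calinski-Harabasz": 160.2, "Davies-Bouldin": 0.70, "Silhouette": 0.63},
}

# Usual conventions: True means larger values are better.
higher_is_better = {
    "Calinski-Harabasz": True,
    "Davies-Bouldin": False,
    "Silhouette": True,
}

def best_k_per_method(grid, higher_is_better):
    """Return, for each method, the cluster count with the best metric value."""
    best = {}
    for method, maximize in higher_is_better.items():
        pick = max if maximize else min
        # Compare cluster counts by this method's value in the grid
        best[method] = pick(grid, key=lambda k: grid[k][method])
    return best

best = best_k_per_method(grid, higher_is_better)
```

In this invented example, Calinski-Harabasz and Davies-Bouldin both point to 3 clusters (the first by its largest value, the second by its smallest), while Silhouette points to 4.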

For each method, Prism identifies the number of clusters that produces the "best" metric value according to that method's optimization criteria. The optimal cluster count recommendation shown in the tabular results represents the consensus across all methods, that is, the number of clusters most frequently identified as optimal by the individual methods.
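The consensus described above amounts to a majority vote, which could be sketched as follows. The method names and votes are placeholders. The tie-breaking rule (prefer the smaller cluster count) is an assumption made only for this example, since this page does not say how Prism resolves ties.

```python
from collections import Counter

def consensus_k(best_k_by_method):
    """Return the most frequently chosen cluster count and the full vote tally."""
    votes = Counter(best_k_by_method.values())
    top = max(votes.values())
    # Assumption: ties go to the smallest cluster count (not documented behavior)
    winner = min(k for k, count in votes.items() if count == top)
    return winner, votes

# Placeholder per-method optima, invented for illustration
best_k_by_method = {
    "Method A": 3,
    "Method B": 3,
    "Method C": 4,
    "Method D": 3,
    "Method E": 2,
}
winner, votes = consensus_k(best_k_by_method)
```

Here three of the five placeholder methods choose 3 clusters, so `winner` is 3. The tally in `votes` also shows how the remaining methods are split, which gives a rough sense of how strong the consensus is.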

The details and the math

Information about each method (including any mathematical formulas used, the interpretation of the calculated values, and the strengths and weaknesses of the method) can be found in a separate section of this guide.

How can this information be used?

This detailed breakdown allows you to:

Examine consensus strength: If most methods agree on the same optimal cluster number, this suggests a strong consensus. If methods are split between two or three different cluster numbers, the optimal choice may be less clear-cut.

Understand method sensitivity: Some methods may be more sensitive to your particular data structure than others, which can help inform your decision if you choose to override the consensus recommendation.

Validate clustering decisions: When presenting clustering results, this comprehensive evaluation may provide some statistical justification for your chosen number of clusters.

The cluster assignments for each evaluated number of clusters can be found on the Clustered data tab of the results, while detailed information about the cluster centers can be found on the Clusters details tab of the results.

© 1995-2019 GraphPad Software, LLC. All rights reserved.