GraphPad Prism 10 Statistics Guide - Results of hierarchical clustering

Results of hierarchical clustering

Features and functionality described on this page are available with Prism Enterprise.

The results of hierarchical clustering are predominantly graphical (dendrograms and heatmaps). The analysis can also generate several optional tables (tabs) of results. The options to include these results are on the Output tab of the analysis parameters dialog, and include:

•Standardized/centered data

•Distance matrix

•Merge map

•Cophenetic distance matrix

•Cophenetic correlation coefficient

Standardized/centered data

If you choose to perform the hierarchical clustering analysis on centered or standardized data (dropdown menu on the Options tab of the dialog), you may also choose to include the resulting scaled data as an optional tab of the results.
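As a rough illustration of what the scaled data represent, the sketch below centers and standardizes a single column of values in plain Python. The function names `center` and `standardize` are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption made for this example; this page does not state which convention Prism uses.

```python
from statistics import mean, stdev

def center(values):
    """Subtract the mean so the values have a mean of zero."""
    m = mean(values)
    return [v - m for v in values]

def standardize(values):
    """Center, then divide by the standard deviation.

    Assumption: the sample SD (n - 1 denominator) is used here.
    """
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]

column = [2.0, 4.0, 6.0, 8.0]
centered = center(column)            # mean becomes 0, spread is unchanged
standardized = standardize(column)   # mean becomes 0, SD becomes 1
print(centered)
print(standardized)
```

Centering only shifts the values, while standardizing also rescales them, so variables measured in different units contribute comparably to the distance calculation.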

Distance matrix (rows and/or columns)

Depending on the direction in which you chose to cluster (rows, columns, or both), the results may include one or two tabs: “Distance matrix (rows)” and/or “Distance matrix (columns)”. The two tabs share the same structure. Each reports the distance between every pair of observations in the input data, calculated with the distance method selected on the Options tab of the analysis parameters dialog. The distance matrix for clustering on columns uses the column titles from the input data as both its column and row titles. The distance matrix for clustering on rows uses the row labels identified on the Data tab of the analysis parameters dialog as both its column and row titles.
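The structure of such a matrix can be sketched in a few lines of Python. This example assumes Euclidean distance purely for illustration (it is only one of the distance methods an analysis might use), and the row labels "A", "B" and "C" and their values are made up.

```python
from math import dist

def distance_matrix(points):
    """Return the full symmetric matrix of pairwise Euclidean distances."""
    n = len(points)
    return [[dist(points[i], points[j]) for j in range(n)] for i in range(n)]

# Hypothetical input: three labeled rows, each with two variables.
rows = {"A": (0.0, 0.0), "B": (3.0, 4.0), "C": (6.0, 8.0)}
labels = list(rows)
matrix = distance_matrix([rows[k] for k in labels])

# The same labels serve as both the row titles and the column titles.
print("   " + "  ".join(f"{name:>5}" for name in labels))
for name, line in zip(labels, matrix):
    print(f"{name:>3}" + "  ".join(f"{value:5.1f}" for value in line))
```

The matrix is symmetric with zeros on the diagonal, because the distance from an observation to itself is zero and the distance from A to B equals the distance from B to A.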

Cophenetic distances (rows and/or columns)

Depending on the direction in which you chose to cluster (rows, columns, or both), the results may include one or two tabs: “Cophenetic distances (rows)” and/or “Cophenetic distances (columns)”. The two tabs share the same structure. Each reports the cophenetic distance between every pair of observations in the input data. The cophenetic distance between two observations is the distance between the two clusters that contain them separately at the moment those clusters are merged into a single cluster. On a dendrogram, it is the “height” at which the two observations are first joined into a single cluster. Note that these values depend not only on the selected distance method, but also on the linkage method selected on the Options tab of the analysis parameters dialog.
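To make the definition concrete, here is a minimal sketch that builds a cophenetic distance matrix from an ordinary distance matrix. It assumes single linkage (the distance between two clusters is the smallest distance between any pair of their members) only because that is the simplest rule to write down; the function name `cophenetic_single_linkage` and the example data are hypothetical, and this is not a description of how Prism computes the result.

```python
from itertools import combinations

def cophenetic_single_linkage(d):
    """Return the cophenetic distance matrix under single linkage.

    d is a symmetric matrix (list of lists) of pairwise distances.
    """
    n = len(d)
    clusters = [[i] for i in range(n)]      # each observation starts alone
    coph = [[0.0] * n for _ in range(n)]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest single-linkage distance.
        height, a, b = min(
            (min(d[i][j] for i in clusters[x] for j in clusters[y]), x, y)
            for x, y in combinations(range(len(clusters)), 2)
        )
        # Every pair split across these two clusters first joins at this height.
        for i in clusters[a]:
            for j in clusters[b]:
                coph[i][j] = coph[j][i] = height
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return coph

# Four observations on a line at positions 0, 1, 3 and 7.
positions = [0.0, 1.0, 3.0, 7.0]
d = [[abs(p - q) for q in positions] for p in positions]
coph = cophenetic_single_linkage(d)
for line in coph:
    print(line)
```

In this example the first and last observations are 7 units apart in the original data, but their cophenetic distance is 4, the height at which the last observation joins the cluster containing the other three.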

Cophenetic correlation coefficient

This optional result appears on the Tabular results tab of the hierarchical clustering output. It is a single-value summary of how well the dendrogram created by the clustering analysis preserves the original pairwise distances between the objects in the data. The value is calculated as the correlation between the values of the distance matrix and the corresponding values of the cophenetic distance matrix. Because it is a correlation coefficient, it is interpreted like other correlation coefficients: it lies between -1 and 1, with 1 indicating a perfect positive correlation, -1 indicating a perfect negative correlation, and 0 indicating no correlation.

•A high cophenetic correlation coefficient (close to one) suggests that the clustering structure determined by the analysis is a good representation of the data and the relationships therein.

•A low cophenetic correlation coefficient (close to zero) indicates that the clustering structure determined by the analysis poorly represents the actual data, and that the clusters or groups identified may not be meaningful.

•A negative cophenetic correlation coefficient is extremely uncommon, and suggests that there may be a fundamental issue with the data or the clustering process.
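The calculation described above can be sketched as follows. This example assumes a Pearson correlation computed over the unique off-diagonal pairs (the upper triangle) of the two matrices, which is the usual definition but is an assumption here. The helper names and both matrices are illustrative: the values correspond to four points at positions 0, 1, 3 and 7 clustered with single linkage.

```python
from math import sqrt

def upper_triangle(matrix):
    """Flatten the entries above the diagonal (each pair counted once)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def cophenetic_correlation(distances, cophenetic):
    """Correlate original distances with cophenetic distances."""
    return pearson(upper_triangle(distances), upper_triangle(cophenetic))

# Illustrative matrices (hypothetical data, single linkage assumed).
distances = [[0, 1, 3, 7], [1, 0, 2, 6], [3, 2, 0, 4], [7, 6, 4, 0]]
cophenetic = [[0, 1, 2, 4], [1, 0, 2, 4], [2, 2, 0, 4], [4, 4, 4, 0]]

r = cophenetic_correlation(distances, cophenetic)
print(round(r, 3))
```

A value near 1 here would mean that pairs that are far apart in the original data also join late in the dendrogram, and pairs that are close together join early.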
