Features and functionality described on this page are available with Prism Enterprise.
The results generated by hierarchical clustering are predominantly graphical (dendrograms and heatmaps). However, the analysis can also generate several optional tables (tabs) of results. The options to include these results are on the Output tab of the analysis parameters dialog, and include:
•Standardized/centered data
•Distance matrix
•Merge map
•Cophenetic distance matrix
•Cophenetic correlation coefficient
If you choose to perform the hierarchical clustering analysis on centered or standardized data (dropdown menu on the Options tab of the dialog), you may also choose to include the resulting scaled data as an optional tab of the results.
Depending on the “direction” in which you chose to perform the clustering analysis (on rows, columns, or both), you may have one or two distance matrix tabs in the results: “Distance matrix (rows)” and/or “Distance matrix (columns)”. These two tabs have the same structure. Each gives the calculated distance between every pair of observations in the input data, using the distance method selected on the Options tab of the analysis parameters dialog. Note that the distance matrix for clustering on columns uses the column titles from the input data as both the column and row titles of the matrix. The distance matrix for clustering on rows uses the row labels identified on the Data tab of the analysis parameters dialog as both the column and row titles of the matrix.
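To make the structure of these tabs concrete, the following is a minimal sketch of how row-wise and column-wise distance matrices can be computed outside of Prism. It assumes Python with NumPy and SciPy, a small made-up data table, and the Euclidean distance method; none of these come from Prism itself, and this is not a description of Prism's internal implementation.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Hypothetical input table: 4 rows (observations) by 3 columns (variables).
data = np.array([
    [1.0, 2.0, 3.0],
    [1.0, 2.0, 4.0],
    [5.0, 6.0, 7.0],
    [9.0, 9.0, 9.0],
])

# Clustering on rows: distance between every pair of rows (4 x 4 matrix).
row_dist = squareform(pdist(data, metric="euclidean"))

# Clustering on columns: transpose first, so that every pair of columns
# is compared instead (3 x 3 matrix).
col_dist = squareform(pdist(data.T, metric="euclidean"))

print(row_dist.shape, col_dist.shape)
```

Both matrices are square and symmetric with zeros on the diagonal, because the same set of labels indexes the rows and the columns of the matrix.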
Depending on the “direction” in which you chose to perform the clustering analysis (on rows, columns, or both), you may have one or two cophenetic distance tabs in the results: “Cophenetic distances (rows)” and/or “Cophenetic distances (columns)”. These two tabs have the same structure. Each gives the cophenetic distance between every pair of observations in the input data. The cophenetic distance between two observations is the calculated distance between the two largest clusters that each contain only one of the two observations, just before those clusters are merged into a single cluster. On a dendrogram, the cophenetic distance between any two observations is the “height” at which these two observations are first joined into a single cluster. Note that these distance values depend not only on the selected distance method, but also on the linkage method selected on the Options tab of the analysis parameters dialog.
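The dependence on the linkage method can be illustrated with a small sketch. It assumes Python with NumPy and SciPy and three made-up points, and it compares single and complete linkage; these choices are illustrative assumptions, not Prism's implementation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import squareform

# Three hypothetical observations, A, B and C, in two dimensions.
points = np.array([
    [0.0, 0.0],  # A
    [0.0, 1.0],  # B
    [5.0, 0.0],  # C
])

# Same distance method (Euclidean), two different linkage methods.
Z_single = linkage(points, method="single", metric="euclidean")
Z_complete = linkage(points, method="complete", metric="euclidean")

# cophenet(Z) returns the condensed cophenetic distances;
# squareform expands them into a full square matrix.
coph_single = squareform(cophenet(Z_single))
coph_complete = squareform(cophenet(Z_complete))

print(coph_single)
print(coph_complete)
```

In both cases A and B are joined first at height 1, so their cophenetic distance is 1. C then joins the {A, B} cluster at height 5 under single linkage (the smaller of the A-C and B-C distances) and at the square root of 26, about 5.10, under complete linkage (the larger of the two), so the cophenetic distances involving C differ between the two linkage methods.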
The cophenetic correlation coefficient is an optional result included on the Tabular results tab of the output of hierarchical clustering. It is a single-value summary of how well the dendrogram created by the clustering analysis preserves the original pairwise distances between the objects in the data. The value is calculated as the correlation between the values of the distance matrix and the corresponding values of the cophenetic distance matrix. Because this value is a correlation coefficient, its interpretation is similar to that of other correlation coefficients you may have encountered: it lies between -1 and 1, with 1 indicating a perfect positive correlation, -1 indicating a perfect negative correlation, and 0 indicating no correlation.
•A high cophenetic correlation coefficient (close to one) suggests that the clustering structure determined by the analysis is a good representation of the data and the relationships therein.
•A low cophenetic correlation coefficient (close to zero) indicates that the clustering structure determined by the analysis poorly represents the actual data, and that the clusters or groups identified may not be meaningful.
•A negative cophenetic correlation coefficient is extremely uncommon, and suggests that there may be a fundamental issue with the data or the clustering process.
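The calculation described above can be sketched as follows. The example assumes Python with NumPy and SciPy, randomly generated data, Euclidean distance, and average linkage; all of these are illustrative assumptions rather than Prism's implementation. It checks that the coefficient reported by SciPy matches the Pearson correlation between the original and cophenetic distances.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, cophenet
from scipy.spatial.distance import pdist

# Hypothetical data: 8 observations, 3 variables, fixed seed for repeatability.
rng = np.random.default_rng(0)
data = rng.normal(size=(8, 3))

# Original pairwise distances (condensed form: one value per pair).
original = pdist(data, metric="euclidean")

# Hierarchical clustering, then cophenetic distances for the same pairs.
Z = linkage(data, method="average", metric="euclidean")
ccc, coph = cophenet(Z, original)

# The same value computed directly as a Pearson correlation.
manual = np.corrcoef(original, coph)[0, 1]

print(ccc, manual)
```

Only one value per pair of observations enters the correlation, so the diagonal of zeros and the duplicated half of each square matrix are not counted.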