# Prism 9: Taking your analyses and graphs to higher dimensions

Prism 9.0.0 was released on October 27, 2020.

## Data on a higher dimension!

Prism 9 introduces a number of great improvements to the Multiple Variables data table. Explore larger data sets using a standard structure, and perform new and improved analyses with the following improvements:

- **Increased data limits**: enter up to 1024 columns of data in each data table
- **Automatic identification of variable types**: Identify variables in the Multiple variables data table as continuous, categorical, or label values
- **Text information in the data table**: Enter data directly as text. Instead of coding a variable like "0" and "1", simply enter "Male" and "Female" directly in the data table
- **Automatic variable encoding**: Enter your data and let Prism take care of the rest. Prism will automatically encode categorical text variables into numeric "dummy" variables

Often in research we find ourselves with an abundance of information on different variables from our experiments. As a simple example, imagine measuring the blood pressure of individuals after giving them either an experimental drug intended to reduce blood pressure or a placebo. It's likely that in addition to the recorded blood pressure measurements, you also recorded a wealth of information on each subject's age, height, weight, gender, race, and any number of other potential variables.

Numerous statistical techniques are designed to analyze this sort of "multiple variables" data, such as multiple linear regression and multiple logistic regression. Using these analyses means you can explore the outcome of interest without wasting any potentially useful information. To accommodate this kind of data, Prism offers the Multiple variables data table, which holds data in a standard structure used by most other statistics software (such as R, SPSS, and MATLAB). In this format, each column represents a different variable, while each row represents a different subject (measurements of each variable for each subject get placed into their appropriate column on that subject's row).
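
To make the idea of "dummy" variables concrete, here is a minimal Python sketch of the encoding. It is an illustration only, not Prism's implementation: the function name `dummy_encode`, the sample data, and the choice of the alphabetically first level as the reference are all assumptions made for this example.

```python
def dummy_encode(values, reference=None):
    """Encode a categorical text column as 0/1 "dummy" columns.

    One level is treated as the reference and gets no column of its own,
    so k levels produce k - 1 dummy columns.
    """
    levels = sorted(set(values))
    if reference is None:
        # Assumption for this sketch: use the alphabetically first level.
        reference = levels[0]
    columns = {}
    for level in levels:
        if level == reference:
            continue
        columns[level] = [1 if v == level else 0 for v in values]
    return reference, columns


# Hypothetical data: each key is one variable (column),
# and each list position is one subject (row).
table = {
    "Treatment": ["Drug", "Placebo", "Drug", "Placebo"],
    "Sex": ["Male", "Female", "Female", "Male"],
    "Systolic BP": [118, 131, 122, 135],
}

reference, dummies = dummy_encode(table["Sex"])
print(reference)  # Female
print(dummies)    # {'Male': [1, 0, 0, 1]}
```

With two levels there is a single dummy column; a variable with three levels would produce two columns, each compared against the reference level.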

## Principal Component Analysis (PCA)

Sometimes, the number of variables collected far exceeds the number of subjects that were available to study. Consider gene expression studies in which expression levels of hundreds or thousands of different genes were measured from subjects divided into two groups: a treatment group and a control group. There may simply be too many variables to fit a model to the data. But excluding some variables from the analysis throws away information that could be useful! PCA is a "dimensionality reduction" technique that can be used to reduce the number of required variables while eliminating as little information from the data as possible.

Additional features available within PCA include:

- Component selection via Parallel Analysis (as well as the Kaiser method, threshold of total variance method, and others)
- Generation of Scree Plots, Score Plots, and Biplots
- Automatic preparation of PCA results for further use in multiple linear regression (Principal Component Regression)
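
Two of the component selection rules named above are simple enough to sketch in a few lines of Python. This is a rough illustration under stated assumptions, not Prism's code: the eigenvalues are made up, the function names are hypothetical, the Kaiser rule is taken in its usual form (keep components with an eigenvalue greater than 1 for standardized data), and the 75% threshold is an arbitrary example value. Parallel Analysis requires simulation and is not shown.

```python
def proportion_of_variance(eigenvalues):
    """Fraction of the total variance explained by each component."""
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]


def select_kaiser(eigenvalues):
    """Kaiser rule: keep components whose eigenvalue is greater than 1.

    Assumes the data were standardized, so each original variable
    contributes a variance of exactly 1.
    """
    return [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1.0]


def select_by_total_variance(eigenvalues, threshold=0.75):
    """Keep the fewest leading components whose cumulative
    proportion of variance reaches the threshold."""
    selected = []
    cumulative = 0.0
    for i, p in enumerate(proportion_of_variance(eigenvalues)):
        selected.append(i + 1)
        cumulative += p
        if cumulative >= threshold:
            break
    return selected


# Hypothetical eigenvalues for four standardized variables,
# sorted from largest to smallest (these are what a scree plot shows).
eigenvalues = [2.0, 1.2, 0.5, 0.3]

print(select_kaiser(eigenvalues))                   # [1, 2]
print(select_by_total_variance(eigenvalues, 0.75))  # [1, 2]
print(select_by_total_variance(eigenvalues, 0.90))  # [1, 2, 3]
```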

## Adding new dimensions to your graphs

Create Bubble Plots directly from raw data, encoding variables for symbol position (X- and Y-coordinates), size, and fill color. Note that both symbol color and symbol size can be defined using a categorical (grouping) OR a continuous variable.

On this graph, over 100 countries are shown as individual circles. The X-coordinate of each circle represents that country's GDP (PPP), while the Y-coordinate represents the average life expectancy at birth. The size of each symbol is proportionate to the population of the country it represents (with the two largest symbols representing China and India, respectively). Finally, the color of each symbol represents the continent in which the country is located. This final variable (color) in this case is a categorical variable, but color in Bubble Plots can also be defined by continuous variables:

In this graph, the X-coordinate, Y-coordinate, and size of the symbols are the same as before. However, now the color of the symbol represents that country's birth rate per 1000 individuals on a continuous scale. Prism also now has semi-transparent color schemes built in so that overlapping symbols can be more clearly seen.

## Automatically add multiple comparison results to graphs!

#### Because you asked for it. Need we say more?

Simply perform an appropriate analysis with multiple pairwise comparisons. Then click once to automatically add these results to the graph. To customize these lines and asterisks, simply click the toolbar button again. Make adjustments to the data or the analysis, and the results displayed on the graph will update automatically. But remember, P values are only a part of the story. Don't forget to report effect estimates (for example, mean differences with 95% confidence intervals!) as well.
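
For readers who want to see how a P value might map to the asterisks drawn on a graph, here is a small Python sketch. The cutoffs used are a commonly seen convention and are an assumption here; this page does not state which thresholds Prism applies, and the function name is hypothetical.

```python
def significance_stars(p_value):
    """Map a P value to an asterisk label.

    The cutoffs below are an assumed, commonly used convention,
    not something stated in these release notes.
    """
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("P value must be between 0 and 1")
    thresholds = [
        (0.0001, "****"),
        (0.001, "***"),
        (0.01, "**"),
        (0.05, "*"),
    ]
    # Check the strictest cutoff first so the smallest P values
    # receive the most asterisks.
    for cutoff, stars in thresholds:
        if p_value <= cutoff:
            return stars
    return "ns"


print(significance_stars(0.03))     # *
print(significance_stars(0.005))    # **
print(significance_stars(0.00005))  # ****
print(significance_stars(0.2))      # ns
```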

## Better Visualization of T Test Results with Estimation Plots

Perform a t test, and Prism will now automatically create an Estimation Plot of the results. On this graph, the raw data from both groups will be plotted on the left Y-axis. On the right Y-axis, the difference of the group means will be plotted along with its 95% confidence interval. This visualization provides more information than a P value alone, as it shows how wide the 95% CI is in addition to showing if the 95% CI includes zero (if the 95% CI includes zero, the P value will be greater than 0.05; if the 95% CI does NOT include zero, the P value will be less than 0.05).
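
The numbers behind the right-hand axis can be sketched in Python as follows. This is an illustration under assumptions, not Prism's implementation: it uses the ordinary unpaired t test with pooled (equal) variances, the blood pressure values are invented, the function name is hypothetical, and the critical value of t is supplied by hand from a t table rather than computed.

```python
from math import sqrt
from statistics import mean, variance


def mean_difference_ci(group_a, group_b, t_critical):
    """Difference of means (b minus a) with its confidence interval.

    Assumes an unpaired comparison with equal variances in both groups.
    t_critical is the two-tailed critical value of t for
    len(group_a) + len(group_b) - 2 degrees of freedom.
    """
    n_a, n_b = len(group_a), len(group_b)
    diff = mean(group_b) - mean(group_a)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    margin = t_critical * standard_error
    return diff, diff - margin, diff + margin


# Hypothetical systolic blood pressure measurements.
placebo = [120.0, 125.0, 130.0, 135.0, 140.0]
drug = [110.0, 115.0, 120.0, 125.0, 130.0]

# Two-tailed 95% critical value of t for 8 degrees of freedom (t table).
T_CRITICAL_DF8 = 2.306

diff, lower, upper = mean_difference_ci(placebo, drug, T_CRITICAL_DF8)
includes_zero = lower <= 0.0 <= upper

print(f"difference = {diff:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
print("95% CI includes zero:", includes_zero)
```

In this made-up example the interval runs from about -21.53 to 1.53, so it includes zero, which corresponds to a P value greater than 0.05 even though the observed difference is 10 units.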

## Complete release notes for Prism 9.0.0

#### New Features

- Principal Component Analysis (PCA), a method used to project data in higher dimensional space into a lower dimensional space by maximizing the variance of each new dimension. PCA is mostly used as a tool in exploratory data analysis and for making predictive models. It is often used to visualize genetic distance and relatedness between populations.
- New graph types generated by PCA
    - Scree plots. Scree plots are used to visualize raw eigenvalues for each principal component (PC) identified in principal component analysis (PCA).
    - Score plots. Score plots provide a means of viewing the original data in the new (reduced) dimensional space of two indicated PCs (typically PC1 as the horizontal axis and PC2 as the vertical axis).
    - Loading plots. Loading plots provide a means to visualize the coefficients for two selected principal components.
    - Biplots. Biplots are combinations of score plots and loading plots.
    - Proportion of variance plots. This graph is similar to the Scree Plot described above, but is used with a slightly different interpretation style.

- Multiple variables graphs to graph data from the Multiple variables data table
    - Make a Bubble Plot, where symbol size is encoded by a numerical or categorical variable.
    - Encode symbol color and the appearance of connecting lines with other variables.
    - All these choices are made on a brand new Format Graph dialog with an improved appearance.

- New semi-transparent color schemes for bubble plots.
- Estimation Plots, which are a visual way to present the results of two-sample comparison tests such as the t test. The purpose of this graph, which contains raw data as well as a summary of the analysis result, is to emphasize the importance of effect sizes and confidence intervals while simultaneously de-emphasizing the concept of “significance”.
- Pairwise comparisons on graphs: automatic generation of visualizations that combine user data with the results of pairwise comparisons made during hypothesis tests (i.e., automatically adding significance stars to graphs).

#### Feature Improvements

- Nonlinear regression
    - Dramatically improved performance and accuracy of evaluating user-defined equations
    - Define X0 for differential equations like any other parameter
    - Create five residual graphs (including the new Actual vs Predicted graph). Previously, only one graph per analysis could be generated
    - Re-arranged and re-labeled the options for "Unstable parameter and ambiguous fits" section on the Confidence tab of the NLR parameters dialog

- Multiple linear/logistic regression analyses
    - Choose models with categorical independent variables with automatic reference level specification
    - Specify method for "automatic" reference level specification based on data (first or last level, most or least frequent level)
    - Specify the order of categorical variable level results via the "Define categories order" options from the Reference level tab of the MLR parameters dialog
    - Improved model control (tree view) for better presentation of categorical variables and interactions
    - Simplified model representation in the dialog
    - Interpolate values for the outcome (dependent) variable using data in the data table or specified values for each predictor in the parameters dialog (multiple linear regression only)
    - Improved Correlation matrix output so a heatmap of the results can be generated
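
The four "automatic" reference level methods listed above can be sketched in Python. This is only a guess at the general idea, not Prism's logic: the function name is hypothetical, "first" and "last" are assumed to mean order of first appearance in the data, and ties in frequency are broken in favor of the level that appears first.

```python
from collections import Counter


def choose_reference_level(values, method="first"):
    """Pick a reference level for a categorical variable.

    Assumptions for this sketch: levels are ordered by first appearance
    in the data, and frequency ties go to the earliest level.
    """
    levels = list(dict.fromkeys(values))  # unique levels, order preserved
    counts = Counter(values)
    if method == "first":
        return levels[0]
    if method == "last":
        return levels[-1]
    if method == "most frequent":
        return max(levels, key=lambda level: counts[level])
    if method == "least frequent":
        return min(levels, key=lambda level: counts[level])
    raise ValueError(f"unknown method: {method!r}")


# Hypothetical categorical column.
continent = ["Asia", "Europe", "Asia", "Africa", "Asia", "Europe"]

print(choose_reference_level(continent, "first"))           # Asia
print(choose_reference_level(continent, "last"))            # Africa
print(choose_reference_level(continent, "most frequent"))   # Asia
print(choose_reference_level(continent, "least frequent"))  # Africa
```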

- Multiple two sample tests analysis (Multiple t test analysis) now includes options for:
    - Multiple unpaired t-tests with Welch correction
    - Multiple paired t-tests
    - Multiple ratio-paired t-tests
    - Multiple nonparametric unpaired Mann-Whitney tests
    - Multiple nonparametric paired Wilcoxon tests
    - Multiple nonparametric unpaired Kolmogorov-Smirnov tests

- Row statistics analysis
    - Allows for calculation of mean with custom confidence interval level
    - Allows for calculation of medians with "no errors", "quartiles", "min / max", "percentiles"
    - Allows for calculation of geometric means with "no errors", "geometric SD", "CI"
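
As a quick illustration of the geometric mean option, the sketch below computes a geometric mean and geometric SD for one row of replicates in Python. It uses the usual definitions (the exponential of the mean and of the sample SD of the log-transformed values); the function name and data are made up, and the exact formulas Prism uses are not given on this page.

```python
from math import exp, log
from statistics import mean, stdev


def geometric_mean_and_sd(values):
    """Geometric mean and geometric SD factor of positive values.

    Both are computed on the natural-log scale and transformed back.
    """
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    logs = [log(v) for v in values]
    return exp(mean(logs)), exp(stdev(logs))


# Hypothetical row of three replicate values.
row = [1.0, 10.0, 100.0]
geo_mean, geo_sd = geometric_mean_and_sd(row)

print(round(geo_mean, 6))  # 10.0
print(round(geo_sd, 6))    # 10.0
```

The geometric SD is a multiplicative factor rather than an additive amount: here the typical spread is "times or divided by 10" around the geometric mean.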

- Two-way ANOVA: main effects only model
    - Allow for main effects only model (no interaction term) in two-way ANOVA for data with replicates
    - Allow missing factor level combinations in two-way ANOVA for main effects only model
    - "Simple effects" multiple comparisons not allowed for unreplicated two-way ANOVA

- Made the default title of standard transform with linked parameter be generated using analysis constant name instead of a value.
- Changed the default choice for unstable parameters in the 'Confidence' tab of the 'Parameters: Nonlinear Regression' dialog.
- Updated the behavior of analyses that can be performed on multivariable tables to properly handle different variable types
- When creating a correlation matrix, choose to ignore rows where any value is missing or excluded. With this option selected, all correlation coefficients in the matrix are computed from the same set of rows
- Allowed 'Select and Transform' analysis to define types for variables in green multiple variables results table
- (Mac) Added the alert "You are trying to analyze a single column. If you wish to perform a one-sample t test, use the One sample t and Wilcoxon test analysis. A t test requires two groups (both in Y columns; the X column is ignored)"
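
The correlation matrix option in the list above (ignoring any row with a missing or excluded value) amounts to what is often called listwise deletion. Here is a minimal Python sketch of that idea, with `None` standing in for a missing value. The function names and data are hypothetical, and this is not Prism's code.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix_complete_rows(rows):
    """Correlation matrix using only rows with no missing values,
    so every coefficient is based on the same set of rows."""
    complete = [row for row in rows if all(v is not None for v in row)]
    columns = list(zip(*complete))
    matrix = [[pearson(a, b) for b in columns] for a in columns]
    return matrix, len(complete)


# Hypothetical table: three variables, four subjects, one missing value.
rows = [
    (1.0, 2.0, 5.0),
    (2.0, 4.0, 3.0),
    (3.0, None, 4.0),  # dropped for every pair of variables
    (4.0, 8.0, 1.0),
]

matrix, rows_used = correlation_matrix_complete_rows(rows)
print(rows_used)  # 3
for line in matrix:
    print([round(r, 3) for r in line])
```

Without this option, the first and third variables could be correlated using all four rows while pairs involving the second variable would use only three, so coefficients in the same matrix would rest on different subjects.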

#### Non-math feature improvements

- New data limits
    - Raised data table limits to 1024 data sets [letters A...AMJ] and 512 sub-columns.
    - Raised graph limits so that up to 1024 data sets can be plotted.
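
To see why the 1024th data set is lettered AMJ, the short Python sketch below converts a 1-based data set number into spreadsheet-style column letters (A to Z, then AA, AB, and so on). The function name is hypothetical, and the lettering scheme is assumed to be the usual spreadsheet one.

```python
def column_letters(index):
    """Convert a 1-based column number to letters (1 -> A, 27 -> AA)."""
    if index < 1:
        raise ValueError("index must be 1 or greater")
    letters = ""
    while index > 0:
        # Subtract 1 because the letters have no zero digit:
        # A..Z stand for 1..26 rather than 0..25.
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


print(column_letters(1))     # A
print(column_letters(26))    # Z
print(column_letters(27))    # AA
print(column_letters(1024))  # AMJ
```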

- Multiple variable data tables
    - Enabled the support of text variables in Prism's multiple variables data tables.
    - Implemented the ability to define variable types in Prism's multiple variables data tables

- Special symbols and Unicode
    - Added the “Character>Unicode Symbols...” command in the Insert main menu, which will open the standard system "Character Map" dialog and allow entering a bigger variety of special symbols.
    - (Windows) Upgraded Prism's "Insert Character Dialog" and replaced the legacy approach of using the Symbols font for Greek/Math/European characters with Unicode-compatible characters, allowing seamless compatibility across platforms and third-party apps.

- Added the Export command to the contextual menu opened in the Navigator when right-clicking on any sheet type, including graphs and layouts.
- Prism now also deletes graphs when you delete the parent analysis that generated them, but does not delete a graph when the analysis only added a curve to an existing graph of data.
- Changed the command “Choose Color Scheme…” in the main Change menu to “Color Scheme>” and made it open the submenu showing the same content except for Background, Plotting area, Axes for 'Bubble Plot' graphs.
- Added the new section with semi-transparent color schemes to 'Change -> Color Scheme' and 'Change colors' toolbar's dropdown menus.
- Implemented collection of user's name and email while performing silent activations in order to provide improved product support and education
- Updated URLs in floating notes of sample data files from the Welcome dialog to point to the Prism 9 User guide (instead of an earlier version of the guides)
- Formatting of graphs or data points on graphs from a data table and results
    - (Windows) Made it possible to apply Bar, Error Bar and Line options using the Format Points contextual menu from a data table to the scatter with bars grouped graphs.
    - (Windows) Made it possible to apply Error Bar and Line options using the Format Points contextual menu from a data table to scatter grouped graphs.
    - (Windows) Made it possible to apply Line and Quartile formatting using the Format Points contextual menu from a data table to Violin graphs.
    - (Windows) Made it possible to apply Error Bar and Line options using the Format Points contextual menu from a data sheet for superimposed Scatter graphs.
    - (Windows) Made it possible to format data points on a graph from the green results tables. Case 1056514.
    - (Windows) Enabled the formatting of selected symbols on a violin graph from the Format Points contextual menus on a data table.
    - (Windows) Enabled the formatting contextual menu for a violin in the "Violin plot only" graph with reverse direction
    - (Mac) Enabled the Line section in the Format Points contextual menus to be able to format data points on graphs with means and medians from the data table

- (Windows) Prism can now import "True" or "False" boolean values from Excel as "1" and "0" into Prism's multiple variables data tables
- (Windows) Made Prism show all sheets from the family in the 'Delete Sheet(s)' dialog opened in the family navigator panel
- (Windows) Updated system requirements to Windows 7. Added an alert urging users to update Windows when Prism starts on Vista
- (Mac) Removed the "Y" label on top of each individual column in Column/Grouped/Contingency/Multiple Variable/Parts of Whole data tables
- (Mac) Made Prism remember the size of the last used window, and use that as the default size for new windows
- (Mac) Enabled the 'Reverse Category Order' command in the 'Change' and context menus for Parts of Whole graphs
- (Mac) Renamed the 'Reverse Data Set Order' command to 'Reverse Category Order' in the 'Change' and context menus for Parts of Whole graphs

#### Performance Improvements

- Improved the performance of the 'Correlation matrix' analysis more than 20x
- (Windows) Improved the performance of the Fit spline/LOWESS analysis with residual calculations by ~10x
- (Windows) Improved the performance of the 'Simple logistic regression' analysis more than 1.5x
- (Windows) Improved the performance of the 'Transform' analysis by ~4x
- (Windows) Improved the performance of the 'Prune rows' analysis by ~3x
- (Windows) Improved the performance of simulating XY data with a large number of values and several replicates more than 10x
- (Windows) Improved the performance of switching to a Grouped graph sheet when the source data contains a large number of rows more than 3x
- (Windows) Improved the performance of the 'Normalize' analysis slightly
- (Windows) Improved the performance of the 'Monte-Carlo' analysis when the source data table contains data linked from other analyses by several times
- (Mac) Improved the performance of the 'Transform' analysis by ~2.5x
- (Mac) Improved the performance of the 'Prune rows' analysis more than 10x
- (Mac) Improved the performance of switching to a Grouped graph sheet when the source data contains a large number of rows more than 2x
- (Mac) Improved the performance of the 'Normalize' analysis more than 5x

#### Infrastructure

- (Windows) Prism 9 is only available for 64-bit Windows. There is no 32-bit version of Prism for Windows
- (Mac) Prism 9 requires macOS 10.12 (Sierra) or newer

#### Analysis bugs fixed

- Fixed crash on performing Extract and rearrange analysis with t-test mode for large data.
- Fixed crash on attempt to perform custom equation analysis with particular inappropriate syntax.
- Fixed the issue where the wrong column letters were displayed in the "Data sets analyzed" row of the one-way ANOVA results sheet if not all datasets were analyzed
- (Windows) Fixed the issue in which a blank results sheet was displayed for the "Plot a function" analysis after unchecking the "Plot the first derivative" option

#### Graphing bugs fixed

- Fixed the issue when a pattern fill applied to all data sets appeared missing from some bars on a bar graph
- Fixed the issue when color settings were not applied to the graph if ‘All’ item was selected in the ‘Data Sets’ section of the ‘Define Color Scheme’ dialog
- Fixed the issue when some data points were outside the axis limits on the Grouped scatter plot graph if the source data table contained a large number of rows.
- Fixed the issue when the ‘Median’ was unexpectedly changed to ‘Mean’ in the ‘Style’ section of the ‘Format Graph’ dialog after selecting the ‘Scatter dot plot’ or the ‘Aligned dot plot’ appearance.
- Fixed the issue when the excluded property of X values appeared lost after cloning graph and changing data table format to show error values.
- Fixed the issue when it was impossible to format data points on XY graphs using the Format Points contextual menu on a data table after executing the "Apply Data Set Format" contextual command.
- (Windows) Fixed the issue that when you try to format a single violin on a graph, it unexpectedly changes all violins of the entire data set.
- (Windows) Fixed the issue when row titles were corrupted on a grouped graph after adding new data and clicking on the X axis if bars could not be displayed.
- (Windows) Fixed the issue when a graph disappeared from the sheet after performing the 'Group' command for two drawing brackets with asterisks.
- (Windows) Fixed the issue when shading between custom axis lines was lost after another shading was added to the graph
- (Windows) Fixed the issue in which points on the resulting graph did not retain the color formatting when the graph family had been duplicated
- (Mac) Fixed the issue in which bars on an interleaved graph overlapped if the value of the second bar was set to zero

#### Other bug fixes

- Fixed the issue when 'Set as Default' option in the 'Format Object' dialog did not work for text objects.
- Fixed the issue when 'Plot' drop-down and 'Show truncated violin plot' checkbox were unexpectedly shown in the Change Graph Type dialog for grouped or nested violins if one data set has been plotted with showing points.
- Use Šídák instead of Sidak in Parameters analysis dialogs and results.
- (Windows) Fixed the crash that occurred after invoking Format Axes for a graph after canceling Magic Step 2.
- (Windows) Fixed the crash when changing the graph's appearance in the Format Graph dialog globally for odd data sets.
- (Windows) Fixed the crash on attempt to rename multiple sheets in the gallery view of a Navigator's Family Pane which were deleted and restored using for analyses with residual graphs
- (Windows) Fixed the crash on changing Scatter to Violin plot graph if several data sets in a data table were replaced by a single data set beforehand.
- Fixed the crash on making a number of subcolumns huge (over 178 for Prism 8, over 238 for Prism 9) for a data set containing a huge number of values in a data table (~500000).
- (Windows) Fixed the issue when all subcolumns became filled with values on making the number of subcolumns large if any data set contained a huge number of values.
- (Windows) Fixed the issue when the unexpected alert appeared on changing replicates number for data table with a huge number of subcolumns and rows.
- (Windows) Fixed the issue when warning did not appear on inserting data that exceeded the maximum number of supported columns.
- Fixed the issue when values were not added and unexpected alert appeared on an attempt to add a huge number of values into the data table with a lot of subcolumns.
- (Windows) Fixed the issue when only the first row of the multi-line row title appeared visible after deleting another row in the data table.
- (Windows) Fixed the issue when it was impossible to format data points from results table.
- (Windows) Fixed the issue when controls were not aligned in the upper left of the Welcome dialog when it was stretched.
- (Windows) Fixed the issue when unexpected characters were shown next to the data-set title containing numbers in the 'Hint' section of the 'Replace Data Set' dialog.
- (Windows) Fixed the issue when the 'Reverse Legends Order' command was unexpectedly available when there were one or no legends on a graph.
- (Windows) Fixed the issue when the error bars section in Format Points contextual menu on a data table was not dimmed when there were no error bars on a graph.
- (Windows) Fixed the issue when a part of the data table was not visible after navigating to the bottom of a data set with a huge number of values via the End key.
- (Windows) Fixed the issue in which Prism would crash after selecting undo for actions done in Info Notes and Info Table
- (Mac) Fixed the issue in which Prism would freeze on an attempt to switch to a simulated XY data sheet with 256 data sets and 256 replicates
- (Mac) Fixed the issue in which it was impossible to create or open a Prism file with a large amount of data (256 data sets, 256 replicates, 500 rows)
- (Mac) Fixed the broken "Also draw lines" drop-down for line thickness in the "Gaps" tab of the "Format Graph" dialog for heatmaps
- (Mac) Fixed the issue in which the yellow color was missing from the "Highlight selected cell" icon on the toolbar