Simulating data with random error


This page explains how to simulate data sets, including random error. Look elsewhere to plot a family of theoretical curves.

Key concepts: Simulating data

Simulation is an underused tool. It is a great way to understand models and plan experiments. By combining an analysis that simulates data with a Prism script that repeats that analysis many times, you can perform Monte Carlo analyses.

Note that this analysis doesn't in fact analyze any data. Instead it generates curves from an equation.
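As a rough illustration of the Monte Carlo idea, the Python sketch below simulates many straight-line data sets with Gaussian scatter, fits each one by ordinary least squares, and summarizes the spread of the fitted slopes. It is not Prism's scripting language; the function names (`fit_line`, `monte_carlo_slopes`) and the parameter values are hypothetical, chosen only for illustration.

```python
import random
import statistics

def fit_line(xs, ys):
    """Ordinary least-squares fit of a straight line; returns (slope, intercept)."""
    x_mean = statistics.fmean(xs)
    y_mean = statistics.fmean(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def monte_carlo_slopes(n_sims, xs, slope, intercept, sd, seed=None):
    """Simulate n_sims data sets (ideal line + Gaussian scatter) and fit each one."""
    rng = random.Random(seed)
    fitted = []
    for _ in range(n_sims):
        ys = [intercept + slope * x + rng.gauss(0.0, sd) for x in xs]
        fitted.append(fit_line(xs, ys)[0])
    return fitted

xs = [float(x) for x in range(10)]
slopes = monte_carlo_slopes(1000, xs, slope=2.0, intercept=1.0, sd=0.5, seed=1)
print(f"mean slope = {statistics.fmean(slopes):.3f}, SD = {statistics.stdev(slopes):.3f}")
```

The SD of the fitted slopes across simulations estimates how precisely a single experiment with this design could determine the slope, which is the kind of question simulation helps answer when planning experiments.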

How to: Simulate XY data

1. To simulate a family of data sets with random error, start from any data table or graph, click Analyze, open the Simulate and generate category, and then select Simulate data with random scatter.
2. X values tab. Generate a regular series (arithmetic or geometric) of X values, or use the X values from the data table you are analyzing.
3. Equation tab. You can use the Y values from the data table you are analyzing and add random scatter to them, but more often you will choose an equation on this tab.
4. Parameter values tab. At the top of the tab, choose how many data sets to simulate. At the bottom, choose how many replicates each data set will have. Enter the value of each parameter in the main part of the tab. If you simulate more than one data set, a parameter value can apply to just one data set, to several, or to all of them: select the data set (or group of data sets) at the top of the dialog, then enter the parameter values for that selection below.
5. Random error tab. Choose among several methods for generating random scatter, and optionally add outliers.
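The steps above can be sketched in Python to show how the pieces fit together: a regular X series, an equation, one parameter set per data set, a number of replicates, and Gaussian scatter added to every ideal Y. This is a hypothetical illustration, not Prism's implementation; the helper names and the one-phase exponential decay used as the example equation are assumptions.

```python
import math
import random

def arithmetic_x(start, step, n):
    """Arithmetic series: start, start + step, start + 2*step, ..."""
    return [start + i * step for i in range(n)]

def geometric_x(start, ratio, n):
    """Geometric series: start, start*ratio, start*ratio**2, ..."""
    return [start * ratio ** i for i in range(n)]

def simulate_xy(xs, equation, param_sets, replicates, sd, seed=None):
    """Return one table per data set; each row holds `replicates` Y values for one X.

    equation:   callable f(x, **params) returning the ideal Y.
    param_sets: one dict of parameter values per data set.
    sd:         SD of the Gaussian scatter added to every Y value.
    """
    rng = random.Random(seed)
    data_sets = []
    for params in param_sets:
        table = []
        for x in xs:
            ideal = equation(x, **params)
            table.append([ideal + rng.gauss(0.0, sd) for _ in range(replicates)])
        data_sets.append(table)
    return data_sets

def exp_decay(x, y0, k):
    # Example equation only (hypothetical choice): one-phase exponential decay
    return y0 * math.exp(-k * x)

xs = geometric_x(0.5, 2.0, 6)  # 0.5, 1, 2, 4, 8, 16
# Two data sets: y0 shared, k differs between them
params = [{"y0": 100.0, "k": 0.3}, {"y0": 100.0, "k": 0.1}]
sims = simulate_xy(xs, exp_decay, params, replicates=3, sd=5.0, seed=42)
```

Passing the equation as a callable keeps the simulation loop independent of the model, much as the Equation tab is separate from the Parameter values and Random error tabs.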

How to: Simulate column data

Prism can only simulate XY data, but you can simulate column data by following these steps:

1. In the first tab, choose a range of X values that generates the number of rows you want. The X values will be ignored, but you must specify a range anyway.
2. In the second tab, choose the straight-line equation from the lines section.
3. In the third tab, choose the number of data sets (columns) you want, and set the number of replicates (bottom of the tab) to 1. Click "Select all" and set the slope to 0.0. Then set the intercept of each data set to the mean you want for that column.
4. In the fourth tab, choose the random scatter to add to the mean values you entered as intercepts.
5. View the graph. It will be an XY graph, which is not useful here. Click the Graph Type button and change it to a column scatter graph.
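The trick in these steps works because a straight line with a slope of zero is constant: every row of a column gets that column's intercept (its desired mean) plus random scatter, and X drops out. A minimal Python sketch, assuming Gaussian scatter; the function name `simulate_columns` is hypothetical.

```python
import random

def simulate_columns(n_rows, means, sd, seed=None):
    """Simulate column data as Y = intercept + slope*X with slope fixed at 0.

    Each column's intercept is the mean you want for that column, so every
    row is just that mean plus Gaussian scatter with the given SD.
    """
    rng = random.Random(seed)
    xs = range(n_rows)  # X values must exist but have no effect
    slope = 0.0
    columns = []
    for intercept in means:
        columns.append([intercept + slope * x + rng.gauss(0.0, sd) for x in xs])
    return columns

cols = simulate_columns(n_rows=8, means=[10.0, 12.0, 15.0], sd=2.0, seed=7)
```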

How Prism generates random numbers

Prism can add random values to each of the calculated Y values to simulate experimental error.

The only way to generate truly random numbers is through a random physical process, such as tossing dice or measuring intervals between radioactive decays. Prism, like all computer programs, generates “random” numbers from defined calculations. Since the sequence of numbers is reproducible, mathematicians say that the numbers are “pseudo-random”. The difference between truly random and pseudo-random numbers rarely creates a problem. For most purposes, computer-generated random numbers are random enough to simulate data and test analytical methods.

Prism uses the time of day when calculating the first random number, so you will get a different series of random numbers every time you run the program.
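Both points above, reproducibility and seeding from the clock, can be seen with Python's standard generator. This sketch uses Python's Mersenne Twister, not the generator Prism uses; the helper `draw` is hypothetical.

```python
import random
import time

def draw(seed, n=5):
    """Return the first n uniform numbers produced from a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Same seed, identical "random" sequence: the numbers are pseudo-random.
assert draw(12345) == draw(12345)

# Seeding from the clock (the approach the text describes for Prism) gives a
# different sequence on each run.
clock_rng = random.Random(time.time_ns())
print(clock_rng.random())
```

Fixing the seed is useful when you want a simulation to be repeatable; seeding from the clock is useful when you want fresh data every run.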

Prism generates random values from a Gaussian distribution using routines adapted from ideas presented in Numerical Recipes in C (W. H. Press et al., second edition, Cambridge University Press, 1992). The function RAN3 (defined in Numerical Recipes) generates uniformly distributed random numbers, and the function GASDEV transforms them to a Gaussian distribution with a mean of zero and the standard deviation you enter.
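The uniform-to-Gaussian step can be sketched with the polar Box-Muller method, the transformation GASDEV is based on. Two assumptions to note: the uniform numbers here come from Python's Mersenne Twister rather than RAN3, and the class name `GaussianSource` is hypothetical.

```python
import math
import random

class GaussianSource:
    """Gaussian deviates from uniform deviates via the polar Box-Muller method.

    Uniform numbers come from Python's Mersenne Twister, not from RAN3.
    """

    def __init__(self, seed=None):
        self._uniform = random.Random(seed)
        self._spare = None  # each accepted pair of uniforms yields two deviates

    def standard(self):
        """Return one deviate with mean 0 and SD 1."""
        if self._spare is not None:
            value, self._spare = self._spare, None
            return value
        while True:
            # Pick a point uniformly in the square [-1, 1) x [-1, 1)
            v1 = 2.0 * self._uniform.random() - 1.0
            v2 = 2.0 * self._uniform.random() - 1.0
            rsq = v1 * v1 + v2 * v2
            if 0.0 < rsq < 1.0:  # keep only points inside the unit circle
                break
        fac = math.sqrt(-2.0 * math.log(rsq) / rsq)
        self._spare = v1 * fac  # save the second deviate for the next call
        return v2 * fac

    def scatter(self, sd):
        """Return a deviate with mean 0 and the requested SD."""
        return sd * self.standard()

src = GaussianSource(seed=3)
sample = [src.scatter(5.0) for _ in range(20000)]
```

Scaling a mean-0, SD-1 deviate by the SD you enter is all that is needed to get scatter of any desired size.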

If you choose relative error, Prism first draws a random number from a Gaussian distribution with a mean of zero and an SD equal to the percent error you enter. It then multiplies that percentage by the ideal Y value and adds the result to the Y value.
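In code, relative error amounts to drawing a Gaussian percentage and converting it to a fraction of the ideal Y. A minimal sketch, assuming the percentage is divided by 100 before multiplying; `add_relative_error` is a hypothetical name.

```python
import random

def add_relative_error(y_ideal, percent_sd, rng):
    """Add Gaussian error whose SD is a fixed percentage of the ideal Y."""
    pct = rng.gauss(0.0, percent_sd)  # e.g. -3.1 means -3.1 %
    return y_ideal + (pct / 100.0) * y_ideal

rng = random.Random(11)
ys = [add_relative_error(y, 10.0, rng) for y in (1.0, 10.0, 100.0, 1000.0)]
```

With a 10% SD, an ideal Y of 1000 scatters with an SD of about 100, while an ideal Y of 1 scatters with an SD of about 0.1, so the absolute scatter grows with Y.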

When the Y values represent the number of objects you would observe in a certain space, or the number of events you would observe in a certain time interval, choose random numbers from a Poisson distribution. Again, our method is based on ideas from Numerical Recipes.
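For small expected counts, a Poisson deviate can be generated by multiplying uniform random numbers until the product falls below e^(-mean); the number of multiplications before that point is the count. This sketch is an illustration, not necessarily Prism's exact routine: for large means, Numerical Recipes switches to a rejection method, which is not reproduced here, and `poisson_deviate` is a hypothetical name.

```python
import math
import random

def poisson_deviate(mean, rng):
    """Poisson count via the product-of-uniforms method.

    Practical only for modest means: for large means exp(-mean) underflows
    and the loop becomes slow.
    """
    limit = math.exp(-mean)
    count = 0
    product = rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

rng = random.Random(8)
counts = [poisson_deviate(4.0, rng) for _ in range(20000)]
```

Unlike Gaussian scatter, the results are non-negative integers whose variance equals their mean, which matches counts of objects or events.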

Prism can also generate random numbers from a t distribution with any number of degrees of freedom (df), which lets you simulate scatter with heavier tails than a Gaussian distribution. If df is low, the distribution is very wide. If df is high (more than 20 or so), it is almost indistinguishable from a Gaussian distribution. If df=1, the distribution is extremely wide (lots of outliers) and is identical to the Lorentzian distribution, also known as the Cauchy distribution. Prism uses this equation to generate random numbers from the t distribution with df degrees of freedom:

$$t = \frac{\mathrm{Rand}_0}{\sqrt{\frac{1}{df}\sum_{i=1}^{df}\mathrm{Rand}_i^2}}$$

In this equation, each Rand is a random number drawn from a Gaussian distribution with mean=0 and SD=1. To compute a random number from a t distribution with df degrees of freedom, Prism generates df+1 different random numbers drawn from a Gaussian distribution: one for the numerator and df for the sum under the square root.
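A sketch of building a t deviate from df+1 standard Gaussian numbers: one Gaussian number divided by the root-mean-square of df further Gaussian numbers. It uses Python's `random.Random.gauss` as the Gaussian source, which is an assumption (not Prism's generator); `t_deviate` is a hypothetical name.

```python
import math
import random

def t_deviate(df, rng):
    """Return a t-distributed deviate built from df + 1 standard Gaussian numbers."""
    numerator = rng.gauss(0.0, 1.0)
    # Sum of df squared standard Gaussians (a chi-square variable with df degrees of freedom)
    chi_square = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))
    return numerator / math.sqrt(chi_square / df)

rng = random.Random(2)
heavy = [t_deviate(1, rng) for _ in range(5)]   # df=1: Cauchy, frequent outliers
mild = [t_deviate(30, rng) for _ in range(5)]   # df=30: close to Gaussian
```

With df=1 the denominator is just the absolute value of a single Gaussian number, and dividing by values near zero is what produces the occasional extreme outlier.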



Copyright (c) 2007 GraphPad Software Inc. All rights reserved.
URL: http://www.graphpad.com/help/Prism5/Prism5Help.html?stat_simulating_data_with_random_error.htm