July 1, 2009
The distinction between confidence intervals, prediction intervals and tolerance intervals
When you fit a parameter to a model, the accuracy or precision can be expressed as a confidence interval, a prediction interval or a tolerance interval. The three are quite distinct. The discussion below explains the three different intervals for the simple case of fitting a mean to a sample of data (assuming sampling from a Gaussian distribution). The same ideas can be applied to intervals for any best-fit parameter determined by regression.
Confidence intervals tell you how well you have determined the mean. Assume that the data really are randomly sampled from a Gaussian distribution. If you collect such a sample many times, and calculate a confidence interval of the mean from each sample, you'd expect about 95% of those intervals to include the true value of the population mean. The key point is that the confidence interval tells you about the likely location of the true population parameter.
Prediction intervals tell you where you can expect to see the next data point sampled. Assume that the data really are randomly sampled from a Gaussian distribution. Collect a sample of data and calculate a prediction interval. Then sample one more value from the population. If you do this many times, you'd expect that next value to lie within that prediction interval in 95% of the samples. The key point is that the prediction interval tells you about the distribution of values, not the uncertainty in determining the population mean. Prediction intervals must account for both the uncertainty in knowing the value of the population mean and the data scatter. So a prediction interval is always wider than a confidence interval.
Before moving on to tolerance intervals, let's define that word 'expect' used in defining a prediction interval. It means there is a 50% chance that you'd see the value within the interval in more than 95% of the samples, and a 50% chance that you'd see the value within the interval in less than 95% of the samples.
What if you want to be 95% sure that the interval contains 95% of the values? Or 90% sure that the interval contains 99% of the values? Those latter questions are answered by a tolerance interval. To compute, or understand, a tolerance interval you have to specify two different percentages. One expresses how sure you want to be, and the other expresses what fraction of the values the interval will contain. If you set the first value (how sure) to 50%, then a tolerance interval is the same as a prediction interval. If you set it to a higher value (say 90% or 99%) then the tolerance interval is wider.
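As a rough illustration of the three intervals for the simple case described above (a mean fitted to a Gaussian sample), here is a minimal Python sketch. It is not how Prism computes anything. It assumes a sample of exactly 10 values so that the critical values can be hard-coded (the Python standard library has no t or chi-square quantile function), and it uses Howe's approximation for the two-sided tolerance factor, which is an assumption introduced here rather than something stated in this article. All function names are hypothetical.

```python
import math
import random
import statistics

# Critical values hard-coded for n = 10 (9 degrees of freedom).
T_975_DF9 = 2.262157    # two-sided 95% critical value of the t distribution
CHI2_05_DF9 = 3.325113  # lower 5% quantile of the chi-square distribution
Z_975 = statistics.NormalDist().inv_cdf(0.975)


def intervals(sample):
    """Return (mean, ci_half, pi_half, ti_half) for a sample of 10 values.

    ci_half: half-width of the 95% confidence interval of the mean
    pi_half: half-width of the 95% prediction interval for one new value
    ti_half: half-width of a tolerance interval that is 95% sure to
             contain 95% of the population (Howe's approximation)
    """
    n = len(sample)
    if n != 10:
        raise ValueError("critical values are hard-coded for n = 10")
    mean = statistics.fmean(sample)
    sd = statistics.stdev(sample)
    ci_half = T_975_DF9 * sd / math.sqrt(n)
    pi_half = T_975_DF9 * sd * math.sqrt(1 + 1 / n)
    k = Z_975 * math.sqrt((n - 1) * (1 + 1 / n) / CHI2_05_DF9)
    ti_half = k * sd
    return mean, ci_half, pi_half, ti_half


def coverage(trials=20000, seed=1):
    """Simulate how often the CI contains the true mean (0) and how often
    the PI contains one additional value drawn from the same population."""
    rng = random.Random(seed)
    ci_hits = 0
    pi_hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(10)]
        mean, ci_half, pi_half, _ = intervals(sample)
        ci_hits += abs(mean - 0.0) <= ci_half
        next_value = rng.gauss(0.0, 1.0)
        pi_hits += abs(next_value - mean) <= pi_half
    return ci_hits / trials, pi_hits / trials


if __name__ == "__main__":
    data = [4.1, 5.3, 4.8, 5.9, 5.0, 4.4, 5.6, 5.2, 4.7, 5.1]
    print(intervals(data))
    print(coverage())
```

For any sample of 10 values with nonzero scatter, the half-widths come out ordered confidence < prediction < tolerance, and the simulated coverage of both the confidence interval and the prediction interval should land near 95%.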
June 26, 2009
The use and abuse of logarithmic axes
Logarithmic axes are widely used by students and scientists, and are a frequent source of confusion and frustration. I wrote this 13-page article (table of contents below) to explain the uses and abuses of logarithmic axes. The article is written for users of GraphPad Prism, but almost all the information will be useful no matter how you make your graphs.
What is a logarithmic axis?
Logarithmic axes cannot contain zero or negative numbers
When to use a logarithmic axis
Lognormal distributions
Distinguish using a logarithmic axis from plotting logarithms
Copy and paste tabular results from InStat to Excel. Aligned!
InStat presents the results in text tables, using multiple spaces to align columns. It looks great within InStat, and prints fine. But if you copy the results and paste into Excel, it becomes a mess. The problem is that Excel only sees a paragraph, not a table, so it pastes everything into one column. But it is easy to fix things so the results are tabular within Excel:
I've tested this with Excel 2007 (Win) and 2008 (Mac). It ought to work with earlier versions as well. I was not able to find a similar command in Numbers (part of iWork for Mac).
June 19, 2009
We are testing InStat 3.1
Our programmers have completed the work on InStat 3.1. The biggest change is that we've increased the maximum number of rows by a factor of ten, and doubled the maximum number of columns. Full list of changes. Contact me if you'd like to beta test InStat 3.1.
June 17, 2009
The essential concepts of statistics
"If you know twelve concepts about a given topic you will look like an expert to people who only know two or three." -- Scott Adams, creator of Dilbert
When learning statistics, it is easy to get bogged down in the details, and lose track of the big picture. Here are the twelve most important concepts in statistical inference.
The whole point of inferential statistics is to extrapolate from limited data to make a general conclusion. "Descriptive statistics" simply describes data without reaching any general conclusions. But the challenging and difficult aspects of statistics are all about reaching general conclusions from limited data.
The word ‘intuitive’ has two meanings. One meaning is “easy to use and understand.” That was my goal when I wrote Intuitive Biostatistics. The other meaning of 'intuitive' is “instinctive, or acting on what one feels to be true even without reason.” Using this definition, statistical reasoning is far from intuitive. When thinking about data, intuition often leads us astray. People frequently see patterns in random data and often jump to unwarranted conclusions. Statistical rigor is needed to make valid conclusions from data.
"Statistics means never having to say you are certain." If a statistical conclusion ever seems certain, you probably are misunderstanding something. The whole point of statistics is to quantify uncertainty.
Every statistical inference is based on a list of assumptions. Don't try to interpret any statistical results until after you have reviewed that list. An assumption behind every statistical calculation is that the data were randomly sampled from, or at least representative of, a larger population of values that could have been collected. If your data are not representative of a larger set of data you could have collected (but didn't), then statistical inference makes no sense.
Analyzing data requires many decisions. Parametric or nonparametric test? Eliminate outliers or not? Transform the data first? Normalize to external control values? Adjust for covariates? Use weighting factors in regression? All these decisions (and more) should be part of experimental design. When decisions about statistical analysis are made after inspecting the data, it is too easy for statistical analysis to become a high-tech Ouija board -- a method to produce preordained results, rather than an objective method of analyzing data.
Say you've computed the mean of a set of values you've collected, or the proportion of subjects where some event happened. Those values describe the sample you've analyzed. But what about the overall population you sampled from? The true population mean (or proportion) might be higher, or it might be lower. The calculation of a 95% confidence interval takes into account sample size and scatter. Given a set of assumptions, you can be 95% sure that the confidence interval includes the true population value (which you could only know for sure by collecting an infinite amount of data). Of course, there is nothing special about 95% except tradition. Confidence intervals can be computed for any degree of desired confidence. Almost all results -- proportions, relative risks, odds ratios, means, differences between means, slopes, rate constants... -- should be accompanied by a confidence interval.
The logic of a P value seems strange at first. When testing whether two groups differ (different mean, different proportion, etc.), first hypothesize that the two populations are, in fact, identical. This is called the null hypothesis. Then ask: If the null hypothesis were true, how unlikely would it be to randomly obtain samples where the difference is as large as (or even larger than) actually observed? If the P value is large, your data are consistent with the null hypothesis. If the P value is small, there is only a small chance that random sampling would have created as large a difference as actually observed. This makes you question whether the null hypothesis is true.
If the P value is less than 0.05 (an arbitrary, but well accepted threshold), the results are deemed to be statistically significant. That phrase sounds so definitive. But all it means is that, by chance alone, the difference (or association or correlation...) you observed (or one even larger) would happen less than 5% of the time. That's it. A tiny effect that is scientifically or clinically trivial can be statistically significant (especially with large samples). That conclusion can also be wrong: when the null hypothesis is true, you'll still find a statistically significant result 5% of the time just by chance.
If a difference is not statistically significant, you can conclude that the observed results are not inconsistent with the null hypothesis. Note the double negative. You cannot conclude that the null hypothesis is true. It is quite possible that the null hypothesis is false, and that there really is a difference between the populations. This is especially a problem with small sample sizes.
It makes sense to define a result as being statistically significant or not statistically significant when you need to make a decision based on this one result. Otherwise, the concept of statistical significance adds little to data analysis.
When many hypotheses are tested at once, the problem of multiple comparisons makes it very easy to be fooled. If 5% of tests will be "statistically significant" by chance, you expect lots of statistically significant results if you test many hypotheses. Special methods can be used to reduce the problem of finding false, but statistically significant, results, but these methods also make it harder to find true effects. Multiple comparisons can be insidious. It is only possible to correctly interpret statistical analyses when all analyses are planned, and all planned analyses are conducted and reported. However, these simple rules are widely broken.
A statistically significant correlation or association between two variables may indicate that one variable causes the other. But it may just mean that both are influenced by a third variable. Or it may be a coincidence.
By the time you read a paper, a great deal of selection has occurred. When experiments are successful, scientists continue the project. Lots of other projects get abandoned. When the project is done, scientists are more likely to write up projects that lead to remarkable results, or to keep analyzing the data in various ways to extract a "statistically significant" conclusion. Finally, journals are more likely to publish “positive” studies. If the null hypothesis were true, you would expect a statistically significant result in 5% of experiments. But those 5% are more likely to get published than the other 95%.
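To put rough numbers on the multiple comparisons concept above, here is a small Python sketch. It assumes every null hypothesis is true and that the tests are independent, which real analyses rarely satisfy exactly. The Bonferroni threshold is shown as one common example of the "special methods" mentioned above; the article does not name a specific method, so that choice and the function names are assumptions made for illustration.

```python
def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of 'significant' results when every null is true."""
    return n_tests * alpha


def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability that at least one of n independent tests is
    'significant' by chance alone, when every null is true."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test threshold that keeps the chance of any false positive
    at or below alpha (this also makes true effects harder to find)."""
    return alpha / n_tests


if __name__ == "__main__":
    for n in (1, 5, 20, 100):
        print(
            n,
            expected_false_positives(n),
            round(chance_of_any_false_positive(n), 4),
            bonferroni_threshold(n),
        )
```

Under these assumptions, testing 20 hypotheses gives roughly a 64% chance of at least one "statistically significant" result even though nothing real is going on.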
June 11, 2009
Confidence intervals vs. confidence bands for survival curves
When Prism computes survival curves, it can also compute the 95% confidence interval at each time point (using two alternative methods). The methods are approximations, but can be interpreted like any confidence interval. You know the observed survival percentage at a certain time in your study, and can be 95% confident (given a set of assumptions) that the confidence interval contains the true population value (which you could only know for sure if you had an infinite amount of data). When these confidence intervals are plotted as error bars (left graph below) there is no problem. Prism can also connect the ends of the error bars, and create a shaded region (right graph below). This survival curve plots the survival of a sample of only seven people, so the confidence intervals are very wide. Prism file.
The shaded region looks like the confidence bands computed by linear and nonlinear regression, so it is tempting to interpret these regions as confidence bands. But it is not correct to say that you can be 95% certain that these bands contain the entire survival curve. It is only correct to say that at any time point, there is a 95% chance that the interval contains the true percentage survival. The true survival curve (which you can't know) may be within the confidence intervals at some time points and outside the confidence intervals at other time points. It is possible (but not with Prism) to compute true confidence bands for survival curves, and these are wider than the confidence intervals shown above. The graph below (from Coviello) shows the survival curve in black (the sample was large, so the steps are small), the confidence limits in green, and the confidence bands in red. Confidence bands that are 95% certain to contain the entire survival curve (red) are wider than the confidence intervals for individual time points.
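The difference between "95% at each time point" and "95% for the entire curve" can be seen in a small simulation. The Python sketch below is only an illustration under strong simplifying assumptions that are not from this article: exponential survival times, no censoring, four fixed time points, and a simple normal-approximation interval for a proportion rather than either of the methods Prism uses. The function name is hypothetical.

```python
import math
import random

Z_95 = 1.959964  # two-sided 95% critical value of the normal distribution


def simulate(n_subjects=50, trials=5000, seed=7):
    """Return (pointwise_coverage, simultaneous_coverage).

    pointwise_coverage[i] is the fraction of simulated studies whose
    interval at time point i contains the true survival fraction.
    simultaneous_coverage is the fraction of studies whose intervals
    contain the truth at ALL time points at once.
    """
    rng = random.Random(seed)
    true_survival = [0.8, 0.6, 0.4, 0.2]
    # With exponential survival (rate 1), S(t) = exp(-t).
    times = [-math.log(s) for s in true_survival]
    pointwise_hits = [0] * len(times)
    all_hits = 0
    for _ in range(trials):
        deaths = [rng.expovariate(1.0) for _ in range(n_subjects)]
        covered_everywhere = True
        for i, (t, s_true) in enumerate(zip(times, true_survival)):
            s_hat = sum(d > t for d in deaths) / n_subjects
            margin = Z_95 * math.sqrt(s_hat * (1 - s_hat) / n_subjects)
            if abs(s_hat - s_true) <= margin:
                pointwise_hits[i] += 1
            else:
                covered_everywhere = False
        all_hits += covered_everywhere
    return [h / trials for h in pointwise_hits], all_hits / trials


if __name__ == "__main__":
    pointwise, simultaneous = simulate()
    print("coverage at each time point:", pointwise)
    print("coverage at all time points together:", simultaneous)
```

Each time point is covered on its own in roughly 95% of simulated studies, but the fraction of studies in which every time point is covered at once is noticeably lower. True confidence bands have to be wider to make up that difference.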
(Thanks to Joe Felsenstein for pointing out the distinction between confidence intervals and confidence bands in survival curves. )
June 8, 2009
How does Prism handle missing values?
Prism handles missing values easily. When entering data, simply leave a blank spot for any value that is missing. Prism never treats an empty cell as if you had entered zero -- it always knows that is a missing value. The details of how Prism handles missing values differ for various statistical tests.
Unpaired t test or the Mann-Whitney nonparametric test
These tests work fine with unequal sample sizes. Missing values are not a problem.
Paired t test or Wilcoxon matched pairs test
Prism only analyzes rows where there are data for both conditions. If one value is missing, that subject (row) is ignored.
Ordinary (not repeated measures) two-way ANOVA -- Enter raw data
If some values are missing, two-way ANOVA calculations are challenging. Prism uses the method detailed in SA Glantz and BK Slinker (details below). This method converts the ANOVA problem to a multiple regression problem and then displays the results as ANOVA. Prism performs multiple regression three times, each time presenting columns, rows, and interaction to the multiple regression procedure in a different order. Although it calculates each sum-of-squares three times, Prism only displays the sum-of-squares for the factor entered last into the multiple regression equation. These are called Type III sum-of-squares.
Ordinary (not repeated measures) two-way ANOVA -- Enter mean, SD (or SEM) and N
If your data are balanced (same sample size for each condition), you'll get the same results if you enter raw data, or if you enter mean, SD (or SEM), and N. If your data are unbalanced, it is impossible to calculate precise results from data entered as mean, SD (or SEM), and N. Instead, Prism uses a simpler method called analysis of “unweighted means”. This method is detailed in LD Fisher and G vanBelle (details below). If sample size is the same in all groups, and in some other special cases, this simpler method gives exactly the same results as obtained by analysis of the raw data. In other cases, however, the results will only be approximately correct. If your data are almost balanced (just one or a few missing values), the approximation is a good one. When data are unbalanced, you should enter individual replicates whenever possible.
Repeated measures two-way ANOVA
Prism cannot perform repeated-measures two-way ANOVA if any values are missing. It is OK to have different numbers of subjects in each group, so long as you have complete data (at each time point or dose) for each subject. Say you are comparing two groups (control and treated) measured at four time points. It would be fine if there were more treated subjects than control subjects, so long as each subject has data at all four time points. But Prism cannot analyze repeated measures two-way ANOVA if one of the subjects only had data for three time points, with the fourth time point missing.
Linear and nonlinear regression
Fitting lines and curves works fine with missing values. You can choose whether Prism fits the individual replicates or fits the means. If you choose to fit the means, each mean gets the same weight regardless of how many values were used to compute it. If you fit the individual replicates, then X values with more Y replicates get more weight than X values with fewer replicates.
Survival curves
Comparison of survival curves does not require equal sample size. If data are completely missing for any subject, simply don't enter data for that subject. But before deciding to leave data out, read about censoring, which happens when you know the subject survived up until a certain point, but don't know what happened after that (or you know, but can't use the data because the experimental protocol wasn't followed). Prism handles censored data fine. Don't omit those subjects; enter the duration that they survived on the experimental protocol and mark that duration as censored.
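The point about regression weighting above (fitting means versus fitting individual replicates when some replicates are missing) can be seen with an ordinary least-squares line. The Python sketch below is a plain textbook straight-line fit written for illustration; it is not Prism's code, and the data values and function names are made up.

```python
from statistics import fmean


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    x_bar = fmean(xs)
    y_bar = fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def fit_replicates(data):
    """Fit every replicate as its own point, so an X value with more
    replicates pulls harder on the line."""
    xs = [x for x, reps in data.items() for _ in reps]
    ys = [y for reps in data.values() for y in reps]
    return fit_line(xs, ys)


def fit_means(data):
    """Fit one point per X value (the mean of its replicates), so every
    X value gets the same weight however many replicates it has."""
    xs = list(data)
    ys = [fmean(reps) for reps in data.values()]
    return fit_line(xs, ys)


# Hypothetical data: X = 1 has four replicates; X = 2 and X = 3 each have
# only one because the other replicates are missing.
data = {1.0: [2.0, 4.0, 3.0, 3.0], 2.0: [4.0], 3.0: [9.0]}

if __name__ == "__main__":
    print("fit to means:     ", fit_means(data))
    print("fit to replicates:", fit_replicates(data))
```

With these made-up values the two fits give different slopes, because the four replicates at X = 1 count four times in the replicate fit but only once in the fit to means. With balanced data the two approaches give the same line.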
May 29, 2009
How to make two frequency distributions have the same X axis limits
Prism's frequency distribution analysis takes a stack of values and creates a frequency distribution table. You can choose the bin width, or Prism can choose it for you. If you create a series of frequency distributions from different data tables, you'll want the range to be the same for all. However, Prism stops making bins when it gets to the largest value. If different data sets have different ranges of data, the distributions won't be consistent and graphs of those distributions (histograms) won't align. With Prism 5, this is easy to fix. On the Frequency Distribution dialog, choose a large value for the last bin -- large enough to include all the values in all your data sets. Choose the same value for all the analyses that you want to plot together.
With Prism 4, it is not possible to set the center of the last bin. Prism stops when it runs out of values. The only way to get consistent graphs would be to copy the results and paste onto an empty data table. Then you can add additional bins (X values), each with zero for the Y values.
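Outside Prism, the same idea (one set of bins, wide enough for every data set, with zero counts where a data set has no values) can be sketched in a few lines of Python. This is a generic illustration with made-up data and hypothetical function names; it does not reproduce Prism's binning rules.

```python
import math


def shared_bin_edges(datasets, bin_width):
    """Bin edges that start at or below the smallest value and extend
    past the largest value found in ANY of the data sets."""
    lo = min(min(d) for d in datasets)
    hi = max(max(d) for d in datasets)
    start = math.floor(lo / bin_width) * bin_width
    n_bins = math.floor((hi - start) / bin_width) + 1
    return [start + i * bin_width for i in range(n_bins + 1)]


def frequency_distribution(values, edges):
    """Count values per bin. Each bin includes its lower edge and
    excludes its upper edge. Empty bins are kept with a count of zero."""
    width = edges[1] - edges[0]
    counts = [0] * (len(edges) - 1)
    for v in values:
        index = int((v - edges[0]) // width)
        if not 0 <= index < len(counts):
            raise ValueError(f"value {v} falls outside the bin range")
        counts[index] += 1
    return counts


if __name__ == "__main__":
    a = [1, 2, 2, 3]
    b = [2, 5, 8, 9]
    edges = shared_bin_edges([a, b], bin_width=2)
    print("edges:", edges)
    print("a:", frequency_distribution(a, edges))
    print("b:", frequency_distribution(b, edges))
```

Because both tables are built from the same edges, they have the same number of rows and their histograms line up, which is what padding the Prism 4 results with zero-count bins achieves by hand.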
May 27, 2009
How to tell Prism which display to start on
Windows
Prism Windows fills one window (with other windows inside of that one, if you open several projects). You can move that window around, and move it to a different display. If the Prism window is on the primary display, then Prism remembers its size and location. Next time you start Prism, it will appear in exactly the same location. If you move Prism to a different display, it gets confused. Next time you start Prism, it will start on the primary display. But you can trick Prism to get it to start on a different display. The first thing to do is click the Restore button near the upper right corner, so the Prism window does not entirely fill the display. Now grab its corner and stretch it, so a tiny part (just one pixel is enough) is on the primary display, while most of the window remains on the secondary display. Now when you quit Prism, it will remember its location and start up next time in exactly the same location with the same size. The trick is that part of the Prism window needs to be on the primary display -- but it can be a very tiny, almost invisible, part.
Mac
Each Prism document is in its own window, which you can move anywhere. You can have one project on one display and another project on another. When Prism starts up, the first project is always placed on the primary display. You cannot tell Prism to always start on a different display.
May 25, 2009
How the replicates test works
As part of nonlinear regression, Prism 5 can compute the replicates test. It asks if the deviation of the points from the curve is 'too far' compared to the scatter among the replicates. Prism does the calculations correctly, but we just learned that the explanation in the Prism help is incorrect. Details here.
April 10, 2009
Choosing colors for graphs
This article, by Steven Few, does a great job of explaining when to use colors when making graphs, and how to choose which colors to use.