
Why your p-value means nothing

Researchers painstakingly search for that p-value under 0.05, the conventional threshold for declaring that data sets are significantly different from each other. It’s a quest that, while earning a special star on your bar graph, is often misunderstood. Choosing the appropriate test, conducting a power analysis and determining whether your data are normally distributed are all steps needed to ensure that you can actually reject the null hypothesis when p < 0.05. More often than not, though, students feed data into their statistics program and go straight to the p-value: if it’s low enough, the experiment is over and they did a good job. That isn’t sufficient to generate sound inferences from data, and more journals should demand details on the statistical methods of prospective studies.

Determining variables

Before the start of an experiment, determine how many variables there are and whether each one is independent or dependent. Independent variables are those that are deliberately manipulated to generate an effect, and dependent variables are the ones that are measured (e.g. if we want to see the effect of yearly income on the number of TVs owned, yearly income would be the independent variable and number of TVs would be the dependent variable). Put another way, the dependent variables change in response to the independent variables.

Once the number and type of variables have been determined, the next step is finding out whether the data for each variable are normally distributed. This is important because parametric statistical tests (t-tests, analysis of variance, regression analysis, etc.) assume that the data fit a predictable normal, or Gaussian, bell curve. If the data are not normal, there are equivalent non-parametric statistical tests (Mann-Whitney U test, Kruskal-Wallis test, etc.) that should be used in place of the parametric ones.

Tests for normality

Figure 1 shows a histogram of normally distributed data. This curve is symmetric around the mean, which also happens to be the mode and the median. Normal distributions also follow the empirical rule: 68% of values lie within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations. There are many ways to check whether a data set fits a normal distribution. A common one is to plot a histogram like Figure 1 and take note of the skewness and kurtosis. Skewness describes how lopsided the distribution is, with more of the data piled up on one side than the other; kurtosis describes the sharpness of the peak. The closer these measures are to zero, the more normal the data. One way to decide whether they are too large is to divide the skewness and kurtosis values by their respective standard errors, giving the skewness and kurtosis z-values. Both should fall between -1.96 and 1.96 for a normal distribution.

Figure 1. Normal distribution of an imaginary data set.
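If you would rather compute these z-values yourself than read them off a software report, here is a minimal sketch in Python using NumPy and SciPy. The simulated data set and the large-sample approximations for the standard errors (roughly √(6/n) for skewness and √(24/n) for kurtosis) are assumptions for illustration, not part of the original analysis.

```python
import numpy as np
from scipy import stats

# Hypothetical sample for illustration; swap in your own measurements.
rng = np.random.default_rng(42)
data = rng.normal(loc=10.0, scale=2.0, size=200)

n = len(data)
skewness = stats.skew(data)
kurt = stats.kurtosis(data)  # excess kurtosis: 0 for a perfectly normal curve

# Large-sample approximations of the standard errors (assumption).
se_skew = np.sqrt(6.0 / n)
se_kurt = np.sqrt(24.0 / n)

z_skew = skewness / se_skew
z_kurt = kurt / se_kurt

# Values between -1.96 and 1.96 are consistent with a normal distribution.
print(f"skewness z-value: {z_skew:.2f}, kurtosis z-value: {z_kurt:.2f}")
```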

Other than visual inspection of the data, a common test is the Shapiro-Wilk test, which can be computed by most statistics software. It tests the null hypothesis that a data set came from a population that fits a normal distribution.

If the p-value is below the chosen α (usually between 0.05 and 0.001), then the null hypothesis is rejected and the data are considered not normally distributed; if it is above α, there is no evidence against normality. Finally, a probability plot that may be used to assess normality is the Q-Q plot. This plot compares distributions; for a single data set, it compares the data against the line y = x. The closer the points lie to y = x, the more normal the data. What makes the Q-Q plot unique is that it plots the quantiles of the data against the quantiles of a theoretical normal distribution, derived from its cumulative distribution function. Constructing this by hand would be tedious, but statistics software packages can create it for you. An example of a Q-Q plot (graciously donated by Wikipedia) showing a normal distribution is in Figure 2.

Figure 2. Normal distribution of an imaginary data set demonstrated on a Q-Q plot.
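For readers who want to run these checks themselves, here is a minimal sketch in Python with SciPy and Matplotlib covering both the Shapiro-Wilk test and a Q-Q plot; the simulated data and the α of 0.05 are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Hypothetical sample for illustration; swap in your own measurements.
rng = np.random.default_rng(0)
data = rng.normal(loc=10.0, scale=2.0, size=200)

# Shapiro-Wilk tests the null hypothesis that the data come from a normal population.
stat, p_value = stats.shapiro(data)
alpha = 0.05
if p_value < alpha:
    print(f"p = {p_value:.3f} < {alpha}: reject normality")
else:
    print(f"p = {p_value:.3f} >= {alpha}: no evidence against normality")

# Q-Q plot against a theoretical normal distribution; points hugging the
# reference line suggest the data are approximately normal.
stats.probplot(data, dist="norm", plot=plt)
plt.title("Q-Q plot of the sample")
plt.show()
```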

Power analysis

The law of large numbers states that as we take more and more samples, the mean of the sample gets closer and closer to the mean of the population. And really, our goal is to draw inferences about populations from a reasonable number of samples. But how many samples (the N value) are enough to make this inference? For too many experiments in too many labs, this number is the lowest one that SPSS will accept to run the statistics and achieve statistical significance. In reality, such a low N leads to very low power, which undermines the significance of the test: underpowered experiments miss real effects, and the significant results they do produce are less trustworthy.

Power analysis determines the probability that your experimental design will detect an effect, if one exists. In hypothesis-testing terms, power is the sensitivity of the test, or the probability of rejecting the null hypothesis when it is false. This is related to the two types of error researchers can make in statistical analysis, Type I and Type II errors. If the null hypothesis is rejected when it should not have been, a Type I error (α) has been committed; a Type II error (β) is committed when the null hypothesis is accepted in a case where it should have been rejected. By our definition, power is the probability of correctly rejecting the null hypothesis, or 1 − β. The lower the power, the greater the chance of committing a Type II error.

Conducting a power analysis is extremely useful because it allows calculation of the minimum sample size required to detect an effect. Three factors come into play when determining power: the size of the effect in the population, the number of samples and α. Power values range from 0 to 1, where 0.8 is the most widely accepted target, although this depends on your field (search PubMed for reviews on the topic in your specific area). The actual calculation can be done in many statistics software packages, and each of these factors can be traded off against the others.
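As a rough illustration of how effect size, N and α combine to give power, here is a minimal sketch using the TTestIndPower class from statsmodels; the particular numbers (an effect size of 0.5, 10 samples per group, α = 0.05) are assumptions chosen for the example, not recommendations.

```python
from statsmodels.stats.power import TTestIndPower

# Power of a two-sample t-test under illustrative assumptions:
# a medium effect size (Cohen's d = 0.5), 10 samples per group, alpha = 0.05.
analysis = TTestIndPower()
power = analysis.power(effect_size=0.5, nobs1=10, alpha=0.05, ratio=1.0)
print(f"power = {power:.2f}")  # well below the conventional 0.8 target
```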

How do I calculate these factors for power analysis?

  • α – As mentioned above, α is the Type I error rate, or the probability of a false positive. Scientists have generally accepted 0.05 as the standard for α, but again, review the literature in your field and see whether it uses a lower α such as 0.01 or 0.001.
  • Effect size – An entire post could be dedicated to effect size, because it allows researchers to tell whether observed differences are meaningful regardless of their significance. Two data sets can be significantly different from each other purely because of a high N even though the effect size is tiny, because the real difference is so small (a worked example in code appears after this list). The equation for calculating effect size (ES) is:
$$ES = \frac{\bar{x}_1 - \bar{x}_2}{\sigma_{pooled}}$$

Here, $\bar{x}_1$ and $\bar{x}_2$ refer to the means of the two groups and $\sigma_{pooled}$ is the pooled standard deviation of both data sets. To get the pooled standard deviation, use the following formula:

$$\sigma_{pooled} = \sqrt{\frac{\sigma_1^2 + \sigma_2^2}{2}}$$

The larger the effect size, the more meaningful your results. An effect size of 0.5 or greater is a common benchmark, but conventions in your field may differ.

  • N value – This is usually the unknown in the power calculation, since we aim for a power of 0.8. A catch with power analysis is that it should be conducted before the experiment, which is awkward because the effect size requires the means of data you have not yet collected. Educated guesses, drawn from pilot data or the literature, are therefore the best way to get a feel for how many samples the study requires. Since there are generally accepted values for the other criteria, they can be used to determine the minimum number of samples needed for a significant, meaningful and powerful result, as shown in the sketch below.
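Putting the pieces together, here is a minimal sketch that computes the effect size from two made-up pilot groups using the formulas above and then asks statsmodels for the minimum N per group that reaches a power of 0.8 at α = 0.05; the pilot numbers are entirely hypothetical.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Hypothetical pilot data for two groups; replace with your own pilot
# measurements or estimates from the literature.
group1 = np.array([10.1, 11.4, 9.8, 12.0, 10.7, 11.1])
group2 = np.array([9.2, 10.0, 9.5, 10.4, 9.1, 9.9])

# Effect size using the pooled standard deviation defined above:
# sigma_pooled = sqrt((sd1^2 + sd2^2) / 2)
sd_pooled = np.sqrt((group1.std(ddof=1) ** 2 + group2.std(ddof=1) ** 2) / 2)
effect_size = (group1.mean() - group2.mean()) / sd_pooled

# Minimum samples per group for 80% power at alpha = 0.05.
analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=effect_size, alpha=0.05,
                                   power=0.8, ratio=1.0)
print(f"effect size = {effect_size:.2f}, minimum N per group = {int(np.ceil(n_per_group))}")
```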

Which test to use?

The final part of conducting a solid statistical analysis (and power analysis) is choosing the right test for your data. Consult a handy chart, made by the smart kids at UCLA, to see which test should be run given the number and types of variables and whether or not the data are normal.
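As a simplified illustration of that decision for the two-group case, here is a sketch of a small helper that falls back to a non-parametric test when normality is rejected; the function name and the α threshold are assumptions for illustration, and the UCLA chart covers many more combinations of variables.

```python
from scipy import stats

def compare_two_groups(group1, group2, alpha=0.05):
    """Choose a parametric or non-parametric two-sample test based on normality."""
    # Shapiro-Wilk on each group: p < alpha means normality is rejected.
    normal = (stats.shapiro(group1).pvalue >= alpha and
              stats.shapiro(group2).pvalue >= alpha)
    if normal:
        name, result = "t-test", stats.ttest_ind(group1, group2)
    else:
        name, result = "Mann-Whitney U", stats.mannwhitneyu(group1, group2)
    print(f"{name}: p = {result.pvalue:.3f}")
    return result

# Example with made-up data:
# compare_two_groups([10.1, 11.4, 9.8, 12.0, 10.7], [9.2, 10.0, 9.5, 10.4, 9.1])
```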

Happy experimenting!


Dr. Kang Interview Pt 1

Becoming a successful scientist is a difficult task. With a variety of interesting fields, and funding tightening every year, it’s clear that students need to choose their path carefully. AiMED sat down with Dr. Chil-Yong Kang, a professor at Western University best known for his significant role in the development of a vaccine against the human immunodeficiency virus (HIV), and asked him about his story.

Dr. Kang conducted his PhD studies at McMaster University, where he studied the structure of the vesicular stomatitis virus. As he was finishing his PhD in 1971, with four significant publications in journals such as Nature, Virology and the Journal of Virology, there was a lot of buzz surrounding the recent discovery of reverse transcriptase, an enzyme found in retroviruses. Dr. Howard Temin, who discovered reverse transcriptase in 1970 and later received the Nobel Prize for it, rattled the research community because the enzyme overturned the dogma that information flow in a cell is unidirectional. Previously, it was thought that DNA is transcribed into RNA, which is then translated into the proteins essential to life; reverse transcriptase takes RNA and reverse-transcribes it to generate DNA, a process essential to retrovirus reproduction. Dr. Kang wrote to Dr. Temin asking for a postdoctoral fellowship in his lab and was lucky to be chosen out of 52 applicants. For any PhD student looking for a postdoctoral position, the lesson is clear: find a supervisor who is a leader in their field and has made significant contributions to the scientific community.

It was under the supervision of Dr. Temin, at the University of Wisconsin-Madison, that Dr. Kang’s interest in reverse transcriptase and retroviruses (the family to which HIV belongs) was cultivated. Dr. Kang discovered that reverse transcriptase was found in normal embryonic chick cells as well as in retroviruses, which was puzzling because the enzyme was originally thought to be restricted to viruses. He published numerous manuscripts on the topic in well-respected journals such as Nature and the Proceedings of the National Academy of Sciences. The impact of these publications made it easy for him to secure an assistant professorship at the University of Texas Southwestern Medical School. Impact factor is limited as a metric of publication quality, but departmental recruitment committees still prefer candidates with publications in high-impact journals like Nature, Science or Cell. As a postdoctoral fellow, ensuring your research is published in these types of journals is essential to being a competitive candidate for an assistant professorship.

Dr. Kang was promoted to the rank of full Professor in the Department of Microbiology at UT Southwestern Medical School within 8 years. Interested in studying hantavirus, an RNA virus that causes fatal pulmonary disease, he moved to the University of Ottawa in 1982, drawn by its facilities, to become Professor and Chairman of the Department of Microbiology and Immunology. While there, his group genetically characterized all of the hantaviruses, including variants that cause no disease at all. After comparing the genetic sequences of the different hantaviruses, Dr. Kang’s group published and deposited all of the sequence data at the EMBL (European Molecular Biology Laboratory). In 1993, the US Centers for Disease Control in Atlanta diagnosed the mysterious pulmonary disease outbreak among the Navajo in the Four Corners region as being caused by a hantavirus, on the basis of Dr. Kang’s sequence data. His lab quickly became a major player in viral research, and the group gained a reputation as experts in RNA viruses such as VSV, human parainfluenza virus, hantavirus, reticuloendotheliosis virus and HIV.

His group began by studying the basic science of HIV. They looked at the different enzymes present in HIV and investigated the intracellular processing of the HIV surface glycoprotein, with particular interest in its signal peptide. This essential peptide is 30 amino acids long and contains an unusually high number of positively charged amino acids; altering some of them led to highly efficient virus production with increased glycoprotein processing. Dr. Kang and his group published numerous manuscripts on the topic, and in 1992 he was offered the position of Dean of Science at Western University, where he started working on a vaccine for HIV.

The lesson here is that it is important for students to constantly have projects on the go, so as to increase the likelihood of publishing. As a PhD or Master’s student, publishing as many peer-reviewed papers as possible is important so that potential postdoc supervisors can see your productivity and your ability to organize ideas into coherent scientific narratives. During a postdoctoral fellowship, productivity is still important, but it is also critical that manuscripts are published in well-respected journals. The combination of these will make you an attractive candidate for any academic or industrial position.

More to come soon!

Official Blog of AiMED