Tuesday, 2 February 2016

Data Analysis – T-Test vs Mann Whitney U

For those unfamiliar with statistics, it can be confusing to decide which test to apply when determining whether observed changes are statistically significant (i.e. p<0.05).

The following will provide some guidance on how to analyse data between two groups (e.g. placebo vs drug treatment, normal vs diseased, light vs dark, etc).

What Are The Differences?
The t-test is a test of the difference between population means. It is a parametric test and should only be applied to data that is normally distributed. In contrast, the Mann-Whitney U (MWU) test is a rank-based test that is sensitive to differences in medians as well as in the shape and spread of the data. It is a non-parametric test that can be used as an alternative to the t-test on data that is not normally distributed. However, it should be noted that the MWU test can also be applied to normally distributed data.
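To make the difference concrete, here is a minimal sketch that computes the two test statistics by hand using only the Python standard library. The values are hypothetical and are not the dataset from this post, and the function names are my own. It uses Welch's form of the t statistic, which is one common variant. In practice you would probably let a statistics package (for example `scipy.stats.ttest_ind` and `scipy.stats.mannwhitneyu`) calculate the statistics and p-values for you.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic: difference in means divided by its standard error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

def mann_whitney_u(a, b):
    """U statistic for group a: count of (x, y) pairs with x > y; ties count as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

# Hypothetical measurements (assumption, for illustration only)
normal = [52, 61, 58, 70, 65, 49]
diseased = [78, 85, 91, 69, 88, 95]

t = welch_t(normal, diseased)
u = mann_whitney_u(normal, diseased)

# Replace one value with an extreme outlier: the ranks do not change,
# so U is identical, but the mean and variance (and hence t) shift a lot.
diseased_outlier = [78, 85, 91, 69, 88, 950]
t_outlier = welch_t(normal, diseased_outlier)
u_outlier = mann_whitney_u(normal, diseased_outlier)

print(f"t = {t:.3f}, U = {u}")
print(f"with outlier: t = {t_outlier:.3f}, U = {u_outlier}")
```

The t statistic depends on the actual values, whereas U only depends on how the two groups are ordered relative to each other. This is why the MWU test is less affected by skewed data or outliers.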

The number of data points or sample size can also affect the choice of test. If you have large datasets that have a normal distribution, a t-test can be very powerful. But if you have a small number of data points (e.g. fewer than 6 data points), a MWU test would be preferable to a t-test, since normality cannot be reliably established from so few data points.
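For small samples, the MWU p-value can be calculated exactly by listing every possible way of splitting the pooled data into two groups. The sketch below does this with the standard library only. The data and function names are hypothetical, and this brute-force approach is only practical for small groups because the number of splits grows very quickly.

```python
from itertools import combinations

def u_statistic(a, b):
    """U for group a: count of (x, y) pairs with x > y; ties count as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)

def exact_mwu_p(a, b):
    """Two-sided exact p-value by enumerating all relabellings of the pooled data."""
    pooled = list(a) + list(b)
    n, m = len(a), len(b)
    centre = n * m / 2  # expected U under the null hypothesis
    observed = abs(u_statistic(a, b) - centre)
    hits = total = 0
    for idx in combinations(range(n + m), n):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(n + m) if i not in chosen]
        total += 1
        # Count splits that are at least as extreme as the observed one
        if abs(u_statistic(group_a, group_b) - centre) >= observed:
            hits += 1
    return hits / total

# Hypothetical small groups (assumption, for illustration only)
placebo = [12, 15, 11, 14, 13]
drug = [21, 19, 25, 18, 22]

p = exact_mwu_p(placebo, drug)
print(f"exact two-sided p = {p:.4f}")
```

No assumption about the shape of the distribution is needed here, because the p-value comes purely from counting rank arrangements.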

What Do I Use?  
To determine which test to apply, you will first need to establish whether your experimental data is normally distributed. For illustrative purposes, a mock dataset (below) will be used. The dataset below is from two groups (normal vs diseased). The sample type is coded so that the experimenter is blinded to it.



There are a number of normality tests for you to choose from, but if you have a set of data with fewer than 2000 data points, as above, try using the Shapiro-Wilk normality test. The null hypothesis is that your data points belong to a normal distribution; conversely, your alternative hypothesis is that your data points do not belong to a normal distribution. If p<0.05, you reject the null hypothesis and conclude that your data does not have a normal distribution. For the above dataset, the results of running the Shapiro-Wilk test are as follows:


n = 24
Mean = 66.33333333333333
SD = 21.25142791451429
W = 0.9437512788610762

Threshold (p=0.01) = 0.8840000033378601
Threshold (p=0.05) = 0.9160000085830688
Threshold (p=0.10) = 0.9300000071525574


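Here, W is greater than the critical value at every threshold, so p>0.10 (and therefore p>0.05). Accordingly, the null hypothesis cannot be rejected and we treat the data points as having a normal distribution. Note that this means the data is consistent with normality, not that normality has been proven.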

** The results above were calculated using an online Shapiro-Wilk calculator.   
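The way the W statistic is read against the thresholds can be sketched in a few lines. The numbers are taken from the results above (thresholds rounded to three decimal places), while the variable names are my own. If you have the raw data, a function such as `scipy.stats.shapiro` returns both W and a p-value directly, so a lookup like this is not needed.

```python
# W statistic reported above for n = 24
w = 0.9437512788610762

# Critical values of W reported above, keyed by significance level
critical_values = {0.01: 0.884, 0.05: 0.916, 0.10: 0.930}

# Normality is rejected at a given level only if W falls BELOW the critical value
decisions = {alpha: w < w_crit for alpha, w_crit in critical_values.items()}

for alpha, rejected in decisions.items():
    verdict = "reject normality" if rejected else "cannot reject normality"
    print(f"alpha = {alpha:.2f}: W = {w:.4f} vs {critical_values[alpha]:.3f} -> {verdict}")
```

A small W indicates departure from normality, which is why the comparison runs in this direction.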

For sample sizes greater than 2000, you can use the Kolmogorov-Smirnov test. If you prefer to visualize the data in graphical form, try using a normal probability plot or quantile-quantile plot.
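As a rough sketch of what these two approaches calculate, the code below computes the Kolmogorov-Smirnov distance between a sample and a fitted normal distribution, and the pairs of points you would draw on a quantile-quantile plot. The sample is hypothetical and deliberately small so it is easy to follow, and the function names are my own. One caveat: because the mean and SD are estimated from the same sample, the standard Kolmogorov-Smirnov critical values do not strictly apply, so the distance is best treated as descriptive here.

```python
from statistics import NormalDist, mean, stdev

def ks_statistic(data):
    """Largest gap between the empirical CDF and a normal CDF fitted to the data."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = fitted.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d

def qq_points(data):
    """(theoretical normal quantile, observed value) pairs for a Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i + 0.5) / n), x) for i, x in enumerate(xs)]

# Hypothetical sample (assumption, for illustration only)
sample = [48, 52, 55, 59, 61, 63, 66, 70, 74, 81]

d = ks_statistic(sample)
points = qq_points(sample)

print(f"K-S distance D = {d:.3f}")
for theoretical, observed in points:
    print(f"{theoretical:+.3f}  {observed}")
```

If the data is approximately normal, the points fall close to a straight line when the observed values are plotted against the theoretical quantiles.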

Other Ways To Test Normality
Aside from the normality tests mentioned above, there are other normality tests available. For their description, please refer to the following link.  
