Introduction to R Programming

Two Samples with a Numeric Variable

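When comparing two samples on the same numeric variable, the null hypothesis is that the two populations have the same center (H0: mu1 = mu2, or equivalently mu1 - mu2 = 0), and the two-sided alternative is that they differ (H1: mu1 != mu2).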

  • The parametric test is testing a hypothesis about the population mean, while the non-parametric test is testing a hypothesis about the population median.
  • When the sample is very small, use the parametric test for normally distributed data and the non-parametric test for non-normally distributed data. If the data are non-normal but the sample sizes meet the following guidelines, the parametric tests can still perform well:
      - 1-sample T-test: sample size > 20
      - 2-sample T-test: each group > 20
      - One-Way ANOVA with 2-9 groups: each group > 15
      - One-Way ANOVA with 10-12 groups: each group > 20

  • If these two samples are independent, we could consider using Two-sample T-Test (parametric test) or Wilcoxon-Mann-Whitney test (non-parametric test).
  • If the two samples are related or matched, e.g. pre- and post-intervention samples, crossover trials, matched samples, or duplicate measurements on the same subjects, we could consider using the Paired T-Test (parametric test) or the Wilcoxon Signed-Rank Test (non-parametric test).

Two-sample T-Test

t.test(mpg~am, data = mtcars)

The left-hand side of the formula is the continuous response variable, and the right-hand side is the grouping variable that identifies which of the two samples each observation belongs to.
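As a small illustration of the formula interface described above, the sketch below stores the test result and pulls out a few of its components. The object name res is only an illustrative choice; by default t.test runs the Welch version of the test, which does not assume equal variances.

```r
# Welch two-sample t-test of mpg by transmission type (var.equal = FALSE is the default)
res <- t.test(mpg ~ am, data = mtcars)

res$estimate   # mean mpg in each group (am = 0 and am = 1)
res$statistic  # t statistic
res$p.value    # two-sided p-value
res$conf.int   # confidence interval for the difference in means
```

Storing the result this way makes it easy to report individual values instead of reading them off the printed output.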

 

Testing for Equality of Variance

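This is an F test (var.test) to see whether it is reasonable to use the var.equal = TRUE option in the t.test call. Its null hypothesis is that the two groups have equal variances.

Note: For the mtcars example above (var.test(mpg ~ am, data = mtcars)), the p-value is greater than 0.05, so we fail to reject the hypothesis of equal variances and it is reasonable to add the var.equal = TRUE option to t.test. If the p-value were below 0.05, the default var.equal = FALSE (Welch test) should be kept.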
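One possible way to connect the variance check to the t-test is sketched below. The names vt, use_equal, and tt are illustrative, and the 0.05 cutoff is just the conventional threshold used in this guide; treat this as a sketch of the workflow rather than a required procedure.

```r
# F test for equality of variances between the two transmission groups
vt <- var.test(mpg ~ am, data = mtcars)
vt$p.value

# Assume equal variances only if the F test does not reject them at the 0.05 level
use_equal <- vt$p.value > 0.05

# Pooled-variance t-test when use_equal is TRUE, Welch t-test otherwise
tt <- t.test(mpg ~ am, data = mtcars, var.equal = use_equal)
tt
```

When in doubt, leaving var.equal at its default of FALSE is the more conservative choice.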

Paired T-Test

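# Pass the two groups as vectors; newer R versions (4.4.0 onward) reject paired = TRUE in the formula interface
with(sleep, t.test(extra[group == 1], extra[group == 2], paired = TRUE))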

This example uses the sleep dataset, which records the extra hours of sleep for 10 participants under each of two drugs. Based on the p-value (0.002833 < 0.05), we conclude that the two drugs differ in their effect on sleep: on average, participants gained 1.58 more hours of sleep with drug 2 than with drug 1.
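To make the idea of a paired test more concrete, the sketch below shows that a paired t-test is the same as a one-sample t-test on the within-pair differences. The variable names g1, g2, and d are illustrative, and the code relies on the rows of sleep being ordered by participant within each group.

```r
# Extra sleep for the same 10 participants under drug 1 and drug 2
g1 <- sleep$extra[sleep$group == 1]
g2 <- sleep$extra[sleep$group == 2]

# Within-participant differences (drug 1 minus drug 2)
d <- g1 - g2

paired     <- t.test(g1, g2, paired = TRUE)  # paired t-test
one_sample <- t.test(d, mu = 0)              # one-sample t-test on the differences

# The two approaches give the same p-value
all.equal(paired$p.value, one_sample$p.value)

# Mean difference; negative because drug 2 has the larger values
mean(d)
```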

Wilcoxon Signed Rank Test

library(MASS)  # provides the immer dataset
wilcox.test(immer$Y1, immer$Y2, paired = TRUE)

Note: immer is the dataset (from the MASS package), and Y1 and Y2 are the two paired measurements being compared (barley yields in 1931 and 1932).

Paired T-Test vs. Wilcoxon Signed Rank Test

The Paired T-Test is a parametric test, while the Wilcoxon Signed Rank Test is a non-parametric test.

The Paired T-Test should be used only when the differences between the paired values are approximately normally distributed. To check normality, we usually plot a histogram or Q-Q plot of the differences, or use the Shapiro-Wilk test. If the normality assumption is violated, we should consider using the Wilcoxon Signed Rank Test.

Note: According to the Central Limit Theorem, the sampling distribution of the mean tends to be normal if the sample is large enough (a common rule of thumb is n > 30).
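The normality checks mentioned above could be carried out roughly as follows. This sketch reuses the sleep data from the Paired T-Test example; the names d and sw are illustrative, and with only 10 pairs the plots and the Shapiro-Wilk test should be read with caution.

```r
# Differences between the paired measurements (drug 1 minus drug 2)
d <- with(sleep, extra[group == 1] - extra[group == 2])

# Visual checks: histogram and normal Q-Q plot of the differences
hist(d, main = "Paired differences", xlab = "Difference in extra sleep (hours)")
qqnorm(d)
qqline(d)

# Shapiro-Wilk test: a small p-value suggests the differences are not normal
sw <- shapiro.test(d)
sw$p.value
```

If the checks point to non-normal differences, the same two vectors can be passed to wilcox.test with paired = TRUE.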

Wilcoxon-Mann-Whitney Test

wilcox.test(mpg ~ am, data=mtcars)

Note: mtcars is the dataset, and am defines the two comparison groups: automatic (am = 0) and manual (am = 1) transmissions.
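A slightly more readable variant of the call above is sketched here. It labels the two transmission groups and sets exact = FALSE, because mpg contains tied values and R cannot compute an exact p-value with ties. The names mt and wmw are illustrative.

```r
# Copy of mtcars with am recoded as a labeled factor
mt <- transform(mtcars,
                am = factor(am, levels = c(0, 1),
                            labels = c("automatic", "manual")))

# Wilcoxon-Mann-Whitney test using the normal approximation (avoids the ties warning)
wmw <- wilcox.test(mpg ~ am, data = mt, exact = FALSE)
wmw$statistic
wmw$p.value

# Group medians, to help interpret the direction of the difference
med <- tapply(mt$mpg, mt$am, median)
med
```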
