Proportion Comparison: T or Chi?

If we try to compare two proportions from two independent groups, which test should we use, T-test or Chi-squared test? Suppose we want to test whether the prevalence of overweight is different between men and women. We did an investigation and got the estimated overweight prevalence in both men and women groups, how can we compare those two estimated prevalence?

Student’s t-test is a very popular method to compare population means, while in most cases it only works for numeric data because of one of the assumptions of Students’ t-test: The mean follows a normal distribution. In our case, however, we have a binary output for each sample (overweight or not). A more typical method for this situation is Chi-squared test.

Therefore, if we still want to use Students’ t-test to test where there’s a significant difference between the two overweight prevalence, is it correct? We know the percentage of being overweight is a Bernoulli distribution with parameter P. And the number of people who are overweight is a binomial distribution with parameter P and N, where N is the sample size of that group. It has been proved (CLT) that binomial distribution is asymptotic to a normal distribution when NP and N(1-P) are both greater than 5. Thus, the percentage of being overweight is asymptotic to a normal distribution with mean = P and variance = P(1-P).

How big is the difference between the two tests? We did a simulation to compare the test results between Chi-squared test and Student’s t-test. In the function below, we simulate a bunch of samples with binomial distribution, and check whether the two methods can correctly reject the wrong null hypothesis or not reject the true null hypothesis.

sim_prop <- function(N, trial1, trial2 = trial1,
                     theta1, theta2 = theta2) {
  success1 <- rbinom(N, trial1, theta1)
  success2 <- rbinom(N, trial2, theta2)
  output_tab <- data.frame(success1 = success1,
                           success2 = success2,
                           total1 = trial1,
                           total2 = trial2) %>%
    rowwise() %>%
    mutate(p1 = success1/total1,
           p2 = success2/total2,
           se1 = sqrt(p1*(1-p1)/(total1-1)),
           se2 = sqrt(p2*(1-p2)/(total2-1)),
           t_stat = (abs(p1 - p2)/sqrt(se1^2 + se2^2)),
           t_p = 2 * (1 - pt(t_stat, total1+total2-2)),
           chi_p = prop.test(c(success1, success2),
                             c(total1, total2),
                             correct = F)$p.value) %>%
    ungroup() %>%
    summarise(`T-test Reject Percentage` = mean(t_p < 0.05),
              `Chi-squared Reject Percentage` = mean(chi_p < 0.05))

  output_tab
}

First, we check what percentage of 1000 simulations incorrectly reject the true null hypothesis when the true theta is 0.5 in both groups.

# simulate for (0.5, 0.5)
data.frame(group1 = c(10, 15, 20, 40, 100),
           group2 = c(30, 15, 20, 40, 100)) %>%
  rowwise() %>%
  do(
    data.frame(`Group 1 Size` = .$group1, `Group 2 Size` = .$group2,
              sim_prop(1000, .$group1, .$group2, 0.5, 0.5),
              check.names = F) %>%
      bind_rows()

  ) %>%
  kable(caption = "Simulation on True Percentage is 0.5 and 0.5") %>%
  kable_styling(full_width = F, position = "c",
                bootstrap_options = c("striped", "hover"),
                font_size = 15)

Simulation on True Percentage is 0.5 and 0.5
Group 1 Size	Group 2 Size	T-test Reject Percentage	Chi-squared Reject Percentage
10	30	0.054	0.040
15	15	0.053	0.053
20	20	0.051	0.051
40	40	0.056	0.056
100	100	0.058	0.058

Next, We check what percentage of 1000 simulations correctly reject the wrong null hypothesis when the true theta is 0.4 and 0.6 (0.3 and 0.7) in two groups.

# simulate for (0.4, 0.6)
data.frame(group1 = c(10, 15, 20, 40, 100),
           group2 = c(30, 15, 20, 40, 100)) %>%
  rowwise() %>%
  do(
    data.frame(`Group 1 Size` = .$group1, `Group 2 Size` = .$group2,
              sim_prop(1000, .$group1, .$group2, 0.4, 0.6),
              check.names = F) %>%
      bind_rows()

  ) %>%
  kable(caption = "Simulation on True Percentage is 0.4 and 0.6") %>%
  kable_styling(full_width = F, position = "c",
                bootstrap_options = c("striped", "hover"),
                font_size = 15)

Simulation on True Percentage is 0.4 and 0.6
Group 1 Size	Group 2 Size	T-test Reject Percentage	Chi-squared Reject Percentage
10	30	0.229	0.212
15	15	0.179	0.179
20	20	0.226	0.226
40	40	0.455	0.455
100	100	0.827	0.827

# simulate for (0.3, 0.7)
data.frame(group1 = c(10, 15, 20, 40, 100),
           group2 = c(30, 15, 20, 40, 100)) %>%
  rowwise() %>%
  do(
    data.frame(`Group 1 Size` = .$group1, `Group 2 Size` = .$group2,
              sim_prop(1000, .$group1, .$group2, 0.3, 0.7),
              check.names = F) %>%
      bind_rows()

  ) %>%
  kable(caption = "Simulation on True Percentage is 0.3 and 0.7") %>%
  kable_styling(full_width = F, position = "c",
                bootstrap_options = c("striped", "hover"),
                font_size = 15)

Simulation on True Percentage is 0.3 and 0.7
Group 1 Size	Group 2 Size	T-test Reject Percentage	Chi-squared Reject Percentage
10	30	0.612	0.640
15	15	0.599	0.599
20	20	0.704	0.704
40	40	0.957	0.957
100	100	1.000	1.000

As we can see from the simulation result, when the sample sizes are small, t-test is easier to reject the null hypothesis; while when the sample sizes are big enough in both groups, the test results from those two tests tend to be same. However, since the Chi-squared test has more power than the approximated t-test (hope we would talk about “power” in future), we still recommend using Chi-squared test in this case.