# How to perform two-sample t-tests in R by inputting sample statistics rather than the raw data?

Let's say we have the statistics given below:

| gender | mean | sd | n |
|--------|----------|-----------|---|
| f | 1.666667 | 0.5773503 | 3 |
| m | 4.500000 | 0.5773503 | 4 |

How do you perform a two-sample t-test (to see if there is a significant difference between the means of men and women in some variable) using statistics like this rather than actual data?

I couldn't find anywhere on the internet how to do this. Most tutorials, and even the manual, only cover running the test on the actual data set.

## Answers

You can write your own function based on what we know about the mechanics of the two-sample $t$-test. For example, this will do the job:

    # m1, m2: the sample means
    # s1, s2: the sample standard deviations
    # n1, n2: the sample sizes
    # m0: the null value for the difference in means to be tested for. Default is 0.
    # equal.variance: whether or not to assume equal variance. Default is FALSE.
    t.test2 <- function(m1, m2, s1, s2, n1, n2, m0 = 0, equal.variance = FALSE) {
        if (equal.variance == FALSE) {
            se <- sqrt( (s1^2/n1) + (s2^2/n2) )
            # welch-satterthwaite df
            df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
        } else {
            # pooled standard deviation, scaled by the sample sizes
            se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
            df <- n1+n2-2
        }
        t <- (m1-m2-m0)/se
        dat <- c(m1-m2, se, t, 2*pt(-abs(t),df))
        names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
        return(dat)
    }
    x1 = rnorm(100)
    x2 = rnorm(200)
    # you'll find this output agrees with that of t.test when you input x1,x2
    t.test2( mean(x1), mean(x2), sd(x1), sd(x2), 100, 200)

    Difference of means           Std Error                   t             p-value
            -0.05692268          0.12192273         -0.46687500          0.64113442
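As an illustration, the Welch (unequal-variance) branch of the function above can be applied to the summary statistics in the question. This is only a sketch: the arithmetic is written inline so that it runs on its own, and the variable names (`m_f`, `s_f`, `v_f`, and so on) are chosen here for illustration.

```r
# Summary statistics from the question (f = women, m = men)
m_f <- 1.666667; s_f <- 0.5773503; n_f <- 3
m_m <- 4.500000; s_m <- 0.5773503; n_m <- 4

# Estimated variance of each sample mean
v_f <- s_f^2 / n_f
v_m <- s_m^2 / n_m

# Welch standard error and Welch-Satterthwaite degrees of freedom
se <- sqrt(v_f + v_m)
df <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))

# Test statistic for H0: difference in means = 0, with a two-sided p-value
t_stat  <- (m_f - m_m) / se
p_value <- 2 * pt(-abs(t_stat), df)

round(c(t = t_stat, df = df, p = p_value), 4)
```

Because the two standard deviations are equal, the Welch degrees of freedom work out to 49/11 (about 4.45), slightly fewer than the 5 used by the pooled test, while the t statistic (about -6.43) is the same in both versions.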

You just calculate it by hand:

$$ t = \frac{(\text{mean}_f - \text{mean}_m) - \text{expected difference}}{SE} $$

$$ SE = \sqrt{\frac{sd_f^2}{n_f} + \frac{sd_m^2}{n_m}} $$

where $df = n_m + n_f - 2$.

The expected difference is probably zero.

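If you want the p-value, use the pt() function. Note that pt(t, df) returns the lower-tail (one-sided) probability, so for a two-sided test of a difference in means you double the tail area:

    2 * pt(-abs(t), df)

Thus, putting the code together, with the standard deviations squared to give the variances inside the square root:

    > t = ((1.666667 - 4.500000) - 0)/sqrt(0.5773503^2/3 + 0.5773503^2/4)
    > p = 2 * pt(-abs(t), (3 + 4 - 2))

This gives t of about -6.43 and a two-sided p-value of roughly 0.0014.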

This assumes equal variances, which is reasonable here because the two groups have the same sample standard deviation.

You can do the calculations based on the formula in the book (on the web page), or you can generate random data that has the properties stated (see the mvrnorm function in the MASS package) and use the regular t.test function on the simulated data.
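The simulation route can be sketched as follows. Rather than `mvrnorm`, this version uses base R's `scale()` to rescale random draws so that each sample has exactly the stated mean and standard deviation; `exact_sample` is a hypothetical helper named here for illustration, not an existing function.

```r
set.seed(1)  # any seed works; the rescaled samples always hit the targets

# Hypothetical helper: n values with exactly the requested sample mean and sd
exact_sample <- function(n, mean, sd) {
  z <- as.vector(scale(rnorm(n)))  # standardised: sample mean 0, sample sd 1
  mean + sd * z
}

# Summary statistics from the question
f <- exact_sample(3, 1.666667, 0.5773503)
m <- exact_sample(4, 4.500000, 0.5773503)

# Pooled-variance test; drop var.equal = TRUE to get the Welch test instead
res <- t.test(f, m, var.equal = TRUE)
res
```

Since the t statistic depends only on the means, standard deviations and sample sizes, the result does not depend on which random values were drawn.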

The question asks about R, but the issue can arise with any other statistical software. Stata for example has various so-called immediate commands, which allow calculations from summary statistics alone. See http://www.stata.com/manuals13/rttest.pdf for the particular case of the ttesti command, which applies here.