How to perform two-sample t-tests in R by inputting sample statistics rather than the raw data?

Let's say we have the statistics given below:

```
gender      mean        sd  n
f       1.666667 0.5773503  3
m       4.500000 0.5773503  4
```

How do you perform a two-sample t-test (to see if there is a significant difference between the means of men and women in some variable) using statistics like this rather than actual data?

I couldn't find anywhere online how to do this. Most tutorials, and even the manual, only cover running the test on an actual data set.

Answers


You can write your own function based on what we know about the mechanics of the two-sample $t$-test. For example, this will do the job:

```r
# m1, m2: the sample means
# s1, s2: the sample standard deviations
# n1, n2: the sample sizes
# m0: the null value for the difference in means to be tested for. Default is 0.
# equal.variance: whether or not to assume equal variance. Default is FALSE.
t.test2 <- function(m1, m2, s1, s2, n1, n2, m0 = 0, equal.variance = FALSE)
{
    if (!equal.variance)
    {
        se <- sqrt( (s1^2/n1) + (s2^2/n2) )
        # Welch-Satterthwaite df
        df <- ( (s1^2/n1 + s2^2/n2)^2 )/( (s1^2/n1)^2/(n1-1) + (s2^2/n2)^2/(n2-1) )
    } else
    {
        # pooled standard deviation, scaled by the sample sizes
        se <- sqrt( (1/n1 + 1/n2) * ((n1-1)*s1^2 + (n2-1)*s2^2)/(n1+n2-2) )
        df <- n1+n2-2
    }
    t <- (m1-m2-m0)/se
    # two-sided p-value
    dat <- c(m1-m2, se, t, 2*pt(-abs(t),df))
    names(dat) <- c("Difference of means", "Std Error", "t", "p-value")
    return(dat)
}
x1 = rnorm(100)
x2 = rnorm(200)
# you'll find this output agrees with that of t.test when you input x1,x2
# (the numbers change from run to run because no random seed is set)
t.test2( mean(x1), mean(x2), sd(x1), sd(x2), 100, 200)
```

```
Difference of means       Std Error               t         p-value 
        -0.05692268      0.12192273     -0.46687500      0.64113442 
```
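One way to check the claim in the code comment is to compute the Welch statistic from summary statistics and compare it with what `t.test()` reports on the same raw data. This is a minimal sketch, assuming an arbitrary seed and the same sample sizes as above; it recomputes the Welch formula inline so that it runs on its own:

```r
# Sketch: the Welch t, df and p-value computed from summary statistics
# should match t.test() on the raw data. Seed and sizes are arbitrary.
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(200)

m1 <- mean(x1);   m2 <- mean(x2)
s1 <- sd(x1);     s2 <- sd(x2)
n1 <- length(x1); n2 <- length(x2)

# Welch standard error and Welch-Satterthwaite degrees of freedom
se       <- sqrt(s1^2 / n1 + s2^2 / n2)
welch_df <- (s1^2 / n1 + s2^2 / n2)^2 /
  ((s1^2 / n1)^2 / (n1 - 1) + (s2^2 / n2)^2 / (n2 - 1))
t_stat   <- (m1 - m2) / se
p_val    <- 2 * pt(-abs(t_stat), welch_df)

# t.test() uses the Welch test by default (var.equal = FALSE)
ref <- t.test(x1, x2)
all.equal(unname(ref$statistic), t_stat)    # TRUE
all.equal(unname(ref$parameter), welch_df)  # TRUE
all.equal(ref$p.value, p_val)               # TRUE
```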

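You can just calculate it by hand:

$$ t = \frac{(\text{mean}_f - \text{mean}_m) - \text{expected difference}}{SE}, \qquad SE = \sqrt{\frac{sd_f^2}{n_f} + \frac{sd_m^2}{n_m}}, \qquad df = n_f + n_m - 2 $$

The expected difference under the null hypothesis is usually zero. Strictly, $df = n_f + n_m - 2$ goes with the pooled standard error $SE = s_p\sqrt{1/n_f + 1/n_m}$, where $s_p^2 = \frac{(n_f - 1)\,sd_f^2 + (n_m - 1)\,sd_m^2}{n_f + n_m - 2}$; for these data the two standard errors coincide because $sd_f = sd_m$.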
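For the numbers in the question, treating the rounded inputs as the fractions they represent ($sd^2 = 1/3$, $\text{mean}_f = 5/3$, $\text{mean}_m = 9/2$), the arithmetic works out as:

$$ SE = \sqrt{\frac{1/3}{3} + \frac{1/3}{4}} = \sqrt{\frac{7}{36}} = \frac{\sqrt{7}}{6} \approx 0.441, \qquad t = \frac{5/3 - 9/2}{\sqrt{7}/6} = \frac{-17/6}{\sqrt{7}/6} = -\frac{17}{\sqrt{7}} \approx -6.43, \qquad df = 3 + 4 - 2 = 5 $$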

For a two-sided p-value, use the pt() function:

```r
2 * pt(-abs(t), df)
```

(`pt(t, df)` on its own gives only the lower-tail, one-sided probability.)

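Thus, putting the code together (each standard deviation is squared inside the square root):

```r
> t <- ((1.666667 - 4.500000) - 0) / sqrt(0.5773503^2/3 + 0.5773503^2/4)
> p <- 2 * pt(-abs(t), df = 3 + 4 - 2)
```

This gives t ≈ -6.43 on 5 degrees of freedom and a two-sided p-value of about 0.0014.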

This assumes equal variances, which is reasonable here because the two groups have the same sample standard deviation.
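For comparison, here is a minimal sketch that runs both the pooled test and the Welch test on the question's summary statistics. Because the two sample standard deviations are equal, both give the same t statistic; only the degrees of freedom (5 versus 49/11 ≈ 4.45), and hence the p-values, differ. The variable names are illustrative, not from the answer above.

```r
# Sketch: pooled vs Welch two-sample t-test from the question's summary
# statistics alone (group f vs group m). Variable names are illustrative.
m_f <- 1.666667; s_f <- 0.5773503; n_f <- 3
m_m <- 4.500000; s_m <- 0.5773503; n_m <- 4

# Pooled (equal-variance) test
sp2       <- ((n_f - 1) * s_f^2 + (n_m - 1) * s_m^2) / (n_f + n_m - 2)  # pooled variance
se_pooled <- sqrt(sp2 * (1 / n_f + 1 / n_m))
t_pooled  <- (m_f - m_m) / se_pooled
df_pooled <- n_f + n_m - 2
p_pooled  <- 2 * pt(-abs(t_pooled), df_pooled)

# Welch test: unpooled standard error, Welch-Satterthwaite df
v_f <- s_f^2 / n_f
v_m <- s_m^2 / n_m
se_welch <- sqrt(v_f + v_m)
t_welch  <- (m_f - m_m) / se_welch
df_welch <- (v_f + v_m)^2 / (v_f^2 / (n_f - 1) + v_m^2 / (n_m - 1))
p_welch  <- 2 * pt(-abs(t_welch), df_welch)

# Same t (the sds are equal), different df, so slightly different p-values
c(t = t_pooled, df = df_pooled, p = p_pooled)
c(t = t_welch, df = df_welch, p = p_welch)
```

With fewer effective degrees of freedom, the Welch p-value comes out slightly larger than the pooled one.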


You can do the calculations based on the formula in the book (on the web page), or you can generate random data that has exactly the stated means, standard deviations, and sample sizes (see the mvrnorm function in the MASS package, with `empirical = TRUE`) and use the regular t.test function on the simulated data.
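The simulation route can be sketched in base R: standardizing random draws with `scale()` and then rescaling them yields samples whose sample mean and standard deviation equal the stated values exactly (`MASS::mvrnorm` with `empirical = TRUE` achieves the same thing). Because the t-test depends only on the means, standard deviations, and sample sizes, the results do not depend on the particular draws. The helper name `fake_sample` is hypothetical.

```r
# Sketch: simulate samples with exactly the stated summary statistics,
# then run the ordinary t.test(). fake_sample() is a hypothetical helper.
fake_sample <- function(n, m, s) {
  z <- as.vector(scale(rnorm(n)))  # sample mean exactly 0, sample sd exactly 1
  m + s * z                        # sample mean = m, sample sd = s
}

set.seed(1)  # any seed gives the same test results, up to rounding
f_sample <- fake_sample(3, 1.666667, 0.5773503)
m_sample <- fake_sample(4, 4.500000, 0.5773503)

t.test(f_sample, m_sample, var.equal = TRUE)  # pooled test, df = 5
t.test(f_sample, m_sample)                    # Welch test, R's default
```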


The question asks about R, but the issue can arise with any other statistical software. Stata, for example, has various so-called immediate commands, which allow calculations from summary statistics alone. See http://www.stata.com/manuals13/rttest.pdf for the particular case of the ttesti command, which applies here.

