How to pass na.rm as an argument to tapply?
I'd like to calculate the mean and sd from a data frame with one column for the parameter and one column for a group identifier. How can I calculate them when using tapply? On a single vector I can use sd(v1, na.rm=TRUE), but I can't fit na.rm=TRUE into the statement when using tapply. na.omit() is not an option: I have a whole bunch of parameters and have to go through them one by one, and excluding every row with a single missing value would lose half of the data frame.
data("weightgain", package = "HSAUR")
tapply(weightgain$weightgain, list(weightgain$source, weightgain$type), mean)
The same problem occurs with by():
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
y <- c(2, 3, NA, 3, 4, NA, 2, 3, NA, 2)
group <- rep(factor(LETTERS[1:2]), 5)
df <- data.frame(x, y, group)
df
by(df$x, df$group, summary)
by(df$x, df$group, mean)
sd(df$x)                # result: NA
sd(df$x, na.rm = TRUE)  # result: 2.738613
Any ideas how to get this done?
Answers
I think this should do what you want.
Select the columns you want:
v <- c("x", "y")  # or: v <- colnames(df)[1:2]
Use sapply to iterate over v and pass each column to tapply, which forwards na.rm=TRUE to sd:
sapply(v, function(i) tapply(df[[i]], df$group, sd, na.rm=TRUE))
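A self-contained sketch of the same sapply/tapply pattern on the question's toy data, run once for the mean and once for the sd. The helper name group_stats and its arguments are hypothetical; it only wraps the line above so that na.rm = TRUE is passed through sapply and tapply to FUN, meaning each column drops only its own missing values instead of whole rows.

```r
# Toy data from the question
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
y <- c(2, 3, NA, 3, 4, NA, 2, 3, NA, 2)
group <- rep(factor(LETTERS[1:2]), 5)
df <- data.frame(x, y, group)

v <- c("x", "y")

# Hypothetical helper: apply FUN per group to every column named in cols.
# Extra arguments (e.g. na.rm = TRUE) go through sapply() to the anonymous
# function, then through tapply() to FUN.
group_stats <- function(data, cols, group_col, FUN, ...) {
  sapply(cols, function(i, ...) tapply(data[[i]], data[[group_col]], FUN, ...), ...)
}

# Result: matrix with one row per group (A, B) and one column per variable
means <- group_stats(df, v, "group", mean, na.rm = TRUE)
sds   <- group_stats(df, v, "group", sd, na.rm = TRUE)

means
sds
```

Because NAs are removed per column, group B's y values (3, 3, NA, 3, 2) still contribute four observations even though the same rows are complete for x.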
Simply add na.rm=TRUE after the function argument; tapply passes any extra arguments on to FUN:
tapply(weightgain$weightgain, list(weightgain$source, weightgain$type), mean, na.rm=TRUE)
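Since the weightgain data needs the HSAUR package, here is a self-contained sketch of the same two-factor call on the question's x vector. The second factor, type, is hypothetical and only stands in for weightgain$type; the point is that only the cell containing the NA changes when na.rm = TRUE is added.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, NA)
group <- rep(factor(LETTERS[1:2]), 5)
# Hypothetical second grouping factor, standing in for weightgain$type
type <- factor(rep(c("High", "Low"), each = 5))

# Without na.rm, the (B, Low) cell contains NA and its mean is NA
with_na <- tapply(x, list(group, type), mean)

# Arguments after FUN are forwarded to mean(), so NA is dropped per cell
without_na <- tapply(x, list(group, type), mean, na.rm = TRUE)

with_na
without_na
```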