# Break data into each level and apply a function to it

I have the following dataset:

    name1 <- c("P1", "P2", "IndA", "IndB", "IndC", "IndD", "IndE", "IndF", "IndG")
    name2 <- c("P1", "P2", "IndH", "IndI", "IndJ", "IndK")
    name3 <- c("P1", "P2", "IndL", "IndM", "IndN")
    name <- c(name1, name2, name3)
    A <- c(1, 3, 1, 2, 2, 5, 5, 1, 4, 1, 3, 3, 1, 4, 3, 1, 1, 3, 2, 1)
    B <- c(2, 4, 3, 4, 2, 2, 6, 2, 2, 1, 4, 3, 1, 1, 5, 2, 2, 1, 2, 1)
    family <- c(rep(1, length(name1)), rep(2, length(name2)), rep(3, length(name3)))
    mydf <- data.frame(family, name, A, B)

The following is the process I want to apply to each level of the `family` variable:

    dum.match <- rbind(expand.grid(c(mydf[1, 3:4]), c(mydf[2, 3:4])),
                       expand.grid(c(mydf[2, 3:4]), c(mydf[1, 3:4])))
    newmydf <- cbind(mydf, correct = paste(mydf$A, mydf$B) %in%
                             paste(dum.match$Var1, dum.match$Var2))

So I generated a function:

    err.chk <- function (x) {
      dum.match <- rbind(expand.grid(c(x[1, 3:4]), c(x[2, 3:4])), expand.grid(c(x[2, 3:4]), c(x[1, 3:4])))
      newmydf <- cbind(x, correct = paste(x$A, mydf$B) %in% paste(dum.match$Var1, dum.match$Var2))
      return (newmydf)
    }

Now I want to split the data into three separate datasets, one for each level of `family`, apply the above function to each, and combine the results back into one data frame with the additional column `correct`. How can I do it? I tried the following (and the results are awful!):

    require(plyr)
    aaply(mydf, 1, err.chk)

**Edit:**

Expected output:

       family name A B correct
    1       1   P1 1 2   FALSE
    2       1   P2 3 4   FALSE
    3       1 IndA 1 3    TRUE
    4       1 IndB 2 4    TRUE
    5       1 IndC 2 2   FALSE
    6       1 IndD 5 2   FALSE
    7       1 IndE 5 6   FALSE
    8       1 IndF 1 2   FALSE
    9       1 IndG 4 2    TRUE
    10      2   P1 1 1   FALSE
    11      2   P2 3 4   FALSE
    12      2 IndH 3 3   FALSE
    13      2 IndI 1 1   FALSE
    14      2 IndJ 4 1    TRUE
    15      2 IndK 3 5   FALSE
    16      3   P1 1 2    TRUE
    17      3   P2 1 2    TRUE
    18      3 IndL 3 1   FALSE
    19      3 IndM 2 2    TRUE
    20      3 IndN 1 1    TRUE

Just for family = 3 (similarly for the other families):

    # just data for family 3
    name <- c("P1", "P2", "IndL", "IndM", "IndN")
    A <- c(1, 1, 3, 2, 1)
    B <- c(2, 2, 1, 2, 1)
    fam3 <- data.frame(name, A, B)

    err.chk(fam3)
       name A B correct
    16   P1 1 2    TRUE
    17   P2 1 2    TRUE
    18 IndL 3 1   FALSE
    19 IndM 2 2    TRUE
    20 IndN 1 1    TRUE

## Answers

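It's hard to follow exactly what you're doing, but with plyr you want to use the `**ply` function that accepts the data type you're giving it and returns the data type your function returns. You have a data frame and your function returns a data frame, so `ddply` looks like the right choice. `aaply(mydf, 1, ...)` instead splits the input by row, so your function only ever sees one row at a time.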
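To illustrate the naming convention (first letter = input type, second letter = output type, with `d` for data frame), here is a minimal sketch. The `toy` data and the `add_group_mean` helper are made up for illustration and are not part of the question.

```r
library(plyr)

# Hypothetical toy data, not from the question
toy <- data.frame(g = c("a", "a", "b", "b"), v = c(1, 3, 10, 20))

# Hypothetical helper: receives the rows of one group as a data frame
# and returns that data frame with one extra column
add_group_mean <- function(piece) {
  piece$group_mean <- mean(piece$v)
  piece
}

# ddply: data frame in (first "d"), data frame out (second "d");
# .(g) tells plyr to split the rows by the column g
res <- ddply(toy, .(g), add_group_mean)
print(res)
```

Each call to the helper gets all rows of one group, and `ddply` stacks the returned pieces back into a single data frame.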

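First, your function needs a fix: the `paste()` call inside `cbind()` uses `mydf$B` where it should use `x$B`. It is also safer to pick the `A` and `B` columns by name rather than by position, because the column positions change depending on whether `family` is present in the piece: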

    err.chk <- function (x) {
      ab <- c("A", "B")  # select the two value columns by name, not position
      dum.match <- rbind(expand.grid(c(x[1, ab]), c(x[2, ab])),
                         expand.grid(c(x[2, ab]), c(x[1, ab])))
      newmydf <- cbind(x, correct = paste(x$A, x$B) %in%
                            paste(dum.match$Var1, dum.match$Var2))
      return (newmydf)
    }

Calling it using ddply gives a reasonable looking result.

    > ddply(mydf, .(family), err.chk)
       family name A B correct
    1       1   P1 1 2   FALSE
    2       1   P2 3 4   FALSE
    3       1 IndA 1 3    TRUE
    4       1 IndB 2 4    TRUE
    5       1 IndC 2 2   FALSE
    6       1 IndD 5 2   FALSE
    7       1 IndE 5 6   FALSE
    8       1 IndF 1 2   FALSE
    9       1 IndG 4 2    TRUE
    10      2   P1 1 1   FALSE
    11      2   P2 3 4   FALSE
    12      2 IndH 3 3   FALSE
    13      2 IndI 1 1   FALSE
    14      2 IndJ 4 1    TRUE
    15      2 IndK 3 5   FALSE
    16      3   P1 1 2    TRUE
    17      3   P2 1 2    TRUE
    18      3 IndL 3 1   FALSE
    19      3 IndM 2 2    TRUE
    20      3 IndN 1 1    TRUE
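For what it's worth, the same split/apply/combine steps can also be spelled out in base R without plyr. This is only a sketch and not what the answer itself uses. It also assumes a slightly rewritten `err.chk` that calls `unlist()` so that `expand.grid()` gets plain numeric vectors; that rewrite is my own variation on the corrected function.

```r
# Data from the question
name1 <- c("P1", "P2", "IndA", "IndB", "IndC", "IndD", "IndE", "IndF", "IndG")
name2 <- c("P1", "P2", "IndH", "IndI", "IndJ", "IndK")
name3 <- c("P1", "P2", "IndL", "IndM", "IndN")
name <- c(name1, name2, name3)
A <- c(1, 3, 1, 2, 2, 5, 5, 1, 4, 1, 3, 3, 1, 4, 3, 1, 1, 3, 2, 1)
B <- c(2, 4, 3, 4, 2, 2, 6, 2, 2, 1, 4, 3, 1, 1, 5, 2, 2, 1, 2, 1)
family <- c(rep(1, length(name1)), rep(2, length(name2)), rep(3, length(name3)))
mydf <- data.frame(family, name, A, B)

# Variant of the corrected function (assumption: unlist() is my addition)
err.chk <- function(x) {
  p1 <- unlist(x[1, c("A", "B")])  # values of the first row (P1)
  p2 <- unlist(x[2, c("A", "B")])  # values of the second row (P2)
  # all pairs taking one value from P1 and one from P2, in both orders
  dum.match <- rbind(expand.grid(p1, p2), expand.grid(p2, p1))
  cbind(x, correct = paste(x$A, x$B) %in%
             paste(dum.match$Var1, dum.match$Var2))
}

pieces  <- split(mydf, mydf$family)  # one data frame per family
checked <- lapply(pieces, err.chk)   # apply the function to each piece
result  <- do.call(rbind, checked)   # stack the pieces back together
rownames(result) <- NULL             # drop the "1.1", "1.2", ... row names

print(result)
```

`ddply` does these three steps in one call, which is why it is the more convenient choice here.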