Combine a list of data.tables
Is there a specific method for combining a list of data.tables in R?
I have a list of ~20 data.tables, each with around 1 million rows, and would like to combine them into one data.table with 20 million rows.
I've been doing it with
but it takes a while.
See ?rbindlist and the related questions on combining data.tables (easier to find when you know what to search for!).
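For readers who have not used it, here is a minimal sketch of `rbindlist` on a small made-up list. The names `dt_list`, `a`, `b` and `source` are illustrative and not from the question, and the `idcol` argument assumes data.table 1.9.6 or later:

```r
library(data.table)

# Hypothetical small list standing in for the 20 large tables
dt_list <- list(
  a = data.table(x = 1:3, y = c("p", "q", "r")),
  b = data.table(x = 4:6, y = c("s", "t", "u"))
)

# Stack every table in the list into a single data.table
combined <- rbindlist(dt_list)

# Optionally record which list element each row came from
# (idcol is assumed to be available in your data.table version)
combined_id <- rbindlist(dt_list, idcol = "source")

nrow(combined)           # 6 rows: 3 from each table
is.data.table(combined)  # TRUE
```

If the tables do not share the same column order or column set, `?rbindlist` describes the `use.names` and `fill` arguments.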
Using do.call appears to be about 10x faster than Reduce with this made-up example:
    library(data.table)

    x1 <- data.table(x = runif(1e6), y = runif(1e6))
    x2 <- data.table(x = runif(1e6), y = runif(1e6))

    # 20 data.tables, all of length 1e6
    yourList <- list(x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2)

    system.time(out1 <- Reduce("rbind", yourList))
    #-----
    #   user  system elapsed
    #   3.37    3.03    6.43

    system.time(out2 <- do.call("rbind", yourList))
    #-----
    #   user  system elapsed
    #   0.33    0.36    0.68

    all.equal(out1, out2)
    #-----
    # [1] TRUE
Edit - to incorporate Matt's answer
I did not realize data.table had a specific function for this task. Par for the course, it is quite fast. Here is the relevant timing:
    system.time(out3 <- rbindlist(yourList))
    #-----
    #   user  system elapsed
    #   0.07    0.03    0.11

    all.equal(out1, out3)
    #-----
    # [1] TRUE
For my money, the plyr package's ldply is the best way to do this. It has the advantage that the name of each list element is added as a new first column, named .id.
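A small sketch of the `.id` behaviour, assuming plyr is installed. The list `df_list` and its element names are made up for illustration:

```r
library(plyr)

# Hypothetical named list of data frames
df_list <- list(
  first  = data.frame(x = 1:2, y = c("a", "b")),
  second = data.frame(x = 3:4, y = c("c", "d"))
)

# Apply identity to each element and row-bind the results;
# the list names end up in a leading .id column
stacked <- ldply(df_list, identity)

names(stacked)  # ".id" "x" "y"
nrow(stacked)   # 4
```

Note that the `.id` column only carries useful labels when the list is named.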
In addition, a list of data frames is often the output of tapply, in which case replace the whole shebang with ddply.
Alternatives include do.call("rbind", mylist) or lattice's make.groups (haven't been able to find this one recently though).
Note: I may have misunderstood the question; I read data.frame instead of data.table. These techniques still work, but I'm not sure they always return a data.table.
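One way to settle that doubt is to check the class of the result and convert it if needed. A minimal sketch, assuming data.table is installed. Here `out_df` is a made-up plain data.frame standing in for whatever a data.frame-oriented tool returns:

```r
library(data.table)

dt_list <- list(data.table(x = 1:2), data.table(x = 3:4))

# rbindlist returns a data.table
out_dt <- rbindlist(dt_list)

# Hypothetical stand-in for a result that came back as a plain data.frame
out_df <- data.frame(x = 1:4)

is.data.table(out_dt)  # TRUE
is.data.table(out_df)  # FALSE

# setDT converts the object to a data.table by reference
setDT(out_df)
is.data.table(out_df)  # TRUE
```

`as.data.table(out_df)` would also work, but it makes a copy, which matters at 20 million rows.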