Combine a list of data.tables

Is there a specific method for combining a list of data.tables in R?

I have a list of ~20 data.tables, each with around 1 million rows, and would like to combine them into one data.table with 20 million rows.

I've been doing it with

Reduce('rbind', data.table)

but it takes a while.



See ?rbindlist. Related questions are easier to find once you know what to search for.

Using do.call appears to be about 10x faster than Reduce with this made-up example:


library(data.table)

x1 <- data.table(x = runif(1e6), y = runif(1e6))
x2 <- data.table(x = runif(1e6), y = runif(1e6))

#20 data.tables all of length 1e6
yourList <- list(x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2)

system.time(out1 <- Reduce("rbind", yourList))
   user  system elapsed 
   3.37    3.03    6.43 
system.time(out2 <- do.call("rbind", yourList))
   user  system elapsed 
   0.33    0.36    0.68 
identical(out1, out2)
[1] TRUE

Edit - to incorporate Matt's answer

I did not realize data.table had a specific function for this task. Par for the course, it is quite fast. Here is the relevant timing:

system.time(out3 <- rbindlist(yourList))
   user  system elapsed 
   0.07    0.03    0.11 

identical(out1, out3)
[1] TRUE
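
Beyond speed, rbindlist can also record which list element each row came from and tolerate tables whose columns do not all match. Here is a minimal sketch, assuming a reasonably recent data.table (one where the idcol and fill arguments are available); the list `parts` and its element names are made up for illustration:

```r
library(data.table)

# Small named list standing in for the 20 large tables (names are illustrative)
parts <- list(
  first  = data.table(x = 1:2, y = c("a", "b")),
  second = data.table(x = 3:4, y = c("c", "d")),
  third  = data.table(x = 5L)  # deliberately has no y column
)

# idcol adds a column holding the name of the originating list element;
# fill = TRUE pads columns missing from an element with NA
combined <- rbindlist(parts, idcol = "source", fill = TRUE)
print(combined)
```

Without fill = TRUE, a list whose elements have differing columns would raise an error instead of being padded.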

For my money, the plyr package's ldply is the best way to do this. It has the advantage that the name of each list element is added as a new first column, named .id.
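
To illustrate the .id behaviour, here is a small sketch with a made-up named list of data frames (`frames` and the names north/south are not from the question). It assumes plyr is installed and relies on ldply binding the elements as they are when no function is supplied:

```r
library(plyr)

# Hypothetical named list of data frames
frames <- list(
  north = data.frame(x = 1:2),
  south = data.frame(x = 3:4)
)

# No function supplied, so each element is bound unchanged;
# the list names end up in the leading .id column
stacked <- ldply(frames)
print(stacked)
```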

In addition, a list of data frames is often the output of tapply, in which case replace the whole shebang with ddply.

Alternatives include do.call("rbind", mylist) or lattice's make.groups (though I haven't been able to find this one recently).

Note: I may have misunderstood the question; I read data.frame instead of data.table. These techniques still work, but I'm not sure they always return a data.table.
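
If the result does come back as a plain data.frame, it can be converted afterwards. A minimal sketch, using a made-up list of data frames and data.table's setDT, which converts by reference rather than copying:

```r
library(data.table)

# Hypothetical list of plain data frames
frames <- list(data.frame(x = 1:2), data.frame(x = 3:4))

combined <- do.call("rbind", frames)
is.data.table(combined)  # FALSE: rbind on data frames returns a data.frame

setDT(combined)          # convert in place, without copying the data
is.data.table(combined)  # TRUE
```

as.data.table(combined) would also work, at the cost of a copy, which may matter at 20 million rows.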

