Combine a list of data.tables

Is there a specific method for combining a list of data.tables in R?

I have a list of ~20 data.tables, each with around 1 million rows, and would like to combine them into one data.table with 20 million rows.

I've been doing it with

Reduce('rbind', data.table)

but it takes a while.

Thanks!

Answers


See ?rbindlist and these related questions (easier to find when you know what to search for!):

data.table questions and answers containing rbindlist
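For readers who have not used it, here is a minimal sketch of the call, assuming two small made-up tables (dt1 and dt2 are hypothetical names, not from the question):

```r
library(data.table)

# Two small made-up tables with the same columns
dt1 <- data.table(x = 1:2, y = c("a", "b"))
dt2 <- data.table(x = 3:4, y = c("c", "d"))

# rbindlist takes the list itself, so no Reduce or do.call is needed
combined <- rbindlist(list(dt1, dt2))

nrow(combined)           # number of rows after stacking both tables
is.data.table(combined)  # the result is a data.table
```

The same call should work unchanged on a list of 20 large tables, since rbindlist only needs the list as its first argument.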


Using do.call appears to be about 10x faster than Reduce with this made-up example:

library(data.table)

x1 <- data.table(x = runif(1e6), y = runif(1e6))
x2 <- data.table(x = runif(1e6), y = runif(1e6))

# 20 data.tables, each with 1e6 rows
yourList <- list(x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2,x1,x2)

system.time(out1 <- Reduce("rbind", yourList))
#-----
   user  system elapsed 
   3.37    3.03    6.43 
system.time(out2 <- do.call("rbind", yourList))
#-----
   user  system elapsed 
   0.33    0.36    0.68 
all.equal(out1,out2)
#-----
[1] TRUE

Edit - to incorporate Matt's answer

I did not realize data.table had a specific function for this task. Par for the course, it is quite fast. Here is the relevant timing:

system.time(out3 <- rbindlist(yourList))
#-----
   user  system elapsed 
   0.07    0.03    0.11 

all.equal(out1,out3)
#-----
[1] TRUE

For my money, the plyr package's ldply is the best way to do this. It has the advantage that the name of each list element is added as a new first column, named .id.

In addition, a list of data frames is often the output of tapply, in which case replace the whole shebang with ddply.

Alternatives include do.call("rbind", mylist) or lattice's make.groups (haven't been able to find this one recently though).


Note: I may have misunderstood the question: I read data.frame instead of data.table. These techniques still work, but I'm not sure they result in a data.table all of the time.
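If a data.table result with an id column is what you are after, rbindlist has an idcol argument that plays a similar role to the .id column from ldply. A minimal sketch, assuming a small named list of made-up data frames (mylist and its element names are hypothetical):

```r
library(data.table)

# A small named list of made-up data.frames
mylist <- list(
  first  = data.frame(x = 1:2, y = c("a", "b")),
  second = data.frame(x = 3:4, y = c("c", "d"))
)

# idcol adds a leading column holding the list element names,
# similar in spirit to the .id column produced by plyr::ldply
out <- rbindlist(mylist, idcol = ".id")

is.data.table(out)  # the result is a data.table even for data.frame input
names(out)          # the id column comes first, then the original columns
```

This sidesteps the uncertainty above, since rbindlist returns a data.table regardless of whether the list holds data.frames or data.tables.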

