How to increase efficiency of a slow loop

I have a time series data of daily deaths and various environmental factors with about 8000 data points, 11 outcomes and 6 pollutants. There was no convergence problem when I run the models individually, but it took about 20 minutes to run each. In loop the entire model has never completed and terminated for unidentified reason. The code shown below was executed for two outcomes and two pollutants and the elapsed time was 4123.59.

I am working on a fairly fast windows PC with 4 processors and 16 GB of RAM and still the whole process is painfully slow. I would appreciate any suggestion on how to make the code efficient and fast. I have checked some related answers of forum, but none were applicable to my specific problem.

A code run on my data

outcome<-c("cardva" ,"respir") 
pollut1<-c("o3","no2")

m1 <- lapply(outcome, function(o){
  lapply(pollut1,function(v) {
    f<- sprintf("%s ~  s(trend,k=21*50,fx=F,bs='cr')+ s(temp,k=6,fx=F,bs='cr') + rh + 
   as.factor(dow) + s(fluepi,k=4,fx=F, bs='cr') + as.factor(holiday) + %s",o, v)
    gam(as.formula(f),family=quasipoisson, na.action=na.omit,data=mortdf)

  })
})

Sample code and data is shown below:

library(quantmod)
 library(mgcv)
 library(dlnm) 
 df <- chicagoNMMAPS
 outcome<- c("death", "cvd", "resp ")
 pollut1<-c("pm10" , "o3" ) 

 ptm <- proc.time()

 mod1<- lapply(outcome, function(o){
   lapply(pollut1,function(v) {
     f <- sprintf("%s~ s(time,bs='cr',k=14*50)+ s(temp,k=6, bs='cr') + as.factor(dow)   + %s",o, v) 
     gam(as.formula(f),family=quasipoisson,na.action=na.omit,data=df) 
  })}) 

 proc.time() - ptm
 user  system elapsed 
 991.02    8.89 1002.00

Answers


Do you really need 14*50 = 700 dimensions in your basis set? That's the reason it is taking so long. It looks like t ~ O(k^2).

# 700 dimensions: 8 minutes, gcv = 1.22
f = death~ s(time,bs='cr',k=14*50)+ s(temp, bs='cr', k=6) + as.factor(dow)+pm10
system.time(g <- gam(f,family=quasipoisson,na.action=na.omit,data=df)) 
   user  system elapsed 
 457.66    2.17  461.90 
g$gcv
[1] 1.222779

# 200 dimentsions: 48 seconds; gcv = 1.25
f.new = death~ s(time,bs='cr',k=200)+ s(temp, bs='cr', k=6) + as.factor(dow)+pm10
system.time(g <- gam(f.new,family=quasipoisson,na.action=na.omit,data=df)) 
   user  system elapsed 
  47.93    0.07   48.04 
g$gcv
[1] 1.252921

# 100 dimensions: 15 seconds, gcv - 1.30
f.new = death~ s(time,bs='cr',k=100)+ s(temp, bs='cr', k=6) + as.factor(dow)+pm10
system.time(g <- gam(f.new,family=quasipoisson,na.action=na.omit,data=df))
   user  system elapsed 
  15.31    0.05   15.39 
g$gcv
[1] 1.297332

You can use one of the various R packages to run lapply in parallel on multiple processors. See for example sfLapply() from the snowfall package. If you were using Linux you could also easily use mclapply() instead of lapply()


I'm not familiar with the functionality you're using so what I've produced doesn't actually work (mainly - trying to use the outcomes etc doesn't seem to exist in the sample dataset), but it's hopefully illsutrative.

doParallel (doMC could be used for linux) is like the snowfall package in @f3lix's answer and foreach provides some excellent parallel iterators. I've prepared a list of combinations for checking used foreach here because I think it is a simpler system to use than nesting lapply. Hopefully, it will provide you with some useful material for doing this calculation.

library(quantmod)
 library(mgcv)
 library(dlnm) 
 df <- chicagoNMMAPS
 outcome<- c("death", "cvd", "resp ")
 pollut1<-c("pm10" , "o3" ) 

library("doParallel")
library("foreach")
registerDoParallel(cores=8)

combinations<-expand.grid(outcome,pollut1)
mod1<- foreach(o=combinations, .combine='list') %dopar% {
    f <- as.formula(paste0("~ s(time,bs='cr',k=14*50)+ s(temp,k=6, bs='cr') + as.factor(dow) + ",o["Var1"], o["Var2"]) )
    gam(f,family=quasipoisson,na.action=na.omit,data=df) 
}

http://cran.r-project.org/web/packages/doParallel/vignettes/gettingstartedParallel.pdf


Need Your Help

Detect Jquery img src update in knockout Model

jquery knockout.js

We need to update img src with Jquery but that change is not being detected in Knockout observable property.

How to make simple control of MPMoviePlayerController

iphone ios overlay mpmovieplayercontroller

I'm making app using MPMoviePlayerController. This capture is pic of a my reference app. I want to make like red boxed part. - only Done button and progress bar. How can I make like this?