Aggregating one matrix by values in another matrix

As I am wont to do, I'm keeping tabs on my cats using matrices.

catWeights <- cbind(fluffy=c(5.0,5.1,5.2,5.3),misterCuddles=c(1.2,1.3,1.4,1.5),captainMew=c(4.3,4.2,4.1,4.0))
catTypes <- cbind(fluffy=c('cat','cat','cat','cat'),misterCuddles=c('kitten','kitten','kitten','cat'),captainMew=c('cat','cat','cat','cat'))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03","2013-01-04")
row.names(catWeights) <- dates
row.names(catTypes) <- dates

On any date, I know how much each of them weighs:

> catWeights
           fluffy misterCuddles captainMew
2013-01-01    5.0           1.2        4.3
2013-01-02    5.1           1.3        4.2
2013-01-03    5.2           1.4        4.1
2013-01-04    5.3           1.5        4.0

And I know whether they're cats or kittens:

> catTypes
           fluffy misterCuddles captainMew
2013-01-01 "cat"  "kitten"      "cat"     
2013-01-02 "cat"  "kitten"      "cat"     
2013-01-03 "cat"  "kitten"      "cat"     
2013-01-04 "cat"  "cat"         "cat"  

How can I tell how much all my cats and all my kittens weigh through time?

I want this:

> totalWeights

             cat    kitten
2013-01-01   9.3       1.2
2013-01-02   9.3       1.3
2013-01-03   9.3       1.4
2013-01-04  10.8       0.0

On the fourth of January, Mister Cuddles turned 1, so he was no longer a kitten. His weight moved from the kitten bucket to the cat bucket.

Answers


This seems valid using the sample data:

do.call(cbind, 
      lapply(c("cat", "kitten"), 
            function(x) rowSums(catWeights * (catTypes == x))))
#           [,1] [,2]
#2013-01-01  9.3  1.2
#2013-01-02  9.3  1.3
#2013-01-03  9.3  1.4
#2013-01-04 10.8  0.0
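
The multiplication works because catTypes == x is a logical matrix, and TRUE/FALSE are coerced to 1/0 when multiplied by the weights. The result above has no column names because the list returned by lapply is unnamed. A minimal sketch of one way to label the columns, assuming the sample data from the question; the types vector and the setNames wrapping are additions for illustration, not part of the original answer:

```r
# Sample data from the question
catWeights <- cbind(fluffy = c(5.0, 5.1, 5.2, 5.3),
                    misterCuddles = c(1.2, 1.3, 1.4, 1.5),
                    captainMew = c(4.3, 4.2, 4.1, 4.0))
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")
rownames(catWeights) <- dates
rownames(catTypes) <- dates

types <- c("cat", "kitten")  # illustrative name, not from the original answer

# lapply keeps the names of its input, and cbind uses the names of the
# list elements as column names
named <- do.call(cbind,
                 lapply(setNames(types, types),
                        function(x) rowSums(catWeights * (catTypes == x))))
```

With this change, named["2013-01-04", "cat"] can be looked up by label instead of by column position.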

EDIT:

As @BlueMagister commented, lapply(unique(as.vector(catTypes)), ...) is the more general form of the answer. I guess you have already found a way around this, though, since you accepted the answer. The as.vector is needed because unique has a matrix method, which returns unique rows rather than unique values and so is not convenient in this specific case.
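
To make the point about unique concrete, here is a small sketch on the sample catTypes matrix. The variable names uniqueRows and uniqueValues are made up for this illustration:

```r
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))

# On a matrix, unique() dispatches to unique.matrix and drops duplicate ROWS,
# so the result is still a matrix (here 2 rows by 3 columns)
uniqueRows <- unique(catTypes)
dim(uniqueRows)

# Flattening to a plain vector first gives the distinct cell values
uniqueValues <- unique(as.vector(catTypes))
uniqueValues   # "cat" "kitten"
```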

Also, since I'm in editing mode, I'll note that sapply could've been used, but based on some rough benchmarks I've made from time to time, I've found lapply to be faster even when it is accompanied by a do.call(r/cbind, ..) or an unlist. I did not test it on a larger dataset in this specific case, though.

So, another format of the answer could've been:

sapply(unique(as.vector(catTypes)), 
             function(x) rowSums(catWeights * (catTypes == x)))

Here is a not very general answer that applies only to the example data set.

# Construct matrices for the cat weights and kitten weights.
# Pass the full catWeights matrix to ifelse so each weight stays aligned
# with its own cell; subsetting it first would recycle a shorter vector
# and misplace the values.
catWts <- ifelse(catTypes=="cat", catWeights, 0)
kittenWts <- ifelse(catTypes=="kitten", catWeights, 0)

# Then take the row sums of the two matrices
catSums <- rowSums(catWts)
kittenSums <- rowSums(kittenWts)

# Then combine them into a data frame
totalWeights <- data.frame(cat=catSums, kitten=kittenSums)

# In one line
data.frame(cat=rowSums(ifelse(catTypes=="cat", catWeights, 0)),
           kitten=rowSums(ifelse(catTypes=="kitten", catWeights, 0)))

#            cat kitten
#2013-01-01  9.3    1.2
#2013-01-02  9.3    1.3
#2013-01-03  9.3    1.4
#2013-01-04 10.8    0.0

I would imagine that there is a more general approach to solving this problem.
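
One possible more general approach, offered only as a sketch: reshape both matrices into a single long data frame and let xtabs sum the weights for every date/type pair. This is a different technique from the answers above, and the long data frame and its column names (date, type, weight) are assumptions made for the illustration.

```r
# Sample data from the question
catWeights <- cbind(fluffy = c(5.0, 5.1, 5.2, 5.3),
                    misterCuddles = c(1.2, 1.3, 1.4, 1.5),
                    captainMew = c(4.3, 4.2, 4.1, 4.0))
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")
rownames(catWeights) <- dates
rownames(catTypes) <- dates

# Flatten both matrices column by column so that cells stay aligned
long <- data.frame(
  date   = rownames(catWeights)[as.vector(row(catWeights))],
  type   = as.vector(catTypes),
  weight = as.vector(catWeights)
)

# Sum weight for every date/type pair; pairs with no rows come out as 0
totalWeights <- xtabs(weight ~ date + type, data = long)
totalWeights
```

This does not need the list of types up front, so a third category appearing in catTypes would simply become another column.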


Microbenchmarking alexis_laz's two solutions on a 2500x2500 matrix with 10 groups:

> microbenchmark(cbindLapply(), sapplyOnly(), times=100)
Unit: milliseconds
          expr      min       lq   median       uq      max neval
 cbindLapply() 841.4796 865.2220 879.9099 892.6265 990.5915   100
  sapplyOnly() 846.3675 869.7372 879.0286 901.3314 979.6136   100
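
The benchmark setup is not shown above, so the following is only a guess at how it might look. The function names cbindLapply and sapplyOnly come from the output; their bodies, the random data, the group labels, and the smaller 500x500 size (chosen so the sketch runs quickly) are all assumptions. It uses base system.time rather than the microbenchmark package.

```r
set.seed(1)
n <- 500                                   # the post used 2500; smaller here
groups <- paste0("g", 1:10)                # 10 hypothetical group labels
w <- matrix(runif(n * n), n, n)            # stand-in for catWeights
g <- matrix(sample(groups, n * n, replace = TRUE), n, n)  # stand-in for catTypes

# Assumed body: the lapply + do.call(cbind, ...) form of the answer
cbindLapply <- function() {
  do.call(cbind,
          lapply(unique(as.vector(g)),
                 function(x) rowSums(w * (g == x))))
}

# Assumed body: the sapply form of the answer
sapplyOnly <- function() {
  sapply(unique(as.vector(g)),
         function(x) rowSums(w * (g == x)))
}

# Rough timings; with microbenchmark installed, the call shown in the
# post could be used here instead
system.time(resLapply <- cbindLapply())
system.time(resSapply <- sapplyOnly())

# Both forms compute the same sums; only the column names differ
all.equal(resLapply, resSapply, check.attributes = FALSE)
```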
