# Aggregating one matrix by values in another matrix

As I am wont to do, I'm keeping tabs on my cats using matrices.

    catWeights <- cbind(fluffy=c(5.0,5.1,5.2,5.3),misterCuddles=c(1.2,1.3,1.4,1.5),captainMew=c(4.3,4.2,4.1,4.0))
    catTypes <- cbind(fluffy=c('cat','cat','cat','cat'),misterCuddles=c('kitten','kitten','kitten','cat'),captainMew=c('cat','cat','cat','cat'))
    dates <- c("2013-01-01", "2013-01-02", "2013-01-03","2013-01-04")
    row.names(catWeights) <- dates
    row.names(catTypes) <- dates

On any date, I know how much each of them weighs:

    > catWeights
               fluffy misterCuddles captainMew
    2013-01-01    5.0           1.2        4.3
    2013-01-02    5.1           1.3        4.2
    2013-01-03    5.2           1.4        4.1
    2013-01-04    5.3           1.5        4.0

And I know whether they're cats or kittens:

    > catTypes
               fluffy misterCuddles captainMew
    2013-01-01 "cat"  "kitten"      "cat"
    2013-01-02 "cat"  "kitten"      "cat"
    2013-01-03 "cat"  "kitten"      "cat"
    2013-01-04 "cat"  "cat"         "cat"

How can I tell how much all my cats and all my kittens weigh through time?

I want this:

    > totalWeights
                cat kitten
    2013-01-01  9.3    1.2
    2013-01-02  9.3    1.3
    2013-01-03  9.3    1.4
    2013-01-04 10.8    0.0

On the fourth of January, Mister Cuddles turned 1, so he was no longer a kitten. His weight moved from the kitten bucket to the cat bucket.

## Answers

This seems valid using the sample data:

    do.call(cbind, lapply(c("cat", "kitten"), function(x) rowSums(catWeights * (catTypes == x))))
    #           [,1] [,2]
    #2013-01-01  9.3  1.2
    #2013-01-02  9.3  1.3
    #2013-01-03  9.3  1.4
    #2013-01-04 10.8  0.0
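To see why this works, it may help to look at a single type in isolation. The sketch below rebuilds the sample matrices from the question and walks through the mask for "kitten". The intermediate names isKitten and kittenOnly are not part of the answer; they are introduced here only for illustration.

```r
# Sample data from the question
catWeights <- cbind(fluffy = c(5.0, 5.1, 5.2, 5.3),
                    misterCuddles = c(1.2, 1.3, 1.4, 1.5),
                    captainMew = c(4.3, 4.2, 4.1, 4.0))
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")
rownames(catWeights) <- dates
rownames(catTypes) <- dates

# Logical matrix: TRUE where the animal is a kitten on that date
isKitten <- catTypes == "kitten"

# Multiplication coerces TRUE/FALSE to 1/0, so non-kitten weights become 0
kittenOnly <- catWeights * isKitten

# Summing across each row gives the total kitten weight per date
rowSums(kittenOnly)
```

The lapply call simply repeats this step once per type and cbind glues the resulting vectors together as columns.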

EDIT:

As @BlueMagister commented, ... lapply(unique(as.vector(catTypes)), ... is the more general form of the answer. I guess, though, you've already found a way to overcome this, since you accepted the answer. The as.vector is needed because unique has a matrix method that returns unique rows rather than unique values, which is not convenient in this specific case.
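A small sketch of the difference, using the sample data from the question. The variable types and the colnames assignment are additions for illustration, not part of the original answer.

```r
# Sample data from the question
catWeights <- cbind(fluffy = c(5.0, 5.1, 5.2, 5.3),
                    misterCuddles = c(1.2, 1.3, 1.4, 1.5),
                    captainMew = c(4.3, 4.2, 4.1, 4.0))
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))
dates <- c("2013-01-01", "2013-01-02", "2013-01-03", "2013-01-04")
rownames(catWeights) <- dates
rownames(catTypes) <- dates

# On a matrix, unique() drops duplicated ROWS and returns a matrix
uniqueRows <- unique(catTypes)

# Flattening to a plain vector first yields the distinct type labels
types <- unique(as.vector(catTypes))

# General form: one column of row sums per type found in the data
totalWeights <- do.call(cbind, lapply(types, function(x) {
  rowSums(catWeights * (catTypes == x))
}))
colnames(totalWeights) <- types  # label columns, since cbind leaves them unnamed
totalWeights
```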

Also, since I'm in editing mode, I'll note that sapply could've been used, but based on some rough benchmarks I've run from time to time, I've found lapply to be faster, even when it is accompanied by a do.call(rbind, ...), a do.call(cbind, ...) or an unlist. I did not test it on a larger dataset in this specific case, though.

So, another format of the answer could've been:

    sapply(unique(as.vector(catTypes)), function(x) rowSums(catWeights * (catTypes == x)))

Here is a not very general answer that applies only to the example data set.

    # Construct matrices for the cat weights and kitten weights
    # (pass the full catWeights matrix so each weight stays aligned with its cell)
    catWts <- ifelse(catTypes=="cat", catWeights, 0)
    kittenWts <- ifelse(catTypes=="kitten", catWeights, 0)
    # Well, then just take the row sums for the two matrices
    catSums <- rowSums(catWts)
    kittenSums <- rowSums(kittenWts)
    # Then combine it to a data frame
    totalWeights <- data.frame(cat=catSums, kitten=kittenSums)
    # In one line
    data.frame(cat=rowSums(ifelse(catTypes=="cat", catWeights, 0)), kitten=rowSums(ifelse(catTypes=="kitten", catWeights, 0)))
    #            cat kitten
    #2013-01-01  9.3    1.2
    #2013-01-02  9.3    1.3
    #2013-01-03  9.3    1.4
    #2013-01-04 10.8    0.0

I would imagine that there is a more general approach to solving this problem.
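One pitfall to watch for with the ifelse approach: the yes argument should be the full weights matrix. If it is subset first, as in catWeights[catTypes == "cat"], the result is shorter than the test matrix and ifelse recycles it, so weights can land in the wrong cells. A minimal sketch on the sample data (the names isCat, misaligned and aligned are made up for this illustration):

```r
# Sample data from the question
catWeights <- cbind(fluffy = c(5.0, 5.1, 5.2, 5.3),
                    misterCuddles = c(1.2, 1.3, 1.4, 1.5),
                    captainMew = c(4.3, 4.2, 4.1, 4.0))
catTypes <- cbind(fluffy = rep("cat", 4),
                  misterCuddles = c("kitten", "kitten", "kitten", "cat"),
                  captainMew = rep("cat", 4))

isCat <- catTypes == "cat"

# Subsetting first leaves 9 values for 12 cells; ifelse() recycles them,
# so values are picked by position in the recycled vector, not by cell
misaligned <- ifelse(isCat, catWeights[isCat], 0)

# Passing the full matrix keeps every weight paired with its own cell
aligned <- ifelse(isCat, catWeights, 0)

rowSums(misaligned)  # does not match the totals asked for in the question
rowSums(aligned)     # 9.3 9.3 9.3 10.8
```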

Microbenchmarking alexis_laz's two solutions on a 2500x2500 matrix with 10 groups:

    > microbenchmark(cbindLapply(), sapplyOnly(), times=100)
    Unit: milliseconds
              expr      min       lq   median       uq      max neval
     cbindLapply() 841.4796 865.2220 879.9099 892.6265 990.5915   100
      sapplyOnly() 846.3675 869.7372 879.0286 901.3314 979.6136   100
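The definitions of cbindLapply and sapplyOnly are not shown above, so the following is only a guess at how such a comparison could be set up. It assumes random weights and randomly assigned group labels, uses a smaller matrix so it runs quickly, and times with base system.time rather than the microbenchmark package. The size n, the labels in groups, and the repetition count are all hypothetical.

```r
set.seed(1)
n <- 500                       # assumption: smaller than the 2500 used above
groups <- paste0("g", 1:10)    # assumption: 10 made-up group labels
weights <- matrix(runif(n * n), nrow = n)
types <- matrix(sample(groups, n * n, replace = TRUE), nrow = n)

# lapply followed by do.call(cbind, ...)
cbindLapply <- function() {
  do.call(cbind, lapply(groups, function(x) rowSums(weights * (types == x))))
}

# sapply, which simplifies to a matrix (and names the columns) by itself
sapplyOnly <- function() {
  sapply(groups, function(x) rowSums(weights * (types == x)))
}

# Sanity check: both approaches agree once column names are dropped
stopifnot(isTRUE(all.equal(cbindLapply(), unname(sapplyOnly()))))

# Rough timing over a few repetitions
system.time(for (i in 1:10) cbindLapply())
system.time(for (i in 1:10) sapplyOnly())
```

In both functions the work is dominated by the matrix comparison and multiplication inside the loop, which would be consistent with the near-identical timings reported above.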