Simple frequency tables using data.table
I'm looking for a way to do simple aggregates / counts via data.table.
Consider the iris data, which has 50 observations per species. To count the observations per species I have to summarise over a column other than Species, for example "Sepal.Length".
library(data.table)
dt = as.data.table(iris)
dt[,length(Sepal.Length), Species]
I find this confusing because it looks like I'm doing something on Sepal.Length at first glance, when really it's only Species that matters.
This is what I would prefer to say, but I don't get valid output:
dt[,length(Species), Species]
Correct input and output, but clunky code:
> dt[,length(Sepal.Length), Species]
      Species V1
1:     setosa 50
2: versicolor 50
3:  virginica 50
Incorrect input and output, but nicer code:
> dt[,length(Species), Species]
      Species V1
1:     setosa  1
2: versicolor  1
3:  virginica  1
Is there an elegant way around this?
data.table has a couple of symbols that can be used within the j expression. Notably:
- .N will give you the number of rows in each group.
See ?data.table, under the details for by:
Advanced: When grouping by by or by i, symbols .SD, .BY and .N may be used in the j expression, defined as follows.
.N is an integer, length 1, containing the number of rows in the group.
dt[, .N, by = Species]
      Species  N
1:     setosa 50
2: versicolor 50
3:  virginica 50
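If you want a friendlier column name than N, here is a minimal sketch of the same idea. The column name `count` and the variable name `counts` are just my choices for illustration. It also shows why the `length(Species)` attempt returns 1: inside j, a grouping column is a length-1 value for each group.

```r
library(data.table)

dt <- as.data.table(iris)

# .N is the number of rows in the current group; wrap it in .() to name the column
counts <- dt[, .(count = .N), by = Species]
print(counts)

# Without `by`, .N is simply the total number of rows in the table
total_rows <- dt[, .N]

# Inside j, a grouping column has length 1 per group,
# which is why length(Species) gives 1 rather than 50
group_col_length <- dt[, length(Species), by = Species]
print(group_col_length)
```

For a quick one-off check, base R's `table(iris$Species)` gives the same counts, but the `.N` form stays inside data.table and extends to several grouping columns, e.g. `by = .(col1, col2)`.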