How can I check whether data has an equal number of observations per group?

I'm writing some code where I need to check whether all group sizes for a given input of data are equal. For example, suppose I wanted to know whether the "mpg" dataset (in the ggplot2 package) has:

  • Equal numbers of cars for every manufacturer
  • Equal numbers of cars for each type of drive (4-wheel, front-wheel, rear-wheel)
  • Equal numbers of cars for each engine type (4-cylinder, 6-cylinder, 8-cylinder)

For data like mpg, some of those questions can be answered by inspecting a frequency table:

library(ggplot2)   # contains the mpg dataset
table(mpg$drv)     # shows the breakdown of cars by drive type,
                   # which we can verify is unequal

But I feel like I'm missing an easy way to check whether group sizes are equal. Is there some single, mythical function I can call, like are.groups.of.equal.size(x)? Or is there another base function (or composition of them) that would return this information?

Answers


As Joran said, we could invent hundreds of ways of doing this from here till Christmas. I smell a microbenchmark challenge:

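are.groups.of.equal.size <- function(x) {
    # sort() brings equal values together, so rle() yields one run per group
    y <- rle(as.character(sort(x)))$lengths
    # the sizes are all equal exactly when every run length equals their mean
    all(y %in% mean(y))
}

are.groups.of.equal.size(c(3, 3, 3))
# [1] TRUE
are.groups.of.equal.size(mtcars$cyl)
# [1] FALSE
are.groups.of.equal.size(CO2$Plant)
# [1] TRUE
are.groups.of.equal.size(mtcars$carb)
# [1] FALSE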
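To see why the rle() version works, it may help to look at the intermediate run lengths. This sketch uses only base R and the built-in mtcars dataset; the variable name runs is just for illustration. It also shows one side effect worth knowing: sort() drops NA by default, so missing values never form a group in this approach.

```r
# After sorting, each distinct value forms one contiguous run,
# so rle()$lengths holds the group sizes.
runs <- rle(as.character(sort(mtcars$cyl)))
runs$values   # "4" "6" "8"
runs$lengths  # 11  7 14  -> unequal, so the function returns FALSE

# sort() removes NA by default (na.last = NA), so the NA is not counted
rle(as.character(sort(c("a", "a", "b", "b", NA))))$lengths  # 2 2
```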

Here is one way of doing it:

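library(ggplot2)  # for the mpg dataset

# table() counts the rows in each group; the sizes are equal when only one distinct count remains
are.groups.of.equal.size <- function(x) length(unique(table(x))) == 1L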

are.groups.of.equal.size(mpg$manufacturer)
# [1] FALSE
are.groups.of.equal.size(mpg$drv)
# [1] FALSE
are.groups.of.equal.size(mpg$year)
# [1] TRUE
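The same table() idea may extend to several grouping columns, either one at a time or as combinations (for example, "equal numbers of cars for every drive type within each year"). Below is a minimal sketch; are.cells.of.equal.size is a hypothetical helper, not part of base R or ggplot2, and it uses the built-in warpbreaks and mtcars datasets so it runs without ggplot2. Note that table() also reports combinations that never occur as cells with a count of 0, so a missing combination makes the result FALSE.

```r
# Hypothetical helper: TRUE when every combination of the columns in `cols`
# occurs equally often. table() accepts a data frame and cross-tabulates its
# columns; as.vector() drops the table's dims and names so that only the
# plain counts are compared.
are.cells.of.equal.size <- function(df, cols) {
  counts <- as.vector(table(df[cols]))
  length(unique(counts)) == 1L
}

# One grouping column at a time
sapply(c("wool", "tension"), function(col) are.cells.of.equal.size(warpbreaks, col))

# Combinations of grouping columns
are.cells.of.equal.size(warpbreaks, c("wool", "tension"))  # TRUE: 9 rows per cell
are.cells.of.equal.size(mtcars, c("cyl", "am"))            # FALSE
```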

Note that table() drops NAs by default; if needed, its useNA argument ("no", "ifany" or "always") lets you count missing values as a group of their own.
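Whether NAs count as a group can change the answer. Here is a small illustration of table()'s useNA argument with the table()-based check; the vector x is made up for the example.

```r
x <- c("a", "a", "b", "b", NA)  # made-up example with one missing value

# Default: table() drops the NA, leaving two groups of size 2
table(x)
length(unique(table(x))) == 1L                    # TRUE

# useNA = "ifany" adds an <NA> cell of size 1, so the sizes now differ
table(x, useNA = "ifany")
length(unique(table(x, useNA = "ifany"))) == 1L   # FALSE
```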