Determine missing proportions for a set of variables in a data frame by a grouping

I need help finding a succinct way to determine the proportion of missing values for a set of variables in a data frame, by group. Consider, for example, the Soybean data in the mlbench package.

data(Soybean, package="mlbench")

I would like to compute the proportion of missing values for each of the variables (columns 2 to 36) for each value of Soybean$Class.

Ideally the output would look something like the following (the numbers are not real):

Class                   date    plant.stand       precip    ...
2-4-d-injury             0.0            5.1         19.4
alternarialeaf-spot     12.5            2.3          1.2
anthracnose              1.4            0.0         11.2
bacterial-blight         0.3            0.0          0.5  

I have tried the following:

myf <- function(df) {
  # percentage of NA values in each column of df
  apply(df, 2, function(x) sum(is.na(x)) / nrow(df) * 100)
}

by(Soybean, Soybean$Class, function(y) myf(y))

But (i) I don't want to divide by the total number of rows in the data frame, i.e. nrow(df) is incorrect; and (ii) the output is difficult to digest.

It seems like this is a simple thing to do, and I am afraid I am missing something obvious. I am relatively new to R, and I appreciate any help.


This is fairly straightforward sapply and tapply fodder.

Take this simple example:

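dat <- data.frame(
  Class = c("a", "a", "b", "b", "c", "c"),
  var1  = c(1, 2, 3, NA, 4, NA),
  var2  = c(NA, NA, 1, 2, NA, 3)
)
dat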

#  Class var1 var2
#1     a    1   NA
#2     a    2   NA
#3     b    3    1
#4     b   NA    2
#5     c    4   NA
#6     c   NA    3

Then try this:

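# for each variable (all columns except Class), compute the
# percentage of NA values within each Class
sapply(dat[-1], function(x) {
  tapply(x, dat$Class, FUN = function(y) sum(is.na(y)) / length(y)) * 100
})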


#  var1 var2
#a    0  100
#b   50    0
#c   50   50
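
The same pattern can be wrapped in a small helper so that it works for any data frame and grouping column. This is only a minimal sketch: the function name pct_missing_by and its arguments are hypothetical (not from any package), and the toy data frame is repeated so the snippet runs on its own.

```r
# Hypothetical helper: percentage of NA values per column, per group.
# `df` is a data frame; `group` is the name of the grouping column.
pct_missing_by <- function(df, group) {
  vars <- setdiff(names(df), group)
  sapply(df[vars], function(x) {
    # the mean of a logical vector is the proportion of TRUE values
    tapply(is.na(x), df[[group]], mean) * 100
  })
}

# Toy data, same shape as the example above
dat <- data.frame(
  Class = c("a", "a", "b", "b", "c", "c"),
  var1  = c(1, 2, 3, NA, 4, NA),
  var2  = c(NA, NA, 1, 2, NA, 3)
)

res <- pct_missing_by(dat, "Class")
print(res)
```

For the Soybean data, the call would presumably be pct_missing_by(Soybean, "Class"), which returns a matrix with one row per class and one column per variable.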

This should work:

# percentage of NA values in a vector
pmiss <- function(x) 100 * sum(is.na(x)) / length(x)

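library(dplyr)

# %>% replaces the older %.% operator, which has been removed from dplyr
Soybean %>%
  group_by(Class) %>%
  summarise(
    date = pmiss(date),
    plant.stand = pmiss(plant.stand)
  )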

Using data.table, you can apply the pmiss function to all columns:

library(data.table)
DT <- data.table(Soybean)
DT[, lapply(.SD, pmiss), by = Class] 
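
If neither dplyr nor data.table is available, a similar table (grouping column first, one column per variable) can probably be obtained in base R with aggregate(). The following is a sketch under assumptions: it uses a small made-up data frame rather than Soybean, and it defines pmiss itself so the snippet is self-contained.

```r
# percentage of NA values in a vector
pmiss <- function(x) 100 * sum(is.na(x)) / length(x)

# Toy data standing in for Soybean (an assumption for illustration)
dat <- data.frame(
  Class = c("a", "a", "b", "b", "c", "c"),
  var1  = c(1, 2, 3, NA, 4, NA),
  var2  = c(NA, NA, 1, 2, NA, 3)
)

# The data-frame method of aggregate() passes NA values through to FUN,
# unlike the formula method, whose default na.action drops incomplete rows.
out <- aggregate(dat[-1], by = list(Class = dat$Class), FUN = pmiss)
print(out)
```

With the real data this would become aggregate(Soybean[-1], by = list(Class = Soybean$Class), FUN = pmiss), assuming Class is the first column.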
