Selecting columns in R data frame based on those *not* in a vector

I'm familiar with being able to extract columns from an R data frame (or matrix) like so:

df.2 <- df[, c("name1", "name2", "name3")]

But can one use a ! or other tool to select all but those listed columns?

For background, I have a data frame with quite a few column vectors and I'd like to avoid:

  • Typing out the majority of the names when I could just remove a minority
  • Using the much shorter df.2 <- df[, c(1,3,5)] because when my .csv file changes, my code goes to heck since the numbering isn't the same anymore. I'm new to R and think I've learned the hard way not to use number vectors for larger df's that might change.

I tried:

df.2 <- df[, !c("name1", "name2", "name3")]
df.2 <- df[, !=c("name1", "name2", "name3")]

And just as I was typing this, found out that this works:

df.2 <- df[, !names(df) %in% c("name1", "name2", "name3")]

Is there a better way than this last one?

Answers


One option is which() with negative indexing:

df.2 <- df[, -which(names(df) %in% c("name1", "name2", "name3"))]

A shorter call that also generalizes better uses a negative grep:

df.2 <- df[, -grep("^name[1-3]$", names(df))]

Since grep returns numeric positions, you can use negative vector indexing to remove columns. The pattern can be extended to cover more numbers or more complex names. Note that the character class must be written [1-3]; [1:3] would match only "1", ":" and "3".
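
One caveat with negative numeric indexing, sketched here on the built-in mtcars data (the column names "foo" and "bar" are made up for illustration): when nothing matches, which() and grep() return integer(0), and negating an empty index selects zero columns rather than all of them. The logical form from the question does not have this problem.

```r
# Column names that do not exist in mtcars (illustrative only)
missing_cols <- c("foo", "bar")

# which() finds nothing, so the index is integer(0) and every column is dropped
by_which <- mtcars[, -which(names(mtcars) %in% missing_cols)]
ncol(by_which)    # 0

# grep() behaves the same way when the pattern matches no column name
by_grep <- mtcars[, -grep("^foo$", names(mtcars))]
ncol(by_grep)     # 0

# A logical index keeps all columns when nothing matches
by_logical <- mtcars[, !names(mtcars) %in% missing_cols]
ncol(by_logical)  # 11
```

If the list of names to drop may be empty or stale, the logical index is the safer choice.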


dplyr::select() has several options for dropping specific columns:

library(dplyr)

drop_columns <- c('cyl','disp','hp')
# any_of() supersedes one_of() in dplyr >= 1.0.0; one_of() still works
mtcars %>% 
  select(-any_of(drop_columns)) %>% 
  head(2)

              mpg drat    wt  qsec vs am gear carb
Mazda RX4      21  3.9 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21  3.9 2.875 17.02  0  1    4    4

You can also negate specific column names. The following drops the column "hp" and the columns from "qsec" through "gear":

mtcars %>% 
  select(-hp, -(qsec:gear)) %>% 
  head(2)

              mpg cyl disp drat    wt carb
Mazda RX4      21   6  160  3.9 2.620    4
Mazda RX4 Wag  21   6  160  3.9 2.875    4

You could also negate contains(), starts_with(), ends_with(), or matches():

mtcars %>% 
  select(-contains('t')) %>%
  select(-starts_with('a')) %>% 
  select(-ends_with('b')) %>% 
  select(-matches('^m.+g$')) %>% 
  head(2)

              cyl disp  hp  qsec vs gear
Mazda RX4       6  160 110 16.46  0    4
Mazda RX4 Wag   6  160 110 17.02  0    4

If you do this often when manipulating your own data, you could write a custom function. I might do something like this:

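rm.col <- function(df, ...) {
    # Capture the unevaluated column names passed via ...
    x <- substitute(...())
    # Convert each captured name to a trimmed character string (trimws is base R)
    z <- trimws(unlist(lapply(x, as.character)))
    # Keep only the columns that were not listed; drop = FALSE keeps a data frame
    df[, !names(df) %in% z, drop = FALSE]
}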

rm.col(mtcars, hp, mpg)

The first argument is the data frame. The remaining ... arguments are the unquoted names of the columns you wish to remove.
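
When the column names are already stored in a character vector, a variant that takes strings may be more convenient. The following is only a minimal sketch: drop_cols is a hypothetical helper name, not a base R function, and mtcars is used purely as sample data.

```r
# Hypothetical helper: drop the columns named in a character vector
drop_cols <- function(df, cols) {
  # Warn about names that are not present instead of failing silently
  unknown <- setdiff(cols, names(df))
  if (length(unknown) > 0) {
    warning("Ignoring unknown columns: ", paste(unknown, collapse = ", "))
  }
  # Logical index; drop = FALSE keeps the result a data frame
  df[, !names(df) %in% cols, drop = FALSE]
}

to_drop <- c("hp", "mpg")
trimmed <- drop_cols(mtcars, to_drop)
names(trimmed)
```

Because the names are plain strings, the same vector can be built from a config file or another data frame's names.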


Old thread, but here's another solution:

df.2 <- subset(df, select=-c(name1, name2, name3))

This was posted in another similar thread (though I can't find it right now). It should hold up in the situation you describe, since it refers to columns by name, and it is probably easier to read and edit than some of the other options.
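
As a small sketch of the same idea on the built-in mtcars data (the chosen columns are just examples): inside select, subset() treats unquoted column names as their positions, so both negated name lists and negated ranges work.

```r
# Drop individual columns by unquoted name
kept <- subset(mtcars, select = -c(cyl, disp, hp))
names(kept)

# Drop a contiguous range of columns, from qsec through gear
kept_range <- subset(mtcars, select = -(qsec:gear))
names(kept_range)
```

Note that the names must be written literally in the call; a character vector of names does not work with this negated form.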


The easiest way that comes to my mind:

filtered_df <- df[, setdiff(names(df), c("name1", "name2"))]

Essentially, you are computing the set difference between the full list of column names and the subset you want to filter out (name1 and name2 above).
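
A minimal sketch of the set-difference approach on the built-in mtcars data (the name "not_a_column" is deliberately made up): setdiff() quietly ignores names that are not present, and adding drop = FALSE keeps the result a data frame even when only one column is left.

```r
# The last name is deliberately absent from mtcars
drop_cols <- c("cyl", "disp", "not_a_column")

# Names to keep, in their original order
keep <- setdiff(names(mtcars), drop_cols)
filtered <- mtcars[, keep, drop = FALSE]
names(filtered)

# Dropping everything except the first column still yields a data frame
one_col <- mtcars[, setdiff(names(mtcars), names(mtcars)[-1]), drop = FALSE]
is.data.frame(one_col)
```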

