Using grep in R to delete rows from a data.frame
I have a dataframe such as this one:
d <- data.frame(cbind(x=1, y=1:10, z=c("apple","pear","banana","A","B","C","D","E","F","G")), stringsAsFactors = FALSE)
I'd like to delete some rows from this dataframe, depending on the content of column z:
new_d <- d[-grep("D",d$z),]
This works fine; row 7 is now deleted:
new_d
   x  y      z
1  1  1  apple
2  1  2   pear
3  1  3 banana
4  1  4      A
5  1  5      B
6  1  6      C
8  1  8      E
9  1  9      F
10 1 10      G
However, when I use grep to search for content that is not present in column z, it seems to delete all content of the dataframe:
new_d <- d[-grep("K",d$z),]
new_d
[1] x y z
<0 rows> (or 0-length row.names)
I would like to search for and delete rows in a way that still works when the character string I am searching for is not present. How can I do this?
Answers
You can use TRUE/FALSE subsetting instead of numeric.
grepl is like grep, but it returns a logical vector. Negation works with it.
d[!grepl("K",d$z),]
   x  y      z
1  1  1  apple
2  1  2   pear
3  1  3 banana
4  1  4      A
5  1  5      B
6  1  6      C
7  1  7      D
8  1  8      E
9  1  9      F
10 1 10      G
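If this comes up often, the same idea can be wrapped in a small function. The sketch below is only illustrative: drop_matching is a made-up name, not a base R function, and it assumes the column holds character data as in the question.

```r
# Hypothetical helper (not part of base R): drop rows whose column matches a pattern.
drop_matching <- function(df, column, pattern) {
  keep <- !grepl(pattern, df[[column]])  # one TRUE/FALSE per row
  df[keep, , drop = FALSE]
}

# The data frame from the question
d <- data.frame(cbind(x = 1, y = 1:10,
                      z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G")),
                stringsAsFactors = FALSE)

nrow(drop_matching(d, "z", "D"))  # 9: row 7 is removed
nrow(drop_matching(d, "z", "K"))  # 10: nothing matches, so nothing is removed
```

Because the logical vector always has one entry per row, the no-match case needs no special handling.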
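Here's your problem: when nothing matches, grep returns an empty vector, and negating an empty vector still gives an empty index, which selects no rows: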
> grep("K",c("apple","pear","banana","A","B","C","D","E","F","G"))
integer(0)
Try grepl() instead:
d[!grepl("K",d$z),]
This works because the negated logical vector has an entry for every row:
> grepl("K",d$z)
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
> !grepl("K",d$z)
[1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
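The two kinds of index can be compared side by side. This is a minimal sketch that rebuilds the question's d; the names neg_idx and keep_lgl are only illustrative.

```r
# The data frame from the question
d <- data.frame(cbind(x = 1, y = 1:10,
                      z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G")),
                stringsAsFactors = FALSE)

neg_idx  <- -grep("K", d$z)    # integer(0): still empty after negation
keep_lgl <- !grepl("K", d$z)   # ten TRUEs: one entry per row

nrow(d[neg_idx, ])   # 0: an empty index selects no rows
nrow(d[keep_lgl, ])  # 10: every row is kept
```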
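For completeness, grep also accepts an invert argument (available in R 3.3.0 and later), which returns the indices of the elements that do not match. Note the comma, which makes this a row selection:
new_d <- d[grep("K", d$z, invert = TRUE), ]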
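A quick check of invert = TRUE used as a row index (with the trailing comma, so it selects rows rather than columns), again on the question's d. The names no_match and one_match are only illustrative.

```r
# The data frame from the question
d <- data.frame(cbind(x = 1, y = 1:10,
                      z = c("apple", "pear", "banana", "A", "B", "C", "D", "E", "F", "G")),
                stringsAsFactors = FALSE)

# With no match, invert = TRUE returns every index, so all rows are kept
no_match  <- d[grep("K", d$z, invert = TRUE), ]
# With a match, only the matching row is dropped
one_match <- d[grep("D", d$z, invert = TRUE), ]

nrow(no_match)   # 10
nrow(one_match)  # 9
```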
You want to use grepl in this case, e.g., new_d <- d[! grepl("K",d$z),].