# Using multiple criteria in subset function and logical operators

If I want to select a subset of data in R, I can use the subset function. I wanted to base an analysis on data that matched one of a few criteria, e.g. that a certain variable was either 1, 2 or 3. I tried

    myNewDataFrame <- subset(bigfive, subset = (bigfive$bf11==(1||2||3)))

It always selected only the values that matched the first of the criteria, here 1. My assumption was that it would start with 1 and, if that evaluated to "false", go on to 2 and then to 3; if none matched, the statement after == would be "false", and if one of them matched, it would be "true".

I got the right result using

    newDataFrame <- subset(bigfive, subset = (bigfive$bf11==c(1,2,3)))

But I would like to be able to select data via logical operators, so: why did the first approach not work?

## Answers

The correct operator is %in% here. Here is an example with dummy data:

    set.seed(1)
    dat <- data.frame(bf11 = sample(4, 10, replace = TRUE), foo = runif(10))

giving:

    > head(dat)
      bf11       foo
    1    2 0.2059746
    2    2 0.1765568
    3    3 0.6870228
    4    4 0.3841037
    5    1 0.7698414
    6    4 0.4976992

The subset of dat where bf11 equals any of the set 1,2,3 is taken as follows using %in%:

    > subset(dat, subset = bf11 %in% c(1,2,3))
       bf11       foo
    1     2 0.2059746
    2     2 0.1765568
    3     3 0.6870228
    5     1 0.7698414
    8     3 0.9919061
    9     3 0.3800352
    10    1 0.7774452

As to why your original didn't work, break it down to see the problem. Look at what 1||2||3 evaluates to:

    > 1 || 2 || 3
    [1] TRUE

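and you'd get the same using | instead. As a result, the comparison becomes bf11 == TRUE. Because TRUE is coerced to 1 when compared with a number, the subset() call only returns rows where bf11 is 1, which is why only the first of your criteria appeared to match.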
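To make the coercion concrete, here is a small sketch. It uses a hypothetical vector that mirrors the bf11 column of the dummy data, typed in by hand so it does not depend on the random seed; the names rhs, mask_or and same_as_one are made up for the illustration.

```r
# Hypothetical stand-in for dat$bf11, typed in by hand
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# The right-hand side collapses to a single logical value first:
# any non-zero number counts as TRUE
rhs <- 1 || 2 || 3
print(rhs)

# Comparing a numeric vector with TRUE coerces TRUE to 1 ...
mask_or <- bf11 == rhs

# ... so the result is the same as testing for 1 alone
same_as_one <- identical(mask_or, bf11 == 1)
print(same_as_one)
```

The set 1, 2, 3 never reaches the comparison; only the single value TRUE does.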

What you could have written would have been something like:

    subset(dat, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

This gives the same result as my earlier subset() call. The point is that you need a series of single comparisons, not a comparison of a series of options. But as you can see, %in% is far more useful and less verbose in such circumstances. Notice also that I have to use | as I want to compare each element of bf11 against 1, 2, and 3 in turn; || only ever looks at a single value. Older versions of R silently use just the first element of each vector, as in the first call below, whereas R 4.3.0 and later raise an error for it. Compare:

    > with(dat, bf11 == 1 || bf11 == 2)
    [1] TRUE
    > with(dat, bf11 == 1 | bf11 == 2)
     [1]  TRUE  TRUE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE  TRUE

For your example, I believe the following should work:

    myNewDataFrame <- subset(bigfive, subset = bf11 == 1 | bf11 == 2 | bf11 == 3)

See the examples in ?subset for more. Just to demonstrate, a more complicated logical subset would be:

    data(airquality)
    dat <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)
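One detail worth checking when a condition involves a column with missing values, as Ozone has in airquality: the help page for subset() says that missing values in the condition are taken as false. The sketch below is only an illustration of that behaviour, comparing subset() with plain [ indexing; the names cond, via_subset and via_index are made up for the example.

```r
# Evaluate the condition on its own; it is NA wherever Ozone is missing
# and the first clause is FALSE, because FALSE | NA is NA
cond <- with(airquality, (Temp > 80 & Month > 5) | Ozone < 40)
print(anyNA(cond))

# subset() treats NA in the condition as FALSE and drops those rows
via_subset <- subset(airquality, subset = (Temp > 80 & Month > 5) | Ozone < 40)

# With `[`, wrapping the condition in which() selects the same rows;
# indexing with the raw logical vector would add all-NA rows instead
via_index <- airquality[which(cond), ]

print(nrow(via_subset) == nrow(via_index))
```

If you switch from subset() to bracket indexing later, this is the usual place where row counts start to differ.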

And as Chase points out, %in% would be more efficient in your example:

    myNewDataFrame <- subset(bigfive, subset = bf11 %in% c(1, 2, 3))
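A word of caution about the bf11 == c(1,2,3) form from the question: == compares element by element and recycles the shorter vector, so it does not test membership and can silently miss matching rows. A minimal sketch, using a hand-typed hypothetical vector with the same values as the dummy bf11 column:

```r
# Hypothetical stand-in for dat$bf11, typed in by hand
bf11 <- c(2, 2, 3, 4, 1, 4, 4, 3, 3, 1)

# == recycles c(1, 2, 3) to 1, 2, 3, 1, 2, 3, ... and compares position by
# position; R also warns here because 10 is not a multiple of 3
by_equals <- suppressWarnings(bf11 == c(1, 2, 3))

# %in% asks, for each element, whether it occurs anywhere in the set
by_in <- bf11 %in% c(1, 2, 3)

print(sum(by_equals))  # 4 rows selected
print(sum(by_in))      # 7 rows selected
```

So the == form only looks right when the data happen to line up with the recycled pattern; %in% is the one that expresses "is any of these values".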

As Chase also points out, make sure you understand the difference between | and ||. To see help pages for operators, use ?'||', where the operator is quoted.
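As a quick illustration of that difference (a sketch with made-up values, not taken from the answers above): | works element by element on whole vectors, whereas || takes single values and stops as soon as the result is known.

```r
x <- c(1, 2, 3, 4)

# | is vectorised: one result per element, which is what subsetting needs
elementwise <- x == 1 | x == 2
print(elementwise)

# || expects length-one operands and short-circuits: the right-hand side
# is never evaluated here, so stop() is not called
short_circuit <- TRUE || stop("not reached")
print(short_circuit)

# Typical use of || is control flow on single values
if (length(x) == 0 || all(x > 0)) {
  message("x is empty or all positive")
}
```

In short, use | and & inside subset() conditions, and keep || and && for if() statements.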