Union in regular expression in R

I'm trying to use regular expressions in R to find one or more phrases within a vector of long sentences (which I'll call x).

So, for example, this works fine for one phrase:

grep("(phrase 1)",x)

But this doesn't work as I would expect for two (or more) phrases:

grep("(phrase 1)+(phrase 2)+",x)

As I read it, this last one should give me all elements of x that contain one or more occurrences of phrase 1 and one or more occurrences of phrase 2. But it returns nothing.

Answers


As written, your pattern only matches when phrase 2 immediately follows phrase 1. You have to tell it to skip over any intervening characters:

grep("(phrase 1)+.*(phrase 2)+",x)

Also note that this pattern only matches when phrase 1 comes before phrase 2; it will not match the reverse order, so you might have to add that explicitly (e.g. "phrase 1.*phrase 2|phrase 2.*phrase 1"). Overall, it might be simpler to search for each phrase separately (especially if there are more than two phrases) and then combine the results with intersect or union, depending on what you want overall.
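To make the difference concrete, here is a small sketch on made-up data ("phrase 1" and "phrase 2" stand in for the real phrases, and the vector x is hypothetical). It shows why the original pattern returns nothing, what the .* version finds, and how separate searches combine with intersect and union:

```r
# Hypothetical sample data; "phrase 1" / "phrase 2" stand in for real phrases
x <- c("phrase 1 then phrase 2",
       "phrase 2 before phrase 1",
       "only phrase 1 here",
       "nothing relevant")

# Original pattern: phrase 2 must follow phrase 1 with nothing in between
grep("(phrase 1)+(phrase 2)+", x)    # integer(0)

# Skipping intervening characters works, but only in this order
grep("phrase 1.*phrase 2", x)        # 1

# Search each phrase separately, then combine the index vectors
hits1 <- grep("phrase 1", x)         # 1 2 3
hits2 <- grep("phrase 2", x)         # 1 2
intersect(hits1, hits2)              # both phrases, in any order: 1 2
union(hits1, hits2)                  # either phrase: 1 2 3
```

The separate-search approach scales to more phrases without having to spell out every possible ordering in one pattern.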


Another way

which(grepl("(phrase 1)+",x) & grepl("(phrase 2)+",x))
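The same idea extends to any number of phrases by folding the logical vectors together with Reduce. This is a sketch; contains_all and contains_any are hypothetical helper names, and it assumes the phrases are plain text, so it uses fixed = TRUE (literal matching) instead of regexps:

```r
# Hypothetical helper: indices of elements of x that contain every phrase
contains_all <- function(x, phrases) {
  # one logical vector per phrase; fixed = TRUE matches phrases literally
  hits <- lapply(phrases, function(p) grepl(p, x, fixed = TRUE))
  which(Reduce(`&`, hits))   # AND across phrases: the intersection
}

# Hypothetical helper: indices of elements of x that contain any phrase
contains_any <- function(x, phrases) {
  hits <- lapply(phrases, function(p) grepl(p, x, fixed = TRUE))
  which(Reduce(`|`, hits))   # OR across phrases: the union
}

x <- c("The grey fox jumped", "The blue cat slept",
       "The fox is grey", "The cat is grey")
contains_all(x, c("fox", "grey"))   # 1 3
contains_any(x, c("fox", "blue"))   # 1 2 3
```

Drop fixed = TRUE if the phrases are meant to be regexps.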

Full examples (e.g. with, you know, data ...) are always good.

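The main key for regexps in R is to remember that there are three (!!) different matching modes: the default POSIX extended regexps, Perl-compatible regexps (perl = TRUE), and literal string matching (fixed = TRUE). I tend to like the Perl regexps.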

Next, it is important to remember that parentheses are metacharacters, so if you want to match literal parens, you need to escape them (written as "\\(" and "\\)" inside an R string).
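As a quick illustration of the escaping point, here is a sketch on a hypothetical two-element vector. Unescaped parens form a group and match nothing literal, while escaping them or switching to fixed = TRUE matches the literal text:

```r
txt <- c("call f(x) now", "call fx now")

grepl("f(x)", txt)                 # group, same as "fx": FALSE TRUE
grepl("f\\(x\\)", txt)             # escaped, literal parens: TRUE FALSE
grepl("f\\(x\\)", txt, perl = TRUE)  # same escaping with Perl regexps
grepl("f(x)", txt, fixed = TRUE)   # no regexp at all: TRUE FALSE
```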

With that, here is an example:

> txt <- c("The grey fox jumped", "The blue cat slept", "The sky was falling")
> grep("blue", txt)                       # finds sentence two
[1] 2
> grep("(grey|blue)", txt, perl=TRUE)     # finds one and two
[1] 1 2
> grep("(red|blue)", txt, perl=TRUE)      # finds only two (as it should)
[1] 2

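So with Perl regexps (and with the default engine as well), you list alternatives inside parentheses, separated by a pipe symbol. That gives you the union: every element that matches any of the alternatives.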
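If the phrases live in a vector, the alternation can be built with paste. This is a minimal sketch that assumes the phrases contain no regexp metacharacters (otherwise they would need escaping first); the names phrases and pattern are illustrative:

```r
phrases <- c("grey", "blue")
# join the phrases with "|" and wrap them in a group: "(grey|blue)"
pattern <- paste0("(", paste(phrases, collapse = "|"), ")")

txt <- c("The grey fox jumped", "The blue cat slept", "The sky was falling")
grep(pattern, txt)                # default engine: 1 2
grep(pattern, txt, perl = TRUE)   # Perl engine: 1 2
```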


There's a way to do it with a single regex using lookaheads (which require perl = TRUE), though it can be slower than separate searches, because each lookahead may scan the rest of the string:

> txt <- c("The grey fox jumped", "The blue cat slept", "The fox is grey", "The cat is grey")
> grep("(?=.*fox)(?=.*grey)", txt, perl=TRUE)
[1] 1 3
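For more than two phrases, the lookahead pattern can be generated from a vector. This is a sketch; anchoring with ^ is an addition not in the answer above, meant to stop the engine from retrying the lookaheads at every starting position after a failed match. It assumes the phrases contain no metacharacters:

```r
phrases <- c("fox", "grey")
# one (?=.*phrase) lookahead per phrase, anchored at the start of the string
pattern <- paste0("^", paste0("(?=.*", phrases, ")", collapse = ""))
pattern   # "^(?=.*fox)(?=.*grey)"

txt <- c("The grey fox jumped", "The blue cat slept",
         "The fox is grey", "The cat is grey")
grep(pattern, txt, perl = TRUE)   # 1 3
```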
