What does `?` mean in this Perl regex?

I have a Perl regex, but I'm not sure what "?" means in this context.

m#(?:\w+)#

What does ? mean here?

Answers


In this case, the ? is actually being used in connection with the :. Put together, ?: at the beginning of a grouping means to group but not capture the text/pattern within the parentheses (that is, it will not be stored in any backreference like \1 or $1, so you will not be able to access the grouped text directly).
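
A quick way to see the difference for yourself is a small script like the one below. It is a minimal sketch: the helper first_capture and the sample string are made up for illustration. It runs the question's pattern and a capturing version of it, then reports what ends up in $1.

```perl
use strict;
use warnings;

# Hypothetical helper for this example: apply a pattern and return $1.
# It checks the match result first, because $1 is only reset on a successful match.
sub first_capture {
    my ($string, $re) = @_;
    return undef unless $string =~ $re;
    return $1;    # undef when the pattern has no capturing group
}

my $text = 'hello world';

# Capturing group: the matched word is stored in $1.
my $captured = first_capture($text, qr#(\w+)#);

# Non-capturing group from the question: the match succeeds, but $1 stays unset.
my $not_captured = first_capture($text, qr#(?:\w+)#);

print "capturing:     ", (defined $captured     ? $captured     : 'undef'), "\n";
print "non-capturing: ", (defined $not_captured ? $not_captured : 'undef'), "\n";
```

Both patterns match the same text; the only difference is whether the grouped text is kept.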

More specifically, a ? has three distinct meanings in regex:

  1. The ? quantifier signifies "zero or one repetitions" of the preceding expression. One of the canonical examples I've seen is s?he, which matches both she and he, since the ? makes the s "optional".

  2. When a quantifier (+, *, ?, or the general {n,m}) is followed by a ?, that quantifier becomes non-greedy (i.e. it will match the shortest string starting from that position that allows the match to proceed).

  3. A ? right after the opening parenthesis of a group signifies that you want to perform a special action. In this case, the : means to group but not capture. The exact list of actions available varies somewhat from one regex engine to another, but here's a (not necessarily all-inclusive) list of some of them:

    A. Non-capturing group: (?:text)
    B. Lookaround: (?=a) for a lookahead, (?!a) for a negative lookahead, and (?<=a) and (?<!a) for lookbehinds (positive and negative, respectively).
    C. Conditional matches: (?(condition)then|else).
    D. Atomic grouping: a(?>bc|b)c matches abcc but not abc, because once the atomic group has matched bc it will not backtrack to try b instead.
    E. Inline enabling/disabling of regex matching modifiers: (?i) to enable a mode, (?-i) to disable it. You can also enable/disable more than one modifier at a time by simply concatenating them, such as (?im) (i is case-insensitive and m is multiline).
    F. Named capture groups: (?P<name>pattern), which can later be referenced using (?P=name). The .NET regex engine uses the syntax (?<name>pattern) instead; Perl 5.10 and later accept both forms, and \k<name> refers back to the group.
    G. Comments: (?#Comment text). I personally think this just adds clutter, but I guess it could serve some use. Free-spacing mode (the (?x) modifier) might be a better option.
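
Here is a rough sketch showing each of the three meanings in Perl syntax. The sample strings and variable names are invented for illustration, the named-capture part assumes Perl 5.10 or later, and other engines may spell some of these constructs differently.

```perl
use strict;
use warnings;
use 5.010;    # named captures and %+ need Perl 5.10 or later

# 1. ? as a quantifier: the preceding item may appear zero or one times.
my @optional = grep { /^s?he$/ } qw(she he the);    # ('she', 'he')

# 2. ? after a quantifier makes it non-greedy (shortest match that still works).
my $html = '<b>bold</b>';
my ($greedy) = $html =~ /(<.+>)/;     # '<b>bold</b>'
my ($lazy)   = $html =~ /(<.+?>)/;    # '<b>'

# 3. (? at the start of a group introduces a special construct.
my $lookahead   = 'foobar' =~ /foo(?=bar)/ ? 1 : 0;    # 1: "foo" followed by "bar"
my $neg_ahead   = 'foobaz' =~ /foo(?!bar)/ ? 1 : 0;    # 1: "foo" not followed by "bar"
my ($amount)    = 'USD100' =~ /(?<=USD)(\d+)/;         # '100'
my $atomic_abcc = 'abcc' =~ /a(?>bc|b)c/ ? 1 : 0;      # 1
my $atomic_abc  = 'abc'  =~ /a(?>bc|b)c/ ? 1 : 0;      # 0: the atomic group will not give back "c"
my $plain_abc   = 'abc'  =~ /a(?:bc|b)c/ ? 1 : 0;      # 1: an ordinary group can backtrack to "b"

# Conditional: require a closing paren only if group 1 matched an opening one.
my @balanced = grep { /^(\()?\d+(?(1)\))$/ } ('(42)', '42', '(42', '42)');    # ('(42)', '42')

# Named capture: (?<name>...) fills the %+ hash.
my $year;
if ('2024-05' =~ /(?<year>\d{4})-(?<month>\d{2})/) {
    $year = $+{year};    # '2024'
}

print "s?he matches: @optional\n";
print "greedy: $greedy, non-greedy: $lazy\n";
print "lookahead: $lookahead, negative lookahead: $neg_ahead, lookbehind: $amount\n";
print "atomic on abcc: $atomic_abcc, atomic on abc: $atomic_abc, plain on abc: $plain_abc\n";
print "conditional accepts: @balanced\n";
print "named capture year: $year\n";
```

The atomic-group lines are the easiest way to see the difference from a plain (?:...) group: the same alternatives, but only the non-atomic version is allowed to backtrack.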

So essentially, the meaning of ? depends on its context. If you wanted zero or one occurrence of a literal ( character, you'd have to escape the paren and write \(?.
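
For example, something like the following (sample strings made up) accepts the optional literal paren, and shows that leaving the backslash off changes the meaning entirely, since the engine then reads (? as the start of a special group:

```perl
use strict;
use warnings;

# \( is a literal "(" and the ? after it makes that character optional.
my @matched = grep { /^\(?abc$/ } ('(abc', 'abc', '((abc');    # ('(abc', 'abc')

# Left unescaped, "(?" opens a special group instead, so this pattern does not
# compile. It is built at run time so that eval can catch the error.
my $unescaped = '^(?abc$';
my $compiles  = (eval { my $re = qr/$unescaped/; 1 }) ? 1 : 0;    # 0

print "matched: @matched\n";
print "unescaped pattern compiles: $compiles\n";
```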


$ perldoc perlreref:

(?:...) Groups subexpressions without capturing (cluster)

You can also use YAPE::Regex::Explain:

C:\Temp> perl -MYAPE::Regex::Explain -e "print YAPE::Regex::Explain->new(qr#(?:\w+)#)->explain"

The regular expression:

(?-imsx:(?:\w+))

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (?:                      group, but do not capture:
----------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of grouping
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------

Those are non-capturing parentheses. They're used for grouping (just like normal parentheses), but the group won't be added to the capture array (i.e. it won't be referenceable with \1, \2, etc. or $1, $2, etc.).
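
To illustrate, here's a small sketch (the date string is just sample data) showing that a non-capturing group doesn't take up a group number, so the capturing groups after it are renumbered, and backreferences like \1 skip it too:

```perl
use strict;
use warnings;

my $date = '2024-05-17';

# Every group captures: the year lands in $1, the month in $2, the day in $3.
my @all = $date =~ /^(\d{4})-(\d{2})-(\d{2})$/;      # ('2024', '05', '17')

# Same pattern, but the year group is non-capturing, so the numbering shifts:
# the month is now $1 and the day is $2.
my @some = $date =~ /^(?:\d{4})-(\d{2})-(\d{2})$/;   # ('05', '17')

# Backreferences count only capturing groups too: \1 here refers to (\w),
# not to the (?:x) group before it.
my $doubled = 'xaa' =~ /^(?:x)(\w)\1$/ ? 1 : 0;      # 1

print "all groups:    @all\n";
print "without year:  @some\n";
print "backreference: $doubled\n";
```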

See here: http://www.regular-expressions.info/refadv.html


In short, the sequence (? starts a regular expression special feature. The characters that follow the (? specify which feature; in this case, a non-capturing grouping. We cover this in both Intermediate Perl and Effective Perl Programming. The perlre documentation describes Perl regular expressions.
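
Besides grouping, the same (? prefix introduces inline modifiers and comments. A minimal sketch of a few of them in Perl, with made-up sample strings:

```perl
use strict;
use warnings;

# (?i) switches on case-insensitive matching from that point on.
my $ci = 'HELLO' =~ /(?i)hello/ ? 1 : 0;                   # 1

# (?-i) switches it off again, so only the h below is case-insensitive.
my $mixed_ok   = 'Hello' =~ /(?i)h(?-i)ello/ ? 1 : 0;      # 1
my $mixed_fail = 'HELLO' =~ /(?i)h(?-i)ello/ ? 1 : 0;      # 0

# Modifiers can be combined: (?im) is case-insensitive and multiline,
# so ^ can match at the start of the second line.
my $multi = "first\nSECOND" =~ /(?im)^second$/ ? 1 : 0;    # 1

# (?#...) is an inline comment that the regex engine skips.
my $commented = 'abc' =~ /a(?#this text is ignored)bc/ ? 1 : 0;    # 1

# (?x) turns on free-spacing mode: whitespace is ignored and # starts a comment.
my $date_re = qr/(?x)
    \d{4}   # year
    -       # separator
    \d{2}   # month
/;
my $spaced = '2024-05' =~ $date_re ? 1 : 0;               # 1

print "(?i): $ci, (?-i): $mixed_ok/$mixed_fail, (?im): $multi, (?#...): $commented, (?x): $spaced\n";
```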


See perlretut, the regex tutorial that is installed with every version of Perl (in particular, this section).

