PHP regular expression help for parsing a browser user-agent string
I'm currently trying to learn regular expressions with some simple "real world" examples.
Consider the following string:
Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko
I want to find the RV value (1.9.2a1pre). I need to apply the following rules:
- RV: can be in any case (RV, rv, rV, Rv...).
- RV: can be anywhere in the string.
- The RV: value ends with either a closing parenthesis, any whitespace (including linebreak), a semicolon or the end of string.
So far I have:
/rv:[.][\)]?/i
but it's not working (I must be far from the "true" solution)...
The expression must work with PHP preg_match.
The pattern should match rv, followed by a : (which may be surrounded by whitespace), then a run of characters other than ;, ) and whitespace (including newlines). The value after rv: is captured in group no. 1.
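A minimal sketch of a pattern that fits this description. The pattern below is reconstructed from the wording above and is an assumption, not necessarily the answerer's exact expression; the variable names are illustrative.

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// Assumed pattern: "rv", optional whitespace around the colon, then one or
// more characters that are not ';', ')' or whitespace, captured in group 1.
// The /i modifier makes "rv" case-insensitive.
$pattern = '/rv\s*:\s*([^;)\s]+)/i';

$version = preg_match($pattern, $ua, $m) ? $m[1] : null;
echo $version, PHP_EOL; // 1.9.2a1pre
```

Because the end of the string also terminates the run of allowed characters, no explicit end-of-string check is needed.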
Here is my revision to allow the RV sub-string to be anywhere
- () denotes the capture group (ie, what you want to get back from this process)
- [^); ] means characters that are not ), *space* or ;
- + means one or more times
- * means zero or more times.
- [\s]* just before the parenthesis means we skip any leading whitespace before the captured value, which is essential here because the capture itself stops at a space.
So this captures a string of one or more characters, excluding ), ; and space, immediately after rv: (and any whitespace that follows it).
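Putting those pieces together, the revision probably looked something like the sketch below. The exact pattern is an assumption assembled from the list above.

```php
<?php
// Assumed pattern: rv:, optional whitespace, then a captured run of one or
// more characters that are not ')', ';' or a space.
$pattern = '/rv:[\s]*([^); ]+)/i';

$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
preg_match($pattern, $ua, $m);
echo $m[1], PHP_EOL; // 1.9.2a1pre

// Leading whitespace after the colon is skipped, not captured.
preg_match($pattern, 'RV:   3.6; foo', $padded);
echo $padded[1], PHP_EOL; // 3.6
```

Note that [^); ] only excludes a literal space. To stop at tabs and line breaks as well, the space inside the class could be replaced with \s.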
Your version /rv:[.][\)]?/i looks for rv: followed by a single literal . and then, optionally, a ).
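A quick way to see this, as a small sketch (the variable names are illustrative):

```php
<?php
$original = '/rv:[.][\)]?/i';

$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// In the sample string, rv: is followed by "1", not a literal dot: no match.
$onSample = preg_match($original, $ua);
var_dump($onSample); // int(0)

// The pattern only matches when a literal dot comes right after rv:.
$onDot = preg_match($original, 'rv:.)');
var_dump($onDot); // int(1)
```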
I think [.] means a literal dot, not "any character". Use this instead:
$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
preg_match('/rv:([^\)]*)/i', $str, $matches);
echo $matches[1]; // $matches is an array; index 1 holds the captured group
Maybe:
That means: one or more characters that are not ), ;, a space or a line feed, matched case-insensitively and captured.
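A sketch of what such a pattern could look like, checked against each terminator named in the question. The pattern is reconstructed from the description above, and extractRv() is a hypothetical helper name used only for illustration.

```php
<?php
// extractRv() is a hypothetical helper, not part of PHP or of the original answer.
function extractRv(string $ua): ?string
{
    // One or more characters that are not ')', ';', a space or a line feed,
    // captured in group 1; /i makes "rv" case-insensitive.
    return preg_match('/rv:([^); \n]+)/i', $ua, $m) ? $m[1] : null;
}

$samples = [
    'rv:1.9.2a1pre) Gecko',   // value ends at ')'
    'RV:1.9.1; foo',          // value ends at ';'
    "Rv:2.0\nnext line",      // value ends at a line feed
    'rV:5.0',                 // value ends at the end of the string
];

foreach ($samples as $sample) {
    echo extractRv($sample), PHP_EOL;
}
```

This prints 1.9.2a1pre, 1.9.1, 2.0 and 5.0, one per line. Tabs and carriage returns are not excluded by this class; using \s in place of the space and \n would cover all whitespace.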
I think what you want is this:
Everything within parentheses is a group. The ?<= construct is called a positive lookbehind: it requires a given string to appear before the text you want, without including it in the match. The ?= construct is called a positive lookahead and does the same for a string after the text you want. Since the text you want is simply numbers, letters and a dot or two, the . operator works as a catch-all and matches any character except line breaks. The * quantifier means zero or more of the preceding element (use + for one or more).
Hope that helps.
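A minimal sketch of the lookaround approach. The pattern here is an assumption written to satisfy the rules in the question, not necessarily the expression this answer originally showed.

```php
<?php
$ua = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';

// (?<=rv:)        lookbehind: the match must be preceded by "rv:"
// .+?             lazily take as few characters as possible
// (?=[);\s]|$)    lookahead: stop before ')', ';', whitespace or end of string
$pattern = '/(?<=rv:).+?(?=[);\s]|$)/i';

preg_match($pattern, $ua, $m);
echo $m[0], PHP_EOL; // 1.9.2a1pre
```

With lookarounds the value is the whole match, so it is read from $m[0] and not from a capture group. One limitation: an unbounded quantifier such as \s* is not allowed inside a PCRE lookbehind, so this form cannot tolerate optional whitespace before the colon.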
$str = 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.9.2a1pre) Gecko';
// * moved inside the group so the whole value is captured
preg_match('/rv:([a-z0-9.]*)/i', $str, $matches);
echo $matches[1];