Scrape a price off a website

I'm trying to scrape a price from a web page using PHP and Regexes. The price will be in the format £123.12 or $123.12 (i.e., pounds or dollars).

I'm loading the page contents using libcurl, and the output then goes into preg_match_all. So it looks a bit like this:

$contents = curl_exec($curl);

preg_match_all('/(?:\$|£)[0-9]+(?:\.[0-9]{2})?/', $contents, $matches);

So far so simple. The problem is, PHP isn't matching anything at all - even when there are prices on the page. I've narrowed it down to there being a problem with the '£' character - PHP doesn't seem to like it.

I think this might be a charset issue. But whatever I do, I can't seem to get PHP to match it! Anyone have any ideas?

(Edit: I should note if I try using the Regex Test Tool using the same regex and page content, it works fine)

Answers


Have you tried putting a \ in front of the £?

preg_match_all('/(\$|\£)[0-9]+(\.[0-9]{2})/', $contents, $matches);

I tried this expression in .NET with \£ and it works. I just edited it and removed the "?:" from the groups.

Also read my comment on this post about the possibility of cURL giving you the content in a different encoding.
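
If the encoding does turn out to be the culprit, something along these lines might be worth trying. It is only a sketch: extractPrices and its $sourceEncoding parameter are made-up names for illustration, the mbstring extension is assumed to be installed, and ISO-8859-1 is just an example of what the page might be served as. The idea is to convert the content to UTF-8 first, then match the pound sign by code point with the /u modifier, so the result no longer depends on how the PHP source file itself is saved.

```php
<?php
// Hypothetical helper (not from the original post): pull prices out of page
// content whose encoding is known or guessed.
function extractPrices(string $contents, string $sourceEncoding = 'UTF-8'): array
{
    // Normalise to UTF-8 so the pound sign has one known byte sequence.
    // Requires the mbstring extension (an assumption here).
    if (strcasecmp($sourceEncoding, 'UTF-8') !== 0) {
        $contents = mb_convert_encoding($contents, 'UTF-8', $sourceEncoding);
    }

    // \x{00A3} is the pound sign written as a code point; /u makes PCRE treat
    // both the pattern and the subject as UTF-8.
    $pattern = '/(?:\$|\x{00A3})[0-9]+(?:\.[0-9]{2})?/u';

    // preg_match_all returns false if the subject is not valid UTF-8,
    // in which case there is nothing to return.
    if (preg_match_all($pattern, $contents, $matches) === false) {
        return [];
    }

    return $matches[0] ?? [];
}

// In ISO-8859-1 the pound sign is the single byte 0xA3.
$latin1Page = "Was \xA3123.12, now \$99";
print_r(extractPrices($latin1Page, 'ISO-8859-1'));
```

For the sample string this should give two matches, the pound price (now UTF-8 encoded) and $99. The real encoding would have to come from the page's Content-Type header or its meta tag; guessing it wrong will still produce no matches.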

