String lookup in HTML content of unknown charset
I'm using strpos to look for a string in web page bodies. About 50% of the time it fails, although the search string is present. I have tried applying strtolower to both the search string and the searched content, with the same results. Probably the problem arises when dealing with different charsets...
Assuming:
- the search string charset is unknown
- the searched content charset is unknown
- the charset could be any ISOxx, UTF-8, or Shift-JIS
Is there a bulletproof function to find a substring?
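You could try using mb_detect_encoding to detect the encoding first, then convert both the search string and the content to one encoding of your choice (using iconv or mb_convert_encoding) and search for the pattern in that encoding. Keep in mind that detection is a guess, not a guarantee, so no approach here is truly bulletproof.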
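A minimal sketch of that detect-convert-search approach, assuming the mbstring extension is loaded. The helper names to_utf8 and contains_text are hypothetical, and the default candidate list and the ISO-8859-1 fallback are assumptions; extend the list (for example with 'SJIS') if you expect those pages.

```php
<?php
// Hypothetical helper: normalize a string of unknown charset to UTF-8.
// The candidate list is an assumption; order matters, so put UTF-8 first.
function to_utf8(string $text, array $candidates = ['UTF-8', 'ISO-8859-1']): string
{
    // Strict mode discards candidates for which $text has invalid byte sequences.
    $detected = mb_detect_encoding($text, $candidates, true);
    if ($detected === false) {
        // Assumption: fall back to ISO-8859-1, which accepts any byte.
        $detected = 'ISO-8859-1';
    }
    return mb_convert_encoding($text, 'UTF-8', $detected);
}

// Hypothetical helper: case-insensitive substring test after normalizing
// both the content and the search string to UTF-8.
function contains_text(string $haystack, string $needle): bool
{
    if ($needle === '') {
        return true; // an empty search string trivially matches
    }
    return mb_stripos(to_utf8($haystack), to_utf8($needle), 0, 'UTF-8') !== false;
}
```

Usage would look like contains_text($pageBody, $searchString). mb_stripos is used instead of strtolower plus strpos because strtolower is not aware of multibyte encodings.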
Yup, convert the HTML to UTF-8 or Latin-1 first: grab the content encoding from the Content-Type header or from the meta tag, convert to UTF-8 or Latin-1 using iconv, then stop worrying about it.
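Something along these lines might work as a starting point, assuming the iconv extension is available. The function names and the regular expressions are hypothetical and deliberately simple, and the ISO-8859-1 fallback is an assumption for pages that declare no charset at all.

```php
<?php
// Hypothetical helper: pull the charset out of a Content-Type header value,
// e.g. "text/html; charset=ISO-8859-1".
function charset_from_content_type(?string $contentType): ?string
{
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([\w.:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: pull the charset out of a meta tag. Covers both
// <meta charset="..."> and <meta http-equiv="Content-Type" content="...; charset=...">.
function charset_from_meta(string $html): ?string
{
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Convert a page body to UTF-8. The header wins over the meta tag;
// $fallback is an assumption used when neither declares a charset.
function html_to_utf8(string $html, ?string $contentType = null, string $fallback = 'ISO-8859-1'): string
{
    $charset = charset_from_content_type($contentType)
        ?? charset_from_meta($html)
        ?? $fallback;
    // iconv returns false on an unknown charset or invalid input;
    // in that case the body is returned unchanged.
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}
```

After that, an ordinary search on html_to_utf8($body, $contentTypeHeader) works as long as the search string itself is UTF-8. Pages that declare the wrong charset will still trip this up, so treat the declared value as a hint.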