String lookup in HTML content of unknown charset

I'm using strpos to look for a string in web page bodies. About 50% of the time it fails, although the search string is present. I have tried applying strtolower to both the search string and the searched content, with the same results. Probably the problem arises when dealing with different charsets...

Assuming:
- the search string's charset is unknown
- the searched content's charset is unknown
- the charset could be any ISO-8859-x, UTF-8, or Shift-JIS

Is there a bulletproof function to find a substring?

Answers


You could try using mb_detect_encoding to detect the encoding first, then convert to the encoding you would like to use (using iconv or mb_convert_encoding) and search for the pattern in that encoding.
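
A rough sketch of that approach, in case it helps. The helper names to_utf8 and contains_text and the fallback list are made up for illustration, and detection is only a heuristic, so treat this as a starting point under those assumptions rather than a bulletproof solution. It checks for valid UTF-8 first, because short UTF-8 strings can also look like valid Shift-JIS or ISO-8859-1.

```php
<?php
// Hypothetical helper: bring a string of unknown charset to UTF-8.
function to_utf8(string $text, array $fallbacks = ['SJIS', 'ISO-8859-1']): string
{
    // Already valid UTF-8 (this includes plain ASCII): leave it alone.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }
    // Strict mode rejects candidates for which the bytes are invalid.
    $encoding = mb_detect_encoding($text, $fallbacks, true);
    if ($encoding === false) {
        // Assumption: treat anything undetected as ISO-8859-1.
        $encoding = 'ISO-8859-1';
    }
    return mb_convert_encoding($text, 'UTF-8', $encoding);
}

// Hypothetical helper: case-insensitive search after normalising both sides.
function contains_text(string $haystack, string $needle): bool
{
    $haystack = to_utf8($haystack);
    $needle = to_utf8($needle);
    return mb_stripos($haystack, $needle, 0, 'UTF-8') !== false;
}
```

For example, contains_text("Caf\xE9 menu", "caf\xC3\xA9") compares an ISO-8859-1 page fragment with a UTF-8 search string. Note the use of mb_stripos instead of strtolower plus strpos: strtolower works byte by byte and does not lowercase multibyte characters reliably.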


Yes, convert the HTML to a single known encoding (UTF-8 or Latin-1) first. Grab the content encoding from the Content-Type header or from the meta tag, convert with iconv, then stop worrying about it.
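
Something along these lines might work as a sketch of the header/meta approach. The function names declared_charset and html_to_utf8, the regular expressions, and the ISO-8859-1 default are assumptions for illustration; real pages can declare a charset that does not match their actual bytes, so this is not guaranteed to be correct for every page.

```php
<?php
// Hypothetical helper: find the declared charset, header first, then <meta>.
function declared_charset(?string $contentType, string $html): ?string
{
    // e.g. "text/html; charset=Shift_JIS"
    if ($contentType !== null
        && preg_match('/charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $contentType, $m)) {
        return $m[1];
    }
    // Covers <meta charset="..."> and the http-equiv="Content-Type" form.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: convert a page body to UTF-8 using the declared charset.
function html_to_utf8(?string $contentType, string $html, string $default = 'ISO-8859-1'): string
{
    // Assumption: fall back to $default when nothing is declared.
    $charset = declared_charset($contentType, $html) ?? $default;
    // iconv returns false (and raises a notice or warning) on unknown
    // charsets or invalid bytes; @ silences that and we keep the original.
    $converted = @iconv($charset, 'UTF-8', $html);
    return $converted === false ? $html : $converted;
}
```

After conversion, the search string also has to be in UTF-8 before calling strpos or mb_stripos on the result.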

