Splitting a string into words with Swedish characters
I'm trying to split a string of text into words using the PHP function preg_split.
$words = preg_split('/\W/u',$text);
It works fine except for Swedish characters like åäö. Calling utf8_encode or utf8_decode doesn't help either. My guess is that preg_split only works with single-byte characters and that the Swedish characters are multibyte. Is there another way to do it?
Why are you paying any attention to specific characters?
$text = "Jag har hört så mycket om dig.";
$words = explode(" ", $text);
/*
Array
(
    [0] => Jag
    [1] => har
    [2] => hört
    [3] => så
    [4] => mycket
    [5] => om
    [6] => dig.
)
*/
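Note that explode on a space leaves punctuation attached to the words (the last element above is "dig." with the period). If that matters, here is a minimal sketch that splits on anything that is not a letter, combining mark, or digit. It assumes $text is valid UTF-8 and that PCRE was built with Unicode property support; the helper name split_words_unicode is made up for illustration and does not come from the posts above.

```php
<?php
// Hypothetical helper for illustration only.
// Assumption: $text is valid UTF-8. With the /u modifier, preg_split
// returns false on malformed input, which is mapped to an empty array here.
function split_words_unicode(string $text): array
{
    // \p{L} = letters, \p{M} = combining marks, \p{N} = digits.
    // Everything else (spaces, punctuation) is treated as a separator,
    // and PREG_SPLIT_NO_EMPTY drops empty pieces at the edges.
    $words = preg_split('/[^\p{L}\p{M}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    return $words === false ? [] : $words;
}

print_r(split_words_unicode("Jag har hört så mycket om dig."));
// Elements: Jag, har, hört, så, mycket, om, dig
```

The explicit Unicode property classes are used instead of \W so that the result does not depend on how \w is interpreted for non-ASCII characters in a given PHP/PCRE build.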
mb_split to the rescue. I had problems with these characters myself some time ago and only just now found the answer :)
mb_regex_encoding('UTF-8');
mb_split('\W', $text);
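One thing to watch for with the call above: splitting on a single \W can produce empty strings wherever two separators are adjacent or the text ends with punctuation. A rough sketch of one way to handle that, assuming the mbstring extension is loaded and $text is valid UTF-8 (the wrapper name split_words_mb is hypothetical):

```php
<?php
// Hypothetical wrapper for illustration only.
// Assumptions: mbstring is available and $text is valid UTF-8.
function split_words_mb(string $text): array
{
    // Note: this changes the global mbstring regex encoding.
    mb_regex_encoding('UTF-8');

    // '\W+' treats a run of non-word characters as one separator.
    $parts = mb_split('\W+', $text);
    if ($parts === false) {
        return [];
    }

    // Drop any empty strings left at the start or end, then reindex.
    $nonEmpty = array_filter($parts, static function ($w) {
        return $w !== '';
    });

    return array_values($nonEmpty);
}

print_r(split_words_mb("Jag har hört så mycket om dig."));
```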