Regex to remove special/invisible characters

The problem is to remove some strange characters from a domain name, but keep special Unicode characters such as accented letters (German, Danish, or Polish). For example: radisĀ­ (you can't see it, but there's an additional character between the ss; try copying it into Notepad to see it).

I've seen many posts about similar problems, but each solution either doesn't remove that special character, or removes it along with other special characters I need to keep.


Replace every match of the regex [^\w\s.,!@#$%^&*()=+~`-] with an empty string.
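As a rough sketch of this whitelist approach in C# (the class and method names are made up for illustration): in .NET, \w matches Unicode letters by default, so accented letters survive, while the soft hyphen is neither a word character nor whitespace and gets dropped.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class WhitelistCleaner
{
    // Matches anything that is NOT a word character, whitespace,
    // or one of the explicitly listed punctuation marks.
    private static readonly Regex Disallowed =
        new Regex(@"[^\w\s.,!@#$%^&*()=+~`-]");

    // Returns the input with every disallowed character removed.
    public static string Clean(string input)
    {
        return Disallowed.Replace(input, "");
    }
}
```

Note that this also drops any punctuation not on the list (for example / or :), so the list may need adjusting for the input at hand.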

The character you're (not) seeing there is U+00AD Soft Hyphen. You can reference it in a regular expression using \u00ad, e.g.:

str = Regex.Replace(str, @"\u00ad", "");

But for a single-character replacement you could also use string.Replace.
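To make the two options concrete, here is a minimal sketch that puts them side by side. The class and method names are assumptions for illustration only. Both methods return a new string, since .NET strings are immutable.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class SoftHyphenRemover
{
    // Plain ordinal string replacement; no regex engine involved.
    public static string WithStringReplace(string input)
    {
        return input.Replace("\u00AD", "");
    }

    // Same result via Regex.Replace; \u00ad in the pattern matches U+00AD.
    public static string WithRegex(string input)
    {
        return Regex.Replace(input, @"\u00ad", "");
    }
}
```

For a single known code point the string.Replace version is probably the simpler choice; the regex form becomes more useful once several characters or a whole category need to go.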

'\xAD' is a soft hyphen (the codepoint's name is "SOFT HYPHEN").

According to the Unicode codepoint database, its category is "Cf" (or "Format"), so it can be matched with the regex @"\p{Cf}".

Strangely, Microsoft Visual C# 2010 Express says that it doesn't match @"\p{Cf}", but instead matches @"\p{Pd}" ("Dash Punctuation"), the same category as the normal hyphen.
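Given that discrepancy, one cautious option is to put both the Format category and the soft hyphen itself into the same character class, so the result does not depend on how a particular runtime classifies U+00AD. A minimal sketch, with hypothetical names:

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper, not part of any library.
public static class FormatCharStripper
{
    // \p{Cf} covers format characters in general; \u00AD is listed
    // explicitly in case the runtime reports it under another category.
    private static readonly Regex Invisible =
        new Regex(@"[\p{Cf}\u00AD]");

    // Returns the input without format characters or soft hyphens.
    // Letters, digits, and the ordinary hyphen-minus are left alone.
    public static string Strip(string input)
    {
        return Invisible.Replace(input, "");
    }
}
```

Because the pattern only targets these invisible characters, accented letters and the normal hyphen are left untouched.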

This works for me:


