Regex to remove special/invisible characters

The problem is to remove some strange characters from a domain name while keeping special Unicode characters such as accented letters (German, Danish or Polish). For example: radis­son-blu.es. You can't see it, but there's an additional character between the two s's. (Try copying it into Notepad to see it.)

I've seen many posts about similar problems, but each solution either fails to remove that special character, or removes it along with other special characters I need to keep.

Answers


Replace matches of the regex [^\w\s.,!@#$%^&*()=+~`-] with an empty string.
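A minimal sketch of how that whitelist pattern could be applied in C#. The class and method names (WhitelistCleaner, Clean) and the sample domains are made up for illustration. It relies on \w being Unicode-aware in .NET by default, which would not hold if RegexOptions.ECMAScript were set.

```csharp
using System;
using System.Text.RegularExpressions;

public static class WhitelistCleaner
{
    // Everything NOT in this set is removed. In .NET, \w is Unicode-aware by
    // default, so accented letters from German, Danish or Polish are kept.
    private static readonly Regex NotAllowed =
        new Regex(@"[^\w\s.,!@#$%^&*()=+~`-]");

    // Hypothetical helper: strips every character outside the whitelist.
    public static string Clean(string input)
    {
        return NotAllowed.Replace(input, "");
    }

    public static void Main()
    {
        // Soft hyphen (U+00AD) hidden between the two s characters.
        Console.WriteLine(Clean("radis\u00ADson-blu.es") == "radisson-blu.es"); // True

        // Accented letters pass through untouched (Polish sample, hypothetical domain).
        string polish = "\u0142\u00F3d\u017A.example";
        Console.WriteLine(Clean(polish) == polish); // True
    }
}
```

Because this is a whitelist, it also drops any other character not listed (for example / or :), which is probably fine for bare domain names but worth keeping in mind for full URLs.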


The character you're (not) seeing there is U+00AD Soft Hyphen. You can reference it in a regular expression using \u00ad, e.g.:

Regex.Replace(str, @"\u00ad", "");

For a single-character replacement, though, you could also use string.Replace.
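A short sketch comparing the two options, using the domain from the question. The class and method names are hypothetical. Both calls should produce the same result, since string.Replace(string, string) performs an ordinal replacement.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SoftHyphenDemo
{
    // Regex version: the regex engine interprets the \u00ad escape itself.
    public static string RemoveWithRegex(string input)
    {
        return Regex.Replace(input, @"\u00ad", "");
    }

    // Plain string version: the C# compiler turns "\u00ad" into the character.
    public static string RemoveWithReplace(string input)
    {
        return input.Replace("\u00ad", "");
    }

    public static void Main()
    {
        string domain = "radis\u00adson-blu.es"; // soft hyphen between the two s characters

        Console.WriteLine(domain.Length);                    // 16
        Console.WriteLine(RemoveWithRegex(domain));          // radisson-blu.es
        Console.WriteLine(RemoveWithReplace(domain).Length); // 15
    }
}
```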


'\xAD' is a soft hyphen (the codepoint's name is "SOFT HYPHEN").

According to the Unicode codepoint database, its category is "Cf" (or "Format"), so it can be matched with the regex @"\p{Cf}".

Strangely, Microsoft Visual C# 2010 Express says that it doesn't match @"\p{Cf}", but instead matches @"\p{Pd}" ("Dash Punctuation"), the same category as the normal hyphen.
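Given that discrepancy, one cautious option is to match the whole Cf category and also name U+00AD explicitly, so the result does not depend on how a particular runtime classifies the soft hyphen. The following is a sketch under that assumption. The class and method names are hypothetical, and the printed category may be either Format or DashPunctuation depending on the runtime's Unicode data.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FormatCharStripper
{
    // \p{Cf} covers format characters; U+00AD is listed explicitly in case
    // the runtime reports it as Pd (dash punctuation) instead of Cf.
    private static readonly Regex Invisible = new Regex(@"[\p{Cf}\u00AD]");

    // Hypothetical helper: removes format characters and soft hyphens.
    public static string Strip(string input)
    {
        return Invisible.Replace(input, "");
    }

    public static void Main()
    {
        // Shows how this runtime classifies the soft hyphen.
        Console.WriteLine(char.GetUnicodeCategory('\u00AD'));

        Console.WriteLine(Strip("radis\u00ADson-blu.es")); // radisson-blu.es

        // The ordinary hyphen-minus (U+002D) and accented letters are kept.
        string german = "m\u00FCnchen-hotel.example";
        Console.WriteLine(Strip(german) == german); // True
    }
}
```

Note that Cf also contains the zero-width joiner and non-joiner (U+200D, U+200C), which carry meaning in some scripts, so stripping the whole category is a reasonable fit for German, Danish or Polish names but may be too aggressive elsewhere.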


This works for me:

[\x00-\x1f]|[\x81\x8d\x8f\x90\x9d\xa0\u2060\uFEFF]
