How do I strip non-alphanumeric characters (including spaces) from a string?

How do I strip non-alphanumeric characters from a string, and lose the spaces too, in C# with Replace?

I want to keep a-z, A-Z, 0-9 and nothing more (not even " " spaces).

"Hello there(hello#)".Replace(regex-i-want, "");

should give

"Hellotherehello"

I have tried "Hello there(hello#)".Replace(@"[^A-Za-z0-9 ]", ""); but the spaces remain.

Answers


In your regex, the space inside the character class excludes spaces from being matched, so they are kept. Also, String.Replace() does not take a regular expression; you need Regex.Replace() (which I had overlooked at first):

using System.Text.RegularExpressions;

string result = Regex.Replace("Hello there(hello#)", @"[^A-Za-z0-9]+", "");

should work. The + makes the regex a bit more efficient by matching more than one consecutive non-alphanumeric character at once instead of one by one.
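To see why the space in the question's character class matters, here is a small sketch that runs both patterns on the same input. The class and method names (`PatternDemo`, `StripWithSpaceKept`, `StripAll`) are hypothetical, chosen only for illustration.

```csharp
using System.Text.RegularExpressions;

public static class PatternDemo
{
    // The question's class: the space inside [^...] means spaces are never matched,
    // so they survive the replacement.
    public static string StripWithSpaceKept(string input) =>
        Regex.Replace(input, @"[^A-Za-z0-9 ]+", "");

    // Without the space, every run of characters outside A-Z, a-z and 0-9 is removed,
    // spaces included.
    public static string StripAll(string input) =>
        Regex.Replace(input, @"[^A-Za-z0-9]+", "");
}
```

On "Hello there(hello#)", StripWithSpaceKept returns "Hello therehello", while StripAll returns "Hellotherehello".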

If you want to keep non-ASCII letters/digits, too, use the following regex:

@"[^\p{L}\p{N}]+"

which leaves

BonjourmesélèvesGutenMorgenliebeSchüler

instead of

BonjourmeslvesGutenMorgenliebeSchler
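The answer doesn't show the string it used. Assuming an input like "Bonjour, mes élèves! Guten Morgen, liebe Schüler!", the sketch below (hypothetical class `UnicodeDemo`) reproduces both results; the sample string is an assumption, not taken from the answer.

```csharp
using System.Text.RegularExpressions;

public static class UnicodeDemo
{
    // Assumed sample input; the original answer does not show the string it used.
    public const string Sample = "Bonjour, mes \u00e9l\u00e8ves! Guten Morgen, liebe Sch\u00fcler!";

    // ASCII-only class: accented letters such as é, è and ü count as "non-alphanumeric".
    public static string KeepAsciiOnly(string input) =>
        Regex.Replace(input, @"[^A-Za-z0-9]+", "");

    // Unicode categories: \p{L} matches any letter, \p{N} any number, so é, è and ü are kept.
    public static string KeepUnicodeLettersAndNumbers(string input) =>
        Regex.Replace(input, @"[^\p{L}\p{N}]+", "");
}
```

KeepUnicodeLettersAndNumbers(UnicodeDemo.Sample) gives the first result above; KeepAsciiOnly(UnicodeDemo.Sample) gives the second.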

You can use LINQ to keep only the characters you want:

  using System.Linq;

  String source = "Hello there(hello#)";

  // "Hellotherehello"
  String result = new String(source
    .Where(ch => Char.IsLetterOrDigit(ch))
    .ToArray());

Or

  String result = String.Concat(source
    .Where(ch => Char.IsLetterOrDigit(ch)));  

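That way you don't need regular expressions at all. Note that Char.IsLetterOrDigit accepts any Unicode letter or decimal digit (for example á, ñ or Arabic-Indic digits), not just A-Z, a-z and 0-9.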
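To see the difference on non-ASCII input, here is a sketch contrasting Char.IsLetterOrDigit with an explicit ASCII range check. `LinqFilterDemo` and `IsAsciiAlphanumeric` are hypothetical names used for illustration.

```csharp
using System.Linq;

public static class LinqFilterDemo
{
    // Char.IsLetterOrDigit accepts any Unicode letter or decimal digit,
    // e.g. á, ñ and the Arabic-Indic digits.
    public static string KeepLettersOrDigits(string source) =>
        string.Concat(source.Where(ch => char.IsLetterOrDigit(ch)));

    // Hypothetical helper limiting the test to the ASCII ranges the question asks for.
    private static bool IsAsciiAlphanumeric(char c) =>
        (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');

    public static string KeepAsciiLettersOrDigits(string source) =>
        string.Concat(source.Where(ch => IsAsciiAlphanumeric(ch)));
}
```

On "text LaLa (lol) á ñ $ 123 ٠١٢٣٤", the first method keeps á, ñ and the Arabic-Indic digits, while the second returns only "textLaLalol123".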


Or you can do this too:

    using System.Text;

    public static class SomeClass
    {
        // Keeps only ASCII letters and digits (a-z, A-Z, 0-9).
        public static string RemoveNonAlphanumeric(string text)
        {
            StringBuilder sb = new StringBuilder(text.Length);

            for (int i = 0; i < text.Length; i++)
            {
                char c = text[i];
                if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
                    sb.Append(c);
            }

            return sb.ToString();
        }
    }

Usage:

string text = SomeClass.RemoveNonAlphanumeric("text LaLa (lol) á ñ $ 123 ٠١٢٣٤");

//text: textLaLalol123
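If you'd rather avoid StringBuilder, the same ASCII test can write into a char array sized to the input, since the result can never be longer than the source. This is a sketch of one possible variant (the class name `CharArrayStripper` is hypothetical), not a measured improvement.

```csharp
public static class CharArrayStripper
{
    // Same ASCII test as the StringBuilder loop, but writes into a char[] sized to the input.
    public static string RemoveNonAlphanumeric(string text)
    {
        char[] buffer = new char[text.Length];
        int count = 0;

        foreach (char c in text)
        {
            if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9'))
                buffer[count++] = c;
        }

        // Only the first 'count' slots hold kept characters.
        return new string(buffer, 0, count);
    }
}
```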

The mistake in the question is using String.Replace, which does not take a regular expression (thanks, CodeInChaos).
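A small sketch makes the difference concrete: String.Replace searches for the pattern text literally, finds nothing, and returns the input unchanged, while Regex.Replace treats the same text as a character class. The `ReplaceDemo` class and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    private const string Pattern = @"[^A-Za-z0-9]";

    // String.Replace searches for the literal text "[^A-Za-z0-9]"; it is not in the input,
    // so the input is returned unchanged.
    public static string WithStringReplace(string input) => input.Replace(Pattern, "");

    // Regex.Replace parses the same text as a negated character class.
    public static string WithRegexReplace(string input) => Regex.Replace(input, Pattern, "");
}
```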

The following code should do what was specified (note that \p{L} and \p{N} also keep non-ASCII letters and digits such as é or ü):

using System.Text.RegularExpressions;

Regex reg = new Regex(@"[^\p{L}\p{N}]+"); // Thanks to Tim Pietzcker for the regex
string regexed = reg.Replace("Hello there(hello#)", "");

This gives:

regexed = "Hellotherehello"
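If the cleanup runs often, one option is to build the Regex once and keep it in a static readonly field; Regex instances are immutable and documented as thread-safe, so a single instance can be shared. A minimal sketch, with the hypothetical class name `TextCleaner`:

```csharp
using System.Text.RegularExpressions;

public static class TextCleaner
{
    // Constructed once and reused; Regex instances are immutable and thread-safe,
    // so one static instance can serve every caller.
    private static readonly Regex NonLetterOrNumber = new Regex(@"[^\p{L}\p{N}]+");

    public static string Clean(string input) => NonLetterOrNumber.Replace(input, "");
}
```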

And as an extension method that replaces each non-alphanumeric character with a given character instead of deleting it:

using System.Text;

public static class StringExtensions
{
    public static string ReplaceNonAlphanumeric(this string text, char replaceChar)
    {
        StringBuilder result = new StringBuilder(text.Length);

        foreach(char c in text)
        {
            if(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9')
                result.Append(c);
            else
                result.Append(replaceChar);
        }

        return result.ToString();
    } 
}

And a test:

using NUnit.Framework;

[TestFixture]
public sealed class StringExtensionsTests
{
    [Test]
    public void Test()
    {
        Assert.AreEqual("text_LaLa__lol________123______", "text LaLa (lol) á ñ $ 123 ٠١٢٣٤".ReplaceNonAlphanumeric('_'));
    }
}
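The extension above emits one replacement character per removed character, which is why the test expects runs like "__". If you'd prefer one replacement per run, a Regex-based variant is a possible alternative. This is a sketch; `CollapsingExtensions` and `ReplaceNonAlphanumericRuns` are hypothetical names, not part of the answer.

```csharp
using System.Text.RegularExpressions;

public static class CollapsingExtensions
{
    // Variant: one replacement character per *run* of non-alphanumerics, not per character.
    public static string ReplaceNonAlphanumericRuns(this string text, char replaceChar)
    {
        string replacement = replaceChar.ToString();
        // A MatchEvaluator inserts the text verbatim, so characters such as '$'
        // are not interpreted as substitution syntax.
        return Regex.Replace(text, "[^A-Za-z0-9]+", m => replacement);
    }
}
```

With '_' as the replacement, "Hello there(hello#)" becomes "Hello_there_hello_".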

using System.Text.RegularExpressions;

var text = "Hello there(hello#)";
var rgx = new Regex("[^a-zA-Z0-9]");
text = rgx.Replace(text, string.Empty);

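Use the following regex with Regex.Replace to strip those characters from the string. Note that because \s sits inside the negated class, white space is not matched and stays in the result; drop \s if spaces should go too, as the question asks: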

([^A-Za-z0-9\s])
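A quick check of that pattern shows that every kind of white space matched by \s, including tabs and newlines, survives. This is a sketch; the class `WhitespaceKeepingDemo` and its method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

public static class WhitespaceKeepingDemo
{
    // \s inside [^...] excludes every white-space character (space, tab, newline, ...)
    // from the match, so those characters stay in the result.
    public static string StripKeepingWhitespace(string input) =>
        Regex.Replace(input, @"([^A-Za-z0-9\s])", "");

    // Dropping \s removes white space as well, which is what the question asks for.
    public static string StripIncludingWhitespace(string input) =>
        Regex.Replace(input, @"[^A-Za-z0-9]", "");
}
```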

In .NET 4.0 the String class gained an IsNullOrWhiteSpace method (see http://msdn.microsoft.com/en-us/library/system.string.isnullorwhitespace.aspx), but it only reports whether a string is null, empty, or made up entirely of white-space characters; it does not remove anything. To drop white space character by character, test each char with Char.IsWhiteSpace. However, as @CodeInChaos pointed out, there are plenty of characters that could be considered letters and numbers, so use a regular expression if you only want to keep A-Z, a-z and 0-9.
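The difference can be seen in a short sketch (the names `WhitespaceDemo`, `IsBlank` and `RemoveWhiteSpace` are hypothetical): String.IsNullOrWhiteSpace returns a bool for the whole string, while removal needs a per-character filter.

```csharp
using System.Linq;

public static class WhitespaceDemo
{
    // String.IsNullOrWhiteSpace answers a yes/no question about the whole string;
    // it never changes the string.
    public static bool IsBlank(string s) => string.IsNullOrWhiteSpace(s);

    // Removing white space needs a per-character test such as Char.IsWhiteSpace.
    public static string RemoveWhiteSpace(string s) =>
        string.Concat(s.Where(c => !char.IsWhiteSpace(c)));
}
```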

