Remove substring from a list of strings

I have a list of strings that contain banned words. What's an efficient way of checking if a string contains any of the banned words and removing it from the string? At the moment, I have this:

cleaned = String.Join(" ", str.Split().Where(b => !bannedWords.Contains(b,
                            StringComparer.OrdinalIgnoreCase)).ToArray());

This works fine for single banned words, but not for phrases (i.e. banned entries of more than one word). Any multi-word phrase should also be removed. An alternative I thought of trying is the List's Contains method, but that only returns a bool and not the index of the matching word. If I could get the index of the matching word, I could just use String.Replace(bannedWords[i], "");

Answers


A simple String.Replace will not work as it will remove word parts. If "sex" is a banned word and you have the word "sextet", which is not banned, you should keep it as is.

Using Regex you can find whole words and phrases in a text with

string text = "A sextet is a musical composition for six instruments or voices.";
string word = "sex";
var matches = Regex.Matches(text, @"(?<=\b)" + word + @"(?=\b)");

The matches collection will be empty in this case.

You can use the Regex.Replace method

foreach (string word in bannedWords) {
    text = Regex.Replace(text, @"(?<=\b)" + word + @"(?=\b)", "");
}

Note: I used the following Regex pattern

(?<=prefix)find(?=suffix)

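where 'prefix' and 'suffix' are both \b, which matches a word boundary (the position where a word begins or ends). Since \b is itself zero-width, the shorter pattern \bfind\b behaves the same way.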

If your banned words or phrases can contain special characters, it would be safer to escape them with Regex.Escape(word).
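As a rough sketch of the loop above with escaping added: the class name BannedWordLoop and the sample data are made up for illustration, and RegexOptions.IgnoreCase is an assumption taken from the question's use of OrdinalIgnoreCase.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BannedWordLoop
{
    // Removes each banned word or phrase when it appears as a whole word.
    public static string RemoveBanned(string text, string[] bannedWords)
    {
        foreach (string word in bannedWords)
        {
            // Regex.Escape makes characters such as '.' or '(' match literally.
            string pattern = @"\b" + Regex.Escape(word) + @"\b";
            // IgnoreCase is an assumption based on the question's comparer.
            text = Regex.Replace(text, pattern, "", RegexOptions.IgnoreCase);
        }
        return text;
    }
}
```

Without the escape, the "." in a banned entry like "foo.bar" would match any character, so "fooXbar" would be removed as well. Note that \b only matches between a word character and a non-word character, so an entry that starts or ends with punctuation (for example "c++") would need a different boundary test.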


Using @zmbq's idea you could create a Regex pattern once with

string pattern =
    @"(?<=\b)(" +
    String.Join(
        "|",
        bannedWords
            .Select(w => Regex.Escape(w))
            .ToArray()) +
     @")(?=\b)";
var regex = new Regex(pattern); // Parsed once here; pass RegexOptions.Compiled to compile it to IL

and then apply it repeatedly to different texts with

string result = regex.Replace(text, "");
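A possible way to package this for reuse is sketched below. The names BannedWordFilter, Build and Clean are hypothetical. Sorting longer entries first and collapsing leftover spaces are my own additions, not part of the answer above: alternation tries the alternatives from left to right, so "bad" listed before "bad apple" would win and leave " apple" behind.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BannedWordFilter
{
    // Builds one alternation pattern; longer entries go first so that
    // "bad apple" is tried before "bad".
    public static Regex Build(string[] bannedWords)
    {
        string alternatives = String.Join(
            "|",
            bannedWords
                .OrderByDescending(w => w.Length)
                .Select(w => Regex.Escape(w)));
        // IgnoreCase is an assumption based on the question's comparer.
        return new Regex(@"\b(" + alternatives + @")\b", RegexOptions.IgnoreCase);
    }

    // Removes all matches, then collapses the runs of spaces left behind.
    public static string Clean(Regex regex, string text)
    {
        string stripped = regex.Replace(text, "");
        return Regex.Replace(stripped, @" {2,}", " ").Trim();
    }
}
```

Build the Regex once (for example at startup) and call Clean for every incoming text.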

Your code doesn't work for phrases because it relies on conflicting definitions: it splits the input into single words, but a banned phrase spans several of them.

When you want to look for phrases of more than one word, you cannot split on whitespace anymore. You'll have to fall back on String.IndexOf().
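A minimal sketch of the IndexOf route, assuming a match only counts when it is not surrounded by letters or digits. The helper name RemovePhrase and that boundary rule are my assumptions, not something stated above.

```csharp
using System;

public static class IndexOfFilter
{
    // Removes every whole-word occurrence of phrase from text, ignoring case.
    public static string RemovePhrase(string text, string phrase)
    {
        if (string.IsNullOrEmpty(phrase)) return text;
        int start = 0;
        while (true)
        {
            int i = text.IndexOf(phrase, start, StringComparison.OrdinalIgnoreCase);
            if (i < 0) break;
            int end = i + phrase.Length;
            // Accept the hit only if no letter or digit touches it on either side.
            bool leftOk = i == 0 || !char.IsLetterOrDigit(text[i - 1]);
            bool rightOk = end == text.Length || !char.IsLetterOrDigit(text[end]);
            if (leftOk && rightOk)
            {
                text = text.Remove(i, phrase.Length);
                start = i;     // keep scanning from the removal point
            }
            else
            {
                start = i + 1; // hit is part of a longer word; skip past it
            }
        }
        return text;
    }
}
```

Call it once per entry in bannedWords. This avoids Regex entirely, at the cost of doing the boundary check by hand.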

