Regex replacements inside a StringBuilder

I'm writing the contents of a text file to a StringBuilder and I then want to perform a number of find/replace actions on the text contained in the StringBuilder using regular expressions.

I've run into a problem: StringBuilder.Replace only accepts literal strings, not regular expressions.

I could use Regex.Replace on a plain string, but I'm under the impression that this is inefficient: because .NET strings are immutable, every replacement produces a new copy of the string in memory.

Once I've updated the text I plan to write it back to the original file.

What's the best and most efficient way to solve my problem?

EDIT

In addition to the answer(s) below, I've found the following questions that also shed some light on my problem -

Answers


The best and most efficient solution for your time is to try the simplest approach first: forget the StringBuilder and just use Regex.Replace. Then find out how slow it is; it may very well be good enough. Don't forget to try the regex both with and without RegexOptions.Compiled.
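A minimal sketch of that "just use Regex.Replace" approach, including the write-back to the original file from the question, plus a rough way to compare compiled and non-compiled regexes. The class and method names (SimpleRegexReplace, ReplaceInFile, Time) are hypothetical, and the Stopwatch loop is only a quick sanity check, not a proper benchmark.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Text.RegularExpressions;

public static class SimpleRegexReplace
{
    // Hypothetical helper: reads the whole file, applies one regex replacement,
    // and writes the result back. Each Regex.Replace call returns a new string.
    public static void ReplaceInFile(string path, string pattern, string replacement, RegexOptions options)
    {
        string text = File.ReadAllText(path);
        text = Regex.Replace(text, pattern, replacement, options);
        File.WriteAllText(path, text);
    }

    // Hypothetical helper: rough timing of repeated replacements with the given options.
    // The Regex is constructed before the stopwatch starts, so construction cost
    // (where RegexOptions.Compiled does its extra work) is not included.
    public static TimeSpan Time(string input, string pattern, string replacement, RegexOptions options, int iterations)
    {
        var regex = new Regex(pattern, options);
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.Replace(input, replacement);
        }
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }
}
```

Calling Time once with RegexOptions.None and once with RegexOptions.Compiled on your real file contents gives a first indication of whether either is fast enough before reaching for anything more elaborate.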

If that isn't fast enough, consider using a StringBuilder for any replacements you can express simply, and then use Regex.Replace for the rest. You might also want to consider trying to combine replacements, reducing the number of regexes (and thus intermediate strings) used.
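One way to combine the two ideas, sketched under some assumptions: literal find/replace pairs are applied in place with StringBuilder.Replace, and all regex replacements are folded into a single alternation so only one Regex.Replace pass (and one extra string) is needed. The names HybridReplace and Apply are hypothetical, and the sketch assumes the individual patterns contain no capturing groups of their own (use `(?:...)` inside them).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

public static class HybridReplace
{
    // literalReplacements: plain find/replace pairs, applied in place via StringBuilder.Replace.
    // regexReplacements: pattern -> replacement pairs, combined into ONE regex.
    public static string Apply(string text,
                               IEnumerable<KeyValuePair<string, string>> literalReplacements,
                               IList<KeyValuePair<string, string>> regexReplacements)
    {
        var sb = new StringBuilder(text);
        foreach (var pair in literalReplacements)
        {
            sb.Replace(pair.Key, pair.Value);
        }
        if (regexReplacements.Count == 0)
        {
            return sb.ToString();
        }

        // Wrap each pattern in its own capture group so the evaluator can tell which one matched.
        // Assumption: the patterns have no capturing groups, otherwise the numbering is off.
        string combined = string.Join("|", regexReplacements.Select(p => "(" + p.Key + ")"));

        return Regex.Replace(sb.ToString(), combined, m =>
        {
            for (int i = 1; i < m.Groups.Count; i++)
            {
                if (m.Groups[i].Success)
                {
                    // Returned text is used literally; "$1"-style substitutions are not expanded here.
                    return regexReplacements[i - 1].Value;
                }
            }
            return m.Value;
        });
    }
}
```

Note that a combined alternation is not always equivalent to running the regexes one after another: a later pattern no longer sees the output of an earlier one, and at any position the first alternative that matches wins. Check that this is acceptable for your particular set of replacements.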


You have 3 options:

  1. Do this in an inefficient way with strings as others have recommended here.

  2. Use the .Matches() call on your Regex object, and emulate the way .Replace() works (see #3).

  3. Adapt the Mono implementation of Regex to build a Regex that accepts a StringBuilder (and please share it here!). Almost all of the work is already done for you in Mono, but it will take time to extract the parts that make it work into their own library. Oddly enough, Mono's Regex leverages Novell's 2002 JVM implementation of Regex.

In Mono:

System.Text.RegularExpressions.Regex uses an RxCompiler to instantiate an IMachineFactory in the form of an RxInterpreterFactory, which, unsurprisingly, creates IMachines in the form of RxInterpreters. Getting those to emit is most of what you need to do, although if you're just looking to learn how it's all structured for efficiency, note that much of what you're looking for is in their base class, BaseMachine.

In particular, BaseMachine is where the StringBuilder-based code lives. Its LTRReplace method first instantiates a StringBuilder with the initial string, and everything from there on is purely StringBuilder-based. It's actually very annoying that Regex doesn't expose StringBuilder overloads, assuming Microsoft's internal .NET implementation is similar.

Circling back to suggestion 2, you can mimic LTRReplace's behavior by calling .Matches(), tracking where you are in the original string, and looping:

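using System.Text;
using System.Text.RegularExpressions;

var matches = regex.Matches(original);
var sb = new StringBuilder(original.Length);
int pos = 0; // position in original string
foreach (Match match in matches) // MatchCollection is non-generic on older frameworks, so name the type
{
    sb.Append(original, pos, match.Index - pos); // Append the portion of the original we skipped (no Substring copy)

    // Make any operations you like on the match result, like your own custom Replace, or even run another Regex,
    // then append the replacement text
    string replacement = match.Value; // placeholder: compute your replacement here
    sb.Append(replacement);

    pos = match.Index + match.Length; // continue after the end of this match
}
sb.Append(original, pos, original.Length - pos); // Append the rest of the original after the last match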

But this only saves you some intermediate strings; the modified-Mono approach is the only one that really does it right.


I'm not sure whether this helps your scenario, but I ran into memory consumption ceilings with Regex and needed a simple wildcard replacement extension method on StringBuilder to get past them. If you need complex regex matching and/or backreferences, this won't do, but if simple * or ? wildcard replacements (with literal replacement text) would get the job done for you, the workaround at the end of my question here should at least give you a boost:
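To illustrate the general idea of a wildcard replace that works directly on a StringBuilder (never calling ToString on the whole buffer), here is a minimal sketch. It is not the workaround from the linked question: the names WildcardReplaceSketch and ReplaceWildcard are hypothetical, and it supports only `?` (any single character), not `*`.

```csharp
using System.Text;

public static class WildcardReplaceSketch
{
    // Hypothetical helper: replaces every non-overlapping occurrence of `pattern` in `sb`
    // with the literal `replacement`. '?' in the pattern matches any single character;
    // '*' is NOT supported. Works in place on the StringBuilder.
    public static StringBuilder ReplaceWildcard(this StringBuilder sb, string pattern, string replacement)
    {
        if (pattern.Length == 0)
        {
            return sb;
        }
        int i = 0;
        while (i <= sb.Length - pattern.Length)
        {
            if (MatchesAt(sb, i, pattern))
            {
                sb.Remove(i, pattern.Length).Insert(i, replacement);
                i += replacement.Length; // skip past the inserted text so it is not rescanned
            }
            else
            {
                i++;
            }
        }
        return sb;
    }

    // True if `pattern` matches the characters of `sb` starting at `start`.
    private static bool MatchesAt(StringBuilder sb, int start, string pattern)
    {
        for (int j = 0; j < pattern.Length; j++)
        {
            char p = pattern[j];
            if (p != '?' && sb[start + j] != p)
            {
                return false;
            }
        }
        return true;
    }
}
```

Each Remove/Insert shifts the rest of the buffer, and indexing a large, chunked StringBuilder is not free, so this trades speed for avoiding extra full-size string copies.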

Has anyone implemented a Regex and/or Xml parser around StringBuilders or Streams?


Here's an extension method you could use to accomplish what you want. It takes an IDictionary where each key is the text you're looking for and the value is what you want to replace it with. Because the keys are passed through Regex.Escape, they are matched literally rather than as regex patterns. You still create copies of the incoming string, but only once, instead of once per call to Regex.Replace.

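// Needs: System, System.Linq, System.Collections.Generic, System.Text, System.Text.RegularExpressions; declare inside a static class
public static StringBuilder BulkReplace(this StringBuilder source, IDictionary<string, string> replacementMap)
{
    if (source.Length == 0 || replacementMap.Count == 0)
    {
        return source;
    }
    // Longest keys first, so "ab" is preferred over "a"; one capture group per (escaped, literal) key
    var entries = replacementMap.OrderByDescending(kv => kv.Key.Length).ToList();
    string pattern = String.Join("|", entries.Select(kv => "(" + Regex.Escape(kv.Key) + ")"));
    // With IgnoreCase, m.Value can differ in case from its key, so find the key by group number instead of m.Value
    string replaced = Regex.Replace(source.ToString(), pattern,
        m => entries[Enumerable.Range(1, entries.Count).First(i => m.Groups[i].Success) - 1].Value,
        RegexOptions.IgnoreCase);
    return source.Clear().Append(replaced);
}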
