Best way to turn a post title into a URL in C#

I was wondering what the best way is to turn a string (e.g. a post title) into a descriptive URL. The simplest way that comes to mind is to use a regex, such as:

public static Regex regex = new Regex(
    "\\W+",
    RegexOptions.IgnoreCase
    | RegexOptions.CultureInvariant
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );

string result = regex.Replace(InputText,"_");

which turns

"my first (yet not so bad) cupcake!! :) .//\."

into

my_first_yet_not_so_bad_cupcake_

Then I can strip the trailing "_" and check the result against my database to see whether it is already present. If it is, I would append a number to make it unique and check again.
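The strip-and-recheck loop described above could be sketched roughly as follows. The `exists` callback is a hypothetical stand-in for the database lookup, and the `_2`, `_3`, ... suffix scheme is an assumption rather than anything prescribed here.

```csharp
using System;

public static class SlugUniquifier
{
    // 'exists' is a hypothetical callback that returns true when the
    // candidate slug is already stored (for example, a database query).
    public static string MakeUnique(string slug, Func<string, bool> exists)
    {
        // Strip the trailing "_" left over from the regex replacement.
        string baseSlug = slug.TrimEnd('_');
        string candidate = baseSlug;
        int suffix = 2;

        // Append an increasing number until the slug is no longer taken.
        while (exists(candidate))
        {
            candidate = baseSlug + "_" + suffix;
            suffix++;
        }
        return candidate;
    }
}
```

For example, if `my_cupcake` and `my_cupcake_2` are already taken, `MakeUnique("my_cupcake_", taken.Contains)` would yield `my_cupcake_3`. Two requests could still pick the same slug at the same moment, so a unique constraint on the database column is probably worth having as well.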

I could use it in, say

http://myblogsite.xom/posts/my_first_yet_not_so_bad_cupcake

But is this approach safe? Should I check other things (like the length of the string)? Is there another, better method you prefer? Thanks.

Answers


Here's what I do. regStripNonAlpha removes every character that is not a word character, whitespace, or "-". Trim() removes leading and trailing spaces (so we don't end up with dashes on either end). regSpaceToDash converts each space (or run of spaces) into a single dash. This has worked well for me.

static Regex regStripNonAlpha = new Regex(@"[^\w\s\-]+", RegexOptions.Compiled);
static Regex regSpaceToDash = new Regex(@"[\s]+", RegexOptions.Compiled);

public static string MakeUrlCompatible(string title)
{
    return regSpaceToDash.Replace(
      regStripNonAlpha.Replace(title, string.Empty).Trim(), "-");
}
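As a rough illustration of how the two-regex approach above behaves on the title from the question, here is a self-contained variant. The lower-casing step and the `maxLength` cap are assumptions added to address the question about string length; they are not part of the original answer, and the names `SlugExample` and `ToSlug` are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SlugExample
{
    // Same patterns as above: drop anything that is not a word character,
    // whitespace, or "-", then collapse whitespace runs into one dash.
    static readonly Regex StripInvalid = new Regex(@"[^\w\s\-]+", RegexOptions.Compiled);
    static readonly Regex SpaceToDash = new Regex(@"\s+", RegexOptions.Compiled);

    // maxLength is an assumed limit, not something the answer specifies.
    public static string ToSlug(string title, int maxLength = 80)
    {
        string slug = SpaceToDash.Replace(
            StripInvalid.Replace(title, string.Empty).Trim(), "-");

        slug = slug.ToLowerInvariant();

        // Cut overly long slugs and avoid ending on a dash after the cut.
        if (slug.Length > maxLength)
        {
            slug = slug.Substring(0, maxLength).TrimEnd('-');
        }
        return slug;
    }
}
```

With the default limit, `ToSlug(@"my first (yet not so bad) cupcake!! :) .//\.")` gives `my-first-yet-not-so-bad-cupcake`. Note that `\w` also matches underscores and non-ASCII letters, so those pass through unchanged.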

string result = regex.Replace(InputText,"-");

Use a hyphen (-) instead of an underscore. According to the post below, Google treats hyphens as word separators, which gives an advantage in search.

See the post below for more details:

http://www.mattcutts.com/blog/dashes-vs-underscores/


Here's a method I wrote not too long ago that takes a string and formats it to a permalink.

        private string FormatPermalink(string title)
        {
            StringBuilder result = new StringBuilder();
            title = title.Trim();
            bool lastOneChanged = false;
            for (int i = 0; i < title.Length; i++)
            {
                char c = title[i];
                if (!char.IsLetterOrDigit(c))
                {
                    c = '_';
                    if (lastOneChanged)
                    {
                        continue;
                    }
                    lastOneChanged = true;
                }

                else
                {
                    lastOneChanged = false;
                }

                result.Append(c);
            }

            if (result.Length > 0 && result[result.Length - 1] == '_') // if the last character is an underscore, remove it (the length check guards against an empty title)
            {
                result = result.Remove(result.Length - 1, 1);
            }
            return result.ToString();
        }

This handles special characters as well: any run of characters that are not letters or digits is collapsed into a single underscore, and a trailing underscore is removed.


You could look into a URL-rewriting HttpModule. There are many examples on the net.

Once it is implemented, you simply specify in your web.config the regular expression that maps the SEO-friendly name to the "real" page:

<!-- Rule 1: example... "/admin/Page" is rewritten to "/UI/Forms/Admin/frmPage.aspx" -->

  <add key="^/admin/(.*)" value="/UI/Forms/Admin/frm$1.aspx" />
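To see what the key/value pair above does to a path, the substitution can be tried directly with `Regex.Replace`. This is only a sketch of the mapping step; the `RewriteDemo` class is hypothetical, and a real rewriting module may apply extra options (such as case-insensitive matching) that are not shown here.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RewriteDemo
{
    // Applies one pattern/replacement pair, mirroring the key and value
    // attributes of the config entry. Paths that do not match the
    // pattern are returned unchanged.
    public static string Rewrite(string path, string pattern, string replacement)
    {
        return Regex.Replace(path, pattern, replacement);
    }
}
```

Calling `RewriteDemo.Rewrite("/admin/Page", "^/admin/(.*)", "/UI/Forms/Admin/frm$1.aspx")` returns `/UI/Forms/Admin/frmPage.aspx`, because `$1` is replaced by whatever the `(.*)` group captured.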

If you want to avoid doing this yourself, an HttpModule like http://urlrewriter.net/ could help. It's pretty good but requires a bit of setting up.


Personally, I'd couple your special-character removal with a date, so your example would look like:

http://myblogsite.xom/posts/2009/04/03/my_first_yet_not_so_bad_cupcake

That way, if you have content with the same title, it gets differentiated by date too. I see this often on some blogs I visit, where a title like "Five Random Things Make A Post" is reused a lot (but not within the same day).
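A minimal sketch of building such a date-prefixed path, assuming the slug has already been generated. The `PostUrl` class and the `/posts/` prefix are illustrative only.

```csharp
using System;
using System.Globalization;

public static class PostUrl
{
    // Builds "/posts/yyyy/MM/dd/slug". The quoted '/' characters and the
    // invariant culture keep the separators from changing with the
    // server's regional settings.
    public static string Build(DateTime published, string slug)
    {
        string datePart = published.ToString("yyyy'/'MM'/'dd", CultureInfo.InvariantCulture);
        return "/posts/" + datePart + "/" + slug;
    }
}
```

`PostUrl.Build(new DateTime(2009, 4, 3), "my_first_yet_not_so_bad_cupcake")` returns `/posts/2009/04/03/my_first_yet_not_so_bad_cupcake`. Two posts with the same title on the same day would still collide, so a uniqueness check remains useful.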

