Matching URL with wildcards
I'm trying to match URLs with wildcards in them to actual URLs. For example:
Needs to match
What would be the best way of going about this?
I've tried using a regular expression and that works fine when I manually program it but I'm not sure whether it's possible to dynamically generate regular expressions or if that would be the best practice in this situation.
Thanks very much.
Replace all occurrences of * in the pattern with [^ ]* - it matches a sequence of zero or more non-space characters.
Thus http://*google.com/* will become http://[^ ]*google.com/[^ ]*
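Here is code that builds the regular expression from the pattern:
var regex = new RegExp(urlPattern.replace(/\*/g, "[^ ]*"));
Note that other regex metacharacters in the pattern, such as ., are not escaped by this one-liner, so a . in the pattern will match any character.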
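To make the replacement above a bit more robust, here is a minimal sketch that also escapes the other regex metacharacters and anchors the result. The function name wildcardToRegExp is made up for illustration, and anchoring with ^ and $ is my assumption about what "match" should mean here.

```javascript
// Hypothetical helper: convert a wildcard URL pattern into an anchored RegExp.
function wildcardToRegExp(urlPattern) {
  // Escape every regex metacharacter except "*", so "." matches a literal dot.
  const escaped = urlPattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  // Each "*" becomes "zero or more non-space characters".
  return new RegExp("^" + escaped.replace(/\*/g, "[^ ]*") + "$");
}

const re = wildcardToRegExp("http://*google.com/*");
console.log(re.test("http://www.google.com/maps")); // true
console.log(re.test("http://www.googleXcom/maps")); // false: the dot is literal
```

Without the escaping step, the second URL would match, because an unescaped . matches any character.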
Generating a regex is probably the right way, but it gets more complicated than simply replacing the asterisks.
For example, your pattern http://*google.com/* should not match http://www.malicioushacker.org/1337/google.com/maps.
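To illustrate the problem, here is a rough sketch that compares the naive replacement with a stricter variant in which a * in the host part cannot cross a /, ?, or #. The names escapeExceptStar and strictWildcardToRegExp are hypothetical, and splitting the pattern at the first / after :// is a simplifying assumption, not a full URL parser.

```javascript
// Hypothetical helper: escape regex metacharacters but leave "*" alone.
function escapeExceptStar(text) {
  return text.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: "*" in the host part stops at "/", "?" and "#";
// "*" in the path part matches any run of non-space characters.
function strictWildcardToRegExp(urlPattern) {
  const schemeEnd = urlPattern.indexOf("://");
  const hostStart = schemeEnd === -1 ? 0 : schemeEnd + 3;
  const slash = urlPattern.indexOf("/", hostStart);
  const pathStart = slash === -1 ? urlPattern.length : slash;
  const hostPart = escapeExceptStar(urlPattern.slice(0, pathStart))
    .replace(/\*/g, "[^/?# ]*");
  const pathPart = escapeExceptStar(urlPattern.slice(pathStart))
    .replace(/\*/g, "[^ ]*");
  return new RegExp("^" + hostPart + pathPart + "$");
}

const evil = "http://www.malicioushacker.org/1337/google.com/maps";
const naive = new RegExp("http://*google.com/*".replace(/\*/g, "[^ ]*"));
const strict = strictWildcardToRegExp("http://*google.com/*");

console.log(naive.test(evil));   // true: the first "*" swallowed the real host
console.log(strict.test(evil));  // false
console.log(strict.test("http://www.google.com/maps")); // true
```

Even the stricter version still accepts a host such as notgoogle.com, since *google.com only requires the host to end with that text. If that matters, the pattern itself would need to be *.google.com or similar.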
If you want to see a well-tested library for extracting parts of a URI, I would check out the goog.uri.utils methods in the Google Closure Library.
Here's the regex that does the heavy lifting:
    goog.uri.utils.splitRe_ = new RegExp(
        '^' +
        '(?:' +
          '([^:/?#.]+)' +                     // scheme - ignore special characters
                                              // used by other URL parts such as :,
                                              // ?, /, #, and .
        ':)?' +
        '(?://' +
          '(?:([^/?#]*)@)?' +                 // userInfo
          '([\\w\\d\\-\\u0100-\\uffff.%]*)' + // domain - restrict to letters,
                                              // digits, dashes, dots, percent
                                              // escapes, and unicode characters.
          '(?::([0-9]+))?' +                  // port
        ')?' +
        '([^?#]+)?' +                         // path
        '(?:\\?([^#]*))?' +                   // query
        '(?:#(.*))?' +                        // fragment
        '$');
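As a rough sketch of how a splitting regex like this could be used for wildcard matching without pulling in the library: split the URL first, then test the wildcard against the domain only. The names splitRe, splitUrl and hostMatches are made up for this example and are not Closure APIs; the regex is a standalone copy of the sub-patterns quoted above.

```javascript
// Standalone copy of the splitting regex quoted above (same sub-patterns).
const splitRe = new RegExp(
  "^" +
  "(?:([^:/?#.]+):)?" +                  // 1: scheme
  "(?://" +
    "(?:([^/?#]*)@)?" +                  // 2: userInfo
    "([\\w\\d\\-\\u0100-\\uffff.%]*)" +  // 3: domain
    "(?::([0-9]+))?" +                   // 4: port
  ")?" +
  "([^?#]+)?" +                          // 5: path
  "(?:\\?([^#]*))?" +                    // 6: query
  "(?:#(.*))?" +                         // 7: fragment
  "$");

// Hypothetical helper: return the named parts of a URL, or null if no match.
function splitUrl(url) {
  const m = splitRe.exec(url);
  if (!m) return null;
  return { scheme: m[1], userInfo: m[2], domain: m[3], port: m[4],
           path: m[5], query: m[6], fragment: m[7] };
}

// Hypothetical helper: test a wildcard host pattern against the domain only.
function hostMatches(hostPattern, url) {
  const parts = splitUrl(url);
  if (!parts || parts.domain === undefined) return false;
  const escaped = hostPattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp("^" + escaped.replace(/\*/g, ".*") + "$").test(parts.domain);
}

const evil = "http://www.malicioushacker.org/1337/google.com/maps";
console.log(splitUrl(evil).domain);  // "www.malicioushacker.org"
console.log(splitUrl(evil).path);    // "/1337/google.com/maps"
console.log(hostMatches("*google.com", evil));                         // false
console.log(hostMatches("*google.com", "http://www.google.com/maps")); // true
```

Because the wildcard is only ever compared with the extracted domain, text in the path can no longer satisfy the host part of the pattern.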