Regular expression to get words that begin with a capital letter

I have a text file that is about two pages long, and I need to write a regular expression that will extract words that begin with a capital letter. Some examples of the names I would like to match (but not limited to these) are:

British Indian Ocean Territory
People's Republic of China
Confederation of Independent States
French Southern and Antarctic Lands
Gilbert and Ellia Islands
Iraq-Saudia Arabia Neutral Zone
Juan de Nova Island 
St. Vincent and the Grenadines 
Trust Territory of the Pacific 
Washington, D.C.

The regex that I came up with is:

"((?:[A-Z][a-z]+\\s){2,4}?) || ((?:[A-Z][a-z]+\\s){1,2}of(?:\\s[A-Z][a-z]+){1,2}) || ((?:[A-Z][a-z]+\\s){1,2}and(?:\\s[A-Z][a-z]+){1,2})"

Answers


Use this regex:

\b[A-Z].*?\b

http://rubular.com/r/HG7YJLgkc3

REGEXPLANATION:

  • \b is a word boundary. It matches at the beginning and at the end of a word,
  • [A-Z] matches a single uppercase letter,
  • . matches any character (except a newline, by default),
  • * matches the previous item 0 or more times,
  • ? makes the previous * non-greedy, so it matches as few characters as it can instead of the whole string
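
As a quick check of the explanation above, here is a minimal sketch in Python. Python is an assumption (the question does not name a language), and the sample text is made up from names in the question. It compares the non-greedy pattern with a greedy variant to show what the ? changes.

```python
import re

# Illustrative sample text, reusing names from the question.
text = "British Indian Ocean Territory and Juan de Nova Island"

# Non-greedy: each match stops at the first word boundary after the capital.
lazy_words = re.findall(r"\b[A-Z].*?\b", text)

# Greedy variant for comparison: .* runs on to the last word boundary it can reach.
greedy_words = re.findall(r"\b[A-Z].*\b", text)

print(lazy_words)    # ['British', 'Indian', 'Ocean', 'Territory', 'Juan', 'Nova', 'Island']
print(greedy_words)  # ['British Indian Ocean Territory and Juan de Nova Island']
```

Note that the lowercase connectors ("and", "de") are skipped, so this finds single capitalized words rather than whole names.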

This regex should work:

"\\b(([A-Z]\\S*)|and|or|the)\\b"

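A small sketch of how the pattern above behaves, assuming Python (the doubled backslashes in the answer are string-literal escaping; a Python raw string needs only single ones). Because the pattern contains groups, the sketch uses re.finditer and takes the whole match rather than re.findall.

```python
import re

# Same pattern as above, written as a raw string.
pattern = re.compile(r"\b(([A-Z]\S*)|and|or|the)\b")

# Illustrative input taken from the question's examples.
text = "St. Vincent and the Grenadines"

# group(0) is the full match; the inner groups are ignored here.
tokens = [m.group(0) for m in pattern.finditer(text)]

print(tokens)  # ['St', 'Vincent', 'and', 'the', 'Grenadines']
```

The trailing \b drops the period from "St.", and connectors that are not in the alternation (such as "of" or "de") are not matched.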
Will this achieve what you are looking for? [A-Z]\S*\s

The regex that you use may change slightly with the implementation and you may have to use some flags to allow multiline searches and multiple matches.
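
To illustrate the point about multiple matches, here is a hedged Python sketch (Python is an assumption, and the two-line sample text is made up from the question's examples). In Python, re.search returns only the first match, while re.findall returns all of them; other implementations use a flag such as a global modifier for the same purpose.

```python
import re

pattern = r"[A-Z]\S*\s"

# Two lines of illustrative input; \s also matches the newline between them.
text = "Juan de Nova Island\nWashington, D.C."

# Only the first match.
first = re.search(pattern, text).group(0)

# Every match; each one includes its trailing whitespace character.
all_matches = re.findall(pattern, text)

# Strip the trailing whitespace that the pattern captures.
cleaned = [m.strip() for m in all_matches]

print(repr(first))  # 'Juan '
print(all_matches)  # ['Juan ', 'Nova ', 'Island\n', 'Washington, ']
print(cleaned)      # ['Juan', 'Nova', 'Island', 'Washington,']
```

Because the pattern requires a whitespace character after the word, "D.C." at the very end of the text is not matched.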


This gets you pretty close:

(\b[A-Z].*?\b('s|-|\.|,)?(\s((the|and|of|de)\s)*)?)+

Check this: http://rubular.com/r/5LpVm0oKtu
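
A minimal sketch of trying this pattern against some of the names from the question, assuming Python. The helper name extract_phrase is hypothetical. The sketch applies the pattern one line at a time, because \s also matches a newline and the pattern would otherwise join names on adjacent lines into one match.

```python
import re

# The pattern from the answer above, as a raw string.
PHRASE = re.compile(r"(\b[A-Z].*?\b('s|-|\.|,)?(\s((the|and|of|de)\s)*)?)+")

def extract_phrase(line):
    """Return the first capitalized phrase found in line, or None."""
    match = PHRASE.search(line)
    return match.group(0).strip() if match else None

# Names copied from the question.
names = [
    "People's Republic of China",
    "Iraq-Saudia Arabia Neutral Zone",
    "Juan de Nova Island",
    "St. Vincent and the Grenadines",
    "Washington, D.C.",
]

for name in names:
    print(extract_phrase(name))
```

Each of these names is matched in full. Connectors outside the hard-coded list (the, and, of, de) would still end a phrase early.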

