Regex to extract everything inside quotes

I am trying to write a regex that matches every string enclosed between delimiter characters (most likely double quotes, "). This is a scenario I commonly encounter when parsing a line of a CSV file.

So I have a sample line like:

"Smith, John",25,"21/45, North Avenue",IBM

I tried the following regex:

"(.*)"

But it matches everything from the first quote to the last one, rather than each field separately.

I am expecting output as follows:

Smith, John
25
21/45, North Avenue
IBM

The regex I have written only attempts to capture what comes between the " characters in my example, whereas the output above is what I actually want.

There is an ambiguity, though: I am not looking for a match like ,25, (the text between the closing quote of one field and the opening quote of the next). This kinda makes me wonder if a regex is even feasible here.

What is the correct way to write this?

Answers


If you really want to roll your own CSV parser, you'll need to teach your regex a few rules:

  1. A field may be unquoted as long as it doesn't contain quotes, commas or newlines.
  2. A quoted field may contain any characters; quotes are escaped by doubling.
  3. Commas are used as separators.

So, to match one CSV field, you can use the following regex:

(?mx)       # Verbose, multiline mode
(?<=^|,)    # Assert there is a comma or start of line before the current position.
(?:         # Start non-capturing group:
 "          # Either match an opening quote, followed by
 (?:        # a non-capturing group:
  ""        #  Either an escaped quote
 |          #  or
  [^"]+     #  any characters except quotes
 )*         # End of inner non-capturing group, repeat as needed.
 "          # Match a closing quote.
|           # OR
 [^,"\r\n]+ # Match any number of characters except commas, quotes or newlines
)           # End of outer non-capturing group
(?=,|$)     # Assert there is a comma or end-of-line after the current position

See it live on regex101.com.
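
For illustration, here is a minimal sketch of how the pattern above might be applied in Python. Python is an assumption here (the question doesn't name a language), and the pattern is adapted in three ways: Python's re module rejects the variable-width lookbehind (?<=^|,), so the equivalent (?:^|(?<=,)) is used; capture groups are added to pull out the field content; and the inner [^"]+ is reduced to [^"] to avoid a nested quantifier. The names FIELD and split_fields are hypothetical.

```python
import re

# Adapted from the pattern above (see the notes in the lead-in).
FIELD = re.compile(r'''
    (?:^|(?<=,))          # start of line, or a comma just before this position
    (?:
        "((?:""|[^"])*)"  # quoted field; group 1 holds the content
      |
        ([^,"\r\n]+)      # unquoted field; group 2 holds the content
    )
    (?=,|$)               # a comma or end of line must follow
''', re.VERBOSE | re.MULTILINE)


def split_fields(line):
    """Return the fields of one CSV line, with doubled quotes unescaped."""
    fields = []
    for match in FIELD.finditer(line):
        quoted, bare = match.group(1), match.group(2)
        if quoted is not None:
            # Rule 2: a literal quote inside a quoted field is written as "".
            fields.append(quoted.replace('""', '"'))
        else:
            fields.append(bare)
    return fields


for field in split_fields('"Smith, John",25,"21/45, North Avenue",IBM'):
    print(field)
```

On the sample line this prints the four fields from the question, one per line. Note that, like the original pattern, this sketch skips empty unquoted fields (a,,b yields only a and b), because the unquoted branch requires at least one character.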


Please don't use a regex for this; CSV should be handled by a parser.

Here is a ready-to-use parser: http://www.codeproject.com/Articles/9258/A-Fast-CSV-Reader

You can also use the OLEDB built-in parser: http://www.switchonthecode.com/tutorials/csharp-tutorial-using-the-built-in-oledb-csv-parser
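
The two links above are .NET libraries. To show what the parser approach looks like in practice, here is a minimal sketch using Python's standard csv module instead; the choice of Python and the helper name parse_csv are assumptions for illustration, not something the question specifies.

```python
import csv
import io


def parse_csv(text):
    """Parse CSV text into a list of rows, each row a list of field strings."""
    # newline="" leaves line endings untouched so the csv module can handle
    # newlines that appear inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline="")))


rows = parse_csv('"Smith, John",25,"21/45, North Avenue",IBM\n')
for field in rows[0]:
    print(field)
```

With the default dialect, the reader treats the comma as the separator and the double quote as the quote character, and it understands doubled quotes as escapes, so quoted commas, empty fields and embedded newlines need no special handling.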

Hope this helps

