In Perl, how can I correctly parse tab/space delimited files with quoted strings?

I need to parse tab/space delimited files with many columns in Perl. Some of the values are long strings enclosed in double quotes, and these strings can contain any characters, including tabs and spaces.

When I parse a line with the split function, it splits these quoted strings as well. How can I make Perl treat everything within the " " as a single column entry?

A simple example:

12  345546.67677   "Hello World!!!" -567.55656 0.5465767 "Hello_Again;   "

Answers


Use the Text::CSV library, which handles all the edge cases for you. It lets you set the delimiter:

use Text::CSV;
my $csv = Text::CSV->new({sep_char => "\t"});
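
To show how the parser object might be used on a single record, here is a minimal sketch. It assumes the data is strictly tab-delimited and that Text::CSV is installed from CPAN; the sample record and the binary and auto_diag options are illustrative additions, not part of the original answer.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, not part of core Perl

# binary => 1 permits arbitrary characters inside quoted fields;
# auto_diag => 1 makes the parser report errors on its own.
my $csv = Text::CSV->new({ sep_char => "\t", binary => 1, auto_diag => 1 })
    or die "Cannot create parser: " . Text::CSV->error_diag;

# Hypothetical tab-delimited record; the third field contains a tab.
my $line = join "\t", '12', '345546.67677', qq{"Hello\tWorld!!!"}, '-567.55656';

my @fields;
if ( $csv->parse($line) ) {
    @fields = $csv->fields;    # enclosing quotes are already removed
}

print scalar(@fields), " fields\n";
print "[$_]\n" for @fields;
```

Because the separator is a single tab here, this approach does not collapse runs of whitespace into one delimiter.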

Note that you say the file is tab/space delimited. If the delimiters are mixed, or consecutive whitespace has to count as a single delimiter, Text::ParseWords might be easier:

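#!/usr/bin/perl

use strict;
use warnings;

use Text::ParseWords qw( quotewords );
use YAML;

while ( my $line = <DATA> ) {
    chomp $line;    # a trailing newline would count as a delimiter
    # split on runs of whitespace; 0 = strip the enclosing quotes
    print Dump [ quotewords('\s+', 0, $line) ];
}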

__DATA__
12  345546.67677   "Hello World!!!" -567.55656 0.5465767 "Hello_Again;   "

Output:

---
- 12
- 345546.67677
- Hello World!!!
- -567.55656
- 0.5465767
- 'Hello_Again;   '
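
If you want the fields in an ordinary array rather than a YAML dump, the same call can be used on its own. The following is a minimal sketch using only the core Text::ParseWords module; the variable names and the warning message are illustrative.

```perl
use strict;
use warnings;
use Text::ParseWords qw( quotewords );

my $line = qq{12  345546.67677   "Hello World!!!" -567.55656 0.5465767 "Hello_Again;   "\n};
chomp $line;    # avoid a trailing delimiter at the end of the line

# A false second argument strips the enclosing quotes (and backslash escapes).
my @fields = quotewords('\s+', 0, $line);

# quotewords returns an empty list when the quotes are unbalanced.
warn "could not parse line (unbalanced quote?)\n" unless @fields;

printf "%d: [%s]\n", $_ + 1, $fields[$_] for 0 .. $#fields;
```

Keep in mind that quotewords also treats single quotes as quoting characters, so an unquoted value containing an apostrophe may not parse as expected.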

Other possibilities are Regexp::Common::balanced and Text::Balanced.

