Splitting a String into Tokens and Storing the Delimiters in Perl

I have a string like this:

a  b   c       d

I process my string like this:

    chomp $line;
    my @tokens = split /\s+/, $line;
    my @new_tokens;
    foreach my $token (@tokens) {
        push @new_tokens, some_complex_function( $token );
    }
    my $new_str = join ' ', @new_tokens;

I'd like to re-join the string with the original whitespace. Is there some way that I can store the whitespace from split and re-use it later? Or is this going to be a huge pain? It's mostly cosmetic, but I'd like to preserve the original spaces from the input string.


If you split with a regex that contains capturing parentheses, the text matched by each capture group is included in the resulting list, so the separators come back alongside the tokens (see perldoc -f split):

use Data::Dumper;
my @list = split /(\s+)/, 'a  b   c       d';
print Dumper(\@list);

$VAR1 = [
          'a',
          '  ',
          'b',
          '   ',
          'c',
          '       ',
          'd'
        ];
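
To close the loop on the question, the list returned by a capturing split can be transformed piece by piece and joined back with an empty string. This is only a minimal sketch: `transform_keeping_whitespace` is a hypothetical helper name, and `some_complex_function` is stubbed with `uc` purely for the demo.

```perl
use strict;
use warnings;

# Stand-in for the asker's real function; uc is an assumption for this demo.
sub some_complex_function {
    my ($token) = @_;
    return uc $token;
}

# Hypothetical helper: transform tokens, keep every whitespace run as-is.
sub transform_keeping_whitespace {
    my ($line) = @_;

    # The capture group makes split return the separators too, so
    # tokens and whitespace runs alternate in @parts.
    my @parts = split /(\s+)/, $line;

    # Only pieces containing non-whitespace are passed to the function.
    my @new_parts = map { /\S/ ? some_complex_function($_) : $_ } @parts;

    # Join with the empty string: the original whitespace is already in the list.
    return join '', @new_parts;
}

print transform_keeping_whitespace('a  b   c       d'), "\n";
# prints "A  B   C       D"
```

Because the separators are list elements, joining with '' rather than ' ' is what restores the original spacing.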

Just split on word boundaries:

split /\b/, $line;

For your example, this will give:

('a','  ','b','   ','c','       ','d')

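EDIT: As brian d foy pointed out, \b uses the wrong character classes: it matches between word (\w) and non-word (\W) characters, not between whitespace and non-whitespace. Following my original idea, I came up with look-around assertions instead. This looks way more complicated than Ether's answer, though: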

split /(?:(?<=\S)(?=\s)|(?<=\s)(?=\S))/, $line;
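
A small comparison may make the difference concrete. The input string 'a-b  c' is an assumed example, chosen because it contains a non-word character inside a token:

```perl
use strict;
use warnings;

my $line = 'a-b  c';    # assumed sample input with punctuation in a token

# \b splits wherever \w meets \W, so the hyphen becomes its own piece.
my @by_boundary = split /\b/, $line;

# The look-around version splits only where whitespace meets non-whitespace.
my @by_lookaround = split /(?:(?<=\S)(?=\s)|(?<=\s)(?=\S))/, $line;

print join('|', @by_boundary),   "\n";    # a|-|b|  |c
print join('|', @by_lookaround), "\n";    # a-b|  |c
```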

Why don't you simply do: my $new_str = uc( $line ); ?

UPDATE: the original uc() was just shorthand for a "more complex function".

Well, in general you can also transform each token in place with a substitution, which never touches the whitespace:

$line =~ s/(\S+)/more_complex_function($1)/ge;
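
As a rough, self-contained illustration of the substitution approach: `more_complex_function` below is a hypothetical stub that wraps each token in brackets, and the input string is an assumed example. The copy-then-substitute idiom keeps the original `$line` intact.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real per-token function.
sub more_complex_function {
    my ($token) = @_;
    return "[$token]";
}

my $line = 'a  b   c';    # assumed sample input

# Copy $line first, then substitute in the copy. /e evaluates the
# replacement as Perl code; /g applies it to every run of non-whitespace.
(my $new_str = $line) =~ s/(\S+)/more_complex_function($1)/ge;

print "$new_str\n";    # prints "[a]  [b]   [c]"
```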
