In Perl, how to remove ^M from a file?

I have a script that appends new fields to an existing CSV. However, ^M characters appear at the end of the old lines, so the new fields end up on a new row instead of the same one. How do I remove ^M characters from a CSV file using Perl?

Answers


^M is a carriage return (\r). You can remove it with a substitution:

$str =~ s/\r//g;
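Applied to the question's scenario, a minimal sketch might look like the following. It assumes the CSV is read line by line and uses a hypothetical placeholder value, new_field, for the appended column; an in-memory file stands in for the real CSV so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical sample input: two CSV rows with DOS (CRLF) line endings.
my $csv = "id,name\r\n1,alice\r\n";

open my $in, '<', \$csv or die "Cannot open in-memory file: $!";
my @rows;
while (my $line = <$in>) {
    chomp $line;                    # removes the trailing "\n" only
    $line =~ s/\r//g;               # removes the leftover "\r" (the ^M)
    push @rows, "$line,new_field";  # 'new_field' is a placeholder value
}
close $in;

print "$_\n" for @rows;
```

With real files, open the CSV path instead of the $csv scalar; without the s/\r//g step the appended field would land after the ^M, which is what pushes it onto a new row.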

Or, as a one-liner that edits the files in place (-i):

perl -p -i -e 's/\r\n$/\n/g' file1.txt file2.txt ... filen.txt
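Inside a larger Perl script, the same in-place edit can be done without the -p -i switches by setting $^I (the in-place-edit backup extension, where '' means no backup) and localizing @ARGV. A sketch, with convert_in_place as a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: convert CRLF to LF in each named file, in place.
sub convert_in_place {
    my @files = @_;
    return unless @files;      # an empty @ARGV would make <> read STDIN
    local $^I   = '';          # in-place edit, no backup (like -i)
    local @ARGV = @files;      # the files <> will read and rewrite
    while (my $line = <>) {    # roughly the loop that -p wraps around the code
        $line =~ s/\r\n\z/\n/;
        print $line;           # goes to the rewritten file while $^I is set
    }
    return;
}

convert_in_place(@ARGV);
```

Setting $^I to a string such as '.bak' instead keeps a backup copy of each original file.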

You can also delete it with tr (\015 is the octal code for a carriage return):

$line =~ tr/\015//d;
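tr///d also returns the number of characters it deleted, which can be handy for logging how many ^M characters were removed. A small sketch with a made-up sample line:

```perl
use strict;
use warnings;

my $line = "field1,field2\r";    # made-up sample line with a trailing ^M

# tr/// returns how many characters it matched; with /d they are deleted.
my $removed = ($line =~ tr/\015//d);    # \015 is octal for "\r"

print "removed $removed CR(s): $line\n";
```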

Slightly unrelated, but to remove ^M from the command line using Perl, do this:

perl -p -i -e "s/\r\n/\n/g" file.name

I prefer a more general solution that works with either DOS or Unix input. Assuming the input is read with <> (STDIN or the files named on the command line):

while (defined(my $ln = <>))
  {
    chomp($ln);
    chop($ln) if ($ln =~ m/\r$/);

    # filter and write
  }
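To see that the chomp/chop approach copes with both kinds of input, the loop body can be wrapped in a small function and fed one Unix line and one DOS line. strip_line_ending is a hypothetical name used only for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper that applies the chomp/chop approach to one line.
sub strip_line_ending {
    my ($ln) = @_;
    chomp($ln);                       # drop the trailing "\n" (Unix or DOS)
    chop($ln) if ($ln =~ m/\r$/);     # a DOS line still ends in "\r": drop it
    return $ln;
}

for my $input ("unix line\n", "dos line\r\n") {
    print strip_line_ending($input), "\n";
}
```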

Outside Perl, the dos2unix utility (where installed) converts DOS (CRLF) line endings to Unix (LF) line endings:

dos2unix <file-name>

You can run it directly at a Unix prompt, or call it from Perl, for example with system().


To convert DOS style to UNIX style line endings:

while (my $line = <FILEHANDLE>) {
   $line =~ s/\r\n$/\n/;
   # $line now ends in a Unix newline; process or print it here
}

Or, to remove UNIX and/or DOS style line endings:

while (my $line = <FILEHANDLE>) {
   $line =~ s/\r?\n$//;
   # $line now has no line ending at all
}
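Changing $line inside these loops does not change the file itself; the cleaned lines still have to be written somewhere. A minimal sketch, using in-memory filehandles and made-up sample data in place of real files, that strips either ending and writes each line back with a Unix newline:

```perl
use strict;
use warnings;

# Made-up in-memory input with mixed DOS and Unix line endings.
my $input  = "a,1\r\nb,2\n";
my $output = '';

open my $in,  '<', \$input  or die "open input: $!";
open my $out, '>', \$output or die "open output: $!";

while (my $line = <$in>) {
    $line =~ s/\r?\n$//;      # strip either kind of line ending
    print {$out} "$line\n";   # write it back with a Unix ending
}
close $in;
close $out;

print $output;
```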

This is what solved my problem. ^M is a carriage return, and it can easily be stripped in a Perl script:

while(<INPUTFILE>)
{
     chomp;
     chop($_) if ($_ =~ m/\r$/);
}

Here is a little script I use for that. A modified version of it also helped me filter out other non-printable characters in cross-platform legacy files.

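#!/usr/bin/perl
# run this as
# convert_dos2unix.pl < input_file > output_file
use strict;
use warnings;

local $/;                 # slurp mode: read all input at once
my $text = <> // '';      # empty string if there is no input
$text =~ s/\r//g;         # delete every carriage return (^M)
print $text;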
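The answer above mentions adapting the script to remove other non-printable characters too. The exact filter it used is not given; the sketch below assumes plain ASCII data and keeps only tab, newline, and the visible range 0x20 to 0x7E, which would also delete any non-ASCII (for example UTF-8) bytes. clean_text is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper. Assumption: input is plain ASCII; anything other than
# tab, newline, or visible ASCII (0x20-0x7E) is treated as junk.
sub clean_text {
    my ($text) = @_;
    $text =~ s/\r//g;                  # drop ^M, as in the script above
    $text =~ s/[^\t\n\x20-\x7E]//g;    # drop other control and non-ASCII bytes
    return $text;
}

my $sample = "abc\x07def\r\nline2\x00\r\n";   # made-up input with junk bytes
print clean_text($sample);
```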

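In vi, type :%s/ then press Control-V Control-M, then type //g and press Enter.

The command line will then show :%s/^M//g; the % makes the substitution apply to every line, not just the current one.

Control-V Control-M means pressing those key combinations, which inserts a literal ^M. Don't type the words out.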

