Why does my Perl script remove characters from the file?

I have an issue with a Perl script. It modifies the content of a file, then reopens the file to write it back, and in the process some characters are lost: all words starting with '%' are deleted from the file. That's pretty annoying because the % expressions are variable placeholders for dialog boxes.

Do you have any idea why? The source file is XML with the default encoding.

Here is the code:

undef $/;
open F, $file or die "cannot open file $file\n";
my $content = <F>;                                           
close F;                                                     

$content =~s{status=["'][\w ]*["']\s*}{}gi;

printf $content;

open F, ">$file" or die "cannot reopen $file\n";             
printf F $content;                                           
close F or die "cannot close file $file\n";

Answers


You're using printf there and it thinks its first argument is a format string. See the printf documentation for details. When I run into this sort of problem, I always ensure that I'm using the functions correctly. :)

You probably want just print:

 print F $content;
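
To see the difference, here is a minimal sketch. The sample line is a hypothetical stand-in for the XML content, and sprintf is used only because it follows the same format rules as printf while letting us capture the result:

```perl
use strict;
use warnings;

# Hypothetical sample line standing in for the XML content.
my $content = qq{<msg text="Delete %s?"/>\n};

# Using the content as the format: "%s" is parsed as a conversion and,
# with no argument supplied, is replaced by an empty string.
my $as_format = do {
    no warnings;    # silence the "Missing argument" warning for the demo
    sprintf $content;
};

# Passing the content as data under a "%s" format leaves it untouched,
# which is also what plain print does.
my $as_data = sprintf '%s', $content;

print "as format: $as_format";    # as format: <msg text="Delete ?"/>
print "as data:   $as_data";      # as data:   <msg text="Delete %s?"/>
```

With warnings enabled and the `no warnings` line removed, Perl reports the missing argument, which is usually the quickest hint that a string is being used as a format by accident.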

In your example, you don't need to read in the entire file since your substitution does not cross lines. Instead of trying to read and write to the same filename all at once, use a temporary file:

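open my($in),  "<", $file       or die "cannot open file $file: $!\n";
open my($out), ">", "$file.bak" or die "cannot open file $file.bak: $!\n";

while( <$in> )
    {
    s{status=["'][\w ]*["']\s*}{}gi;
    # name the data explicitly: a bare "print $out;" would print the
    # handle itself to STDOUT instead of writing $_ to the file
    print $out $_;
    }

# close both handles so everything is flushed before the rename
close $in;
close $out or die "cannot close file $file.bak: $!\n";

rename "$file.bak", $file or die "Could not rename file: $!\n";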

This also reduces to this command-line program:

% perl -pi.bak -e 's{status=["\x27][\w ]*["\x27]\s*}{}gi' file

Er. You're using printf.

printf interprets "%" as something special.

Use print instead.

If you have to use printf, use

printf "%s", $content;

Important Note:

printf stands for "print formatted", just as it does in C.

fprintf is the equivalent in C for file I/O.

Perl is not C.

And even in C, passing your content as the first parameter (the format string) gets you shot for security reasons.


Or even

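perl -i.bak -pe 's{status=["\x27][\w ]*["\x27]\s*}{}gi;' yourfiles

-e says "there's code following for you to run"

-i.bak says "edit the file in place and keep the old file as whatever.bak" (the extension has to be attached to -i with no space, otherwise bak is taken as the name of a script to run)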

-p adds a read-print loop around the -e code

Perl one-liners are a powerful tool and can save you a lot of drudgery.
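
If it helps to see what those switches do, here is a rough script equivalent of the one-liner. It is only a sketch: the sub name strip_status_in_place is made up for illustration, and it relies on the $^I variable, which holds the backup extension that -i would set.

```perl
use strict;
use warnings;

# Hypothetical helper name; roughly what `perl -i.bak -pe '...' files` does.
sub strip_status_in_place {
    my @files = @_;
    return unless @files;    # with no files, <> would read STDIN instead

    # $^I turns on in-place editing with ".bak" backups; <> then walks the
    # files in @ARGV and print writes to the file currently being edited.
    local ($^I, @ARGV) = ('.bak', @files);

    while (<>) {                            # the loop that -p adds
        s{status=["'][\w ]*["']\s*}{}gi;    # the code given to -e
        print;                              # the print that -p adds
    }
    return;
}

# Edit whatever files were named on the command line.
strip_status_in_place(@ARGV) if @ARGV;
```

Note that it uses print, not printf, so any % placeholders in the files pass through unchanged.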


If you want a solution that is aware of the XML nature of the docs (i.e., only delete status attributes, and not matching text contents) you could also use XML::PYX:

$ pyx doc.xml | perl -ne'print unless /^Astatus/' | pyxw

That's because you used printf instead of print. printf treats each "%" as the start of a format specifier such as %s or %f, so it never prints a "%" literally unless you escape it as "%%". :-)
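
If the text really does have to end up inside a format string, doubling every "%" first is one way to keep it intact. A small sketch, where escape_for_printf is a hypothetical helper name and the sample line is invented:

```perl
use strict;
use warnings;

# Hypothetical helper: double every "%" so printf prints it literally.
sub escape_for_printf {
    my ($text) = @_;
    (my $escaped = $text) =~ s/%/%%/g;
    return $escaped;
}

# Invented sample text containing both a bare "%" and a "%s" placeholder.
my $line = qq{Progress: 50% of %s\n};

# The escaped text can now be concatenated into a format safely.
printf('[info] ' . escape_for_printf($line));
```

In most cases print, or printf "%s", is still the simpler choice; escaping only makes sense when the text has to be part of a larger format.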

