Perl, Remove XML node

Data.xml

<people>
  <person name="John">
     <param name="age" value="21" />
  </person>
  <person name="Jane">
     <param name="age" value="25" />
  </person>
</people>

I have this piece of XML. I'm working on a script to append <person> nodes to the <people> node. I'm using XML::Simple (please refrain from suggesting that I use another library; I'm aware of its difficulties).

my $remove_person = "Jane";
my $person_to_remove;

my $xml = XMLin('data.xml', ForceArray => 1, KeepRoot => 1, KeyAttr => []);
if(exists $xml->{people}[0]{person}){
        my $var = $xml->{people}[0]{person};
        my $count = @$var;
        my $person_index;
        for(my $i = 0; $i < $count; $i++){
                if($var->[$i]{name} eq $remove_person){
                        # store the index first, then report it
                        $person_index = $i;
                        $person_to_remove = $var->[$i];
                        print "Person found at index $person_index\n";
                }
        }
} else {
        print "Person not found in data.xml\n";
}

The code above gives me the index of the node I wish to remove. It's from this point that I'm having trouble: I cannot figure out a correct way to remove this index from the data. So far I've tried using splice, which returned the section of XML I want to remove, and then used XMLout() to convert that section back to XML. Using =~ s///g, I was able to fix up the changed node names (<person> became <opt>). Once I'd XMLout()'ed the original data.xml structure, I attempted to replace the removable section's XML with an empty string in the output of the original structure. Obviously, this didn't work.

my $new_xml    = XMLout($xml, KeepRoot => 1);
my $remove_xml = XMLout($person_to_remove, KeepRoot => 1);

$remove_xml =~ s/opt/person/g;
$new_xml =~ s/($remove_xml)//g; # facepalm, i know

How would I remove this section of XML, either by removing it from the array data or by plain text removal from the file, so that I can write the new structure back to the original data.xml file?

Answers


As you have already been told, the point of XML::Simple is to use Perl data structures instead of string manipulation. So, forget s/// and try

my $xml = XMLin($data, ForceArray => 1, KeepRoot => 1);
my $remove = 'Jane';
delete $xml->{people}[0]{person}{$remove};
print XMLout($xml, KeepRoot => 1);

or, with empty KeyAttr

my $xml = XMLin($data, ForceArray => 1, KeepRoot => 1, KeyAttr => []);
@{ $xml->{people}[0]{person} } = grep $_->{name} ne $remove,
                                 @{ $xml->{people}[0]{person} };
print XMLout($xml, KeepRoot => 1);
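
The two variants differ because of how XML::Simple folds the name attribute. The sketch below hand-builds the two shapes that XMLin is expected to return for the sample document. This is an assumption based on XML::Simple's documented KeyAttr folding, not output captured from a run, so the code works without the module installed. The names $folded and $unfolded are illustrative only.

```perl
use strict;
use warnings;

# Assumed shape with the default KeyAttr: <person> (and <param>) elements
# are folded into hashes keyed by their name attribute.
my $folded = {
    people => [ {
        person => {
            John => { param => { age => { value => '21' } } },
            Jane => { param => { age => { value => '25' } } },
        },
    } ],
};

# Assumed shape with KeyAttr => []: <person> stays an array of hashes.
my $unfolded = {
    people => [ {
        person => [
            { name => 'John', param => [ { name => 'age', value => '21' } ] },
            { name => 'Jane', param => [ { name => 'age', value => '25' } ] },
        ],
    } ],
};

my $remove = 'Jane';

# Folded: the person is a hash entry, so delete is enough.
delete $folded->{people}[0]{person}{$remove};

# Unfolded: filter the array in place, keeping everyone else.
my $people = $unfolded->{people}[0]{person};
@$people = grep { $_->{name} ne $remove } @$people;

print join(',', sort keys %{ $folded->{people}[0]{person} }), "\n";
print join(',', map { $_->{name} } @$people), "\n";
```

In both cases only John should remain; which form you need depends solely on the KeyAttr option passed to XMLin.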

For comparison, the same task in XML::XSH2:

 open data.xml ;
 my $remove = 'Jane' ;
 delete /people/person[@name=$remove] ;
 save :b ;

Edit: The below was posted before the 'Please don't suggest I use other libraries' was added to the question. I'm leaving it, because I still think the correct answer is "don't use XML::Simple". You can use a hammer to put screws in the wall all you like, but it doesn't change the fact that however hard you hit it, the results are going to be messy.

Don't use XML::Simple and this is really easy. Even XML::Simple says:

The use of this module in new code is discouraged. Other modules are available which provide more straightforward and consistent interfaces.

The fundamental problem is that only trivial (simple!) XML can be represented directly via hashes and arrays. XML allows duplicate nodes beneath the same parent, with differing attributes and content. It also allows unary tags.

How about using XML::Twig instead:

#!/usr/bin/env perl
use strict;
use warnings;

use XML::Twig;

my $twig = XML::Twig -> new ('pretty_print' => 'indented_a' ) -> parsefile ( 'your_xml' ); 
foreach my $element ( $twig -> get_xpath('person[@name="Jane"]') ) {
   $element -> delete;
}

$twig -> print; 

You can, if you want, also do this as an in-place edit using parsefile_inplace. Otherwise, open a new file and write your new XML to it via $twig -> sprint.

e.g.:

XML::Twig->new(
    'pretty_print'  => 'indented_a',
    'twig_handlers' => {
        'person[@name="Jane"]' => sub { $_->delete }
    }
)->parsefile_inplace('xml_filename.xml');

If you're intent on using a hammer for your screws - this should do it with your initial code and XML::Simple:

$xml->{people}[0]{person} = 
     [ grep { not $_->{name} eq $remove_person }
                      @{ $xml->{people}[0]{person} } ];

This replaces the array in question with a copy filtered on the name attribute.

Outputs:

<people>
  <person name="John">
    <param name="age" value="21" />
  </person>
</people>
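
The index-based approach from the question can also work with splice, as long as it is applied to the array inside the structure rather than to a copy. This is a minimal sketch: the hash is a hand-built stand-in for what XMLin is assumed to return with KeyAttr => [], and $persons and $index are illustrative names.

```perl
use strict;
use warnings;
use List::Util qw(first);

# Stand-in for XMLin('data.xml', ForceArray => 1, KeepRoot => 1, KeyAttr => [])
my $xml = { people => [ { person => [
    { name => 'John', param => [ { name => 'age', value => '21' } ] },
    { name => 'Jane', param => [ { name => 'age', value => '25' } ] },
] } ] };

my $remove_person = 'Jane';
my $persons = $xml->{people}[0]{person};

# Find the index of the first matching person (undef if there is none).
my $index = first { $persons->[$_]{name} eq $remove_person } 0 .. $#$persons;

if (defined $index) {
    # splice removes the element from the array that $xml still refers to,
    # so the structure itself is updated; the return value is the removed node.
    my ($removed) = splice @$persons, $index, 1;
    print "Removed $removed->{name} at index $index\n";
} else {
    print "Person not found\n";
}
```

With XML::Simple loaded, passing the modified structure to XMLout with KeepRoot => 1 and the OutputFile option set to 'data.xml' should write the result back to the file.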

Sadly I ended up with roughly the same issue: I had to edit some XML on AIX without additional libraries. I ended up removing stuff like this:

perl -0777 -p -i -e "s;(<HARDWARE>.*)<DESCRIPTION>.*<\/DESCRIPTION>(.*<\/HARDWARE>);\$1\$2;s" my.xml

This is ugly. I don't like it. But it worked then, and provided you know how to write a regex, it should do now and then.
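
To see what the one-liner above does, here is a rough sketch of the same substitution applied to a string inside a script. The sample input is hypothetical, and the pattern uses non-greedy .*? instead of the greedy .* from the one-liner, which is a deliberate change so that the match stops at the first closing tag.

```perl
use strict;
use warnings;

# Hypothetical sample input; the element names follow the one-liner above.
my $xml = <<'XML';
<HARDWARE>
  <NAME>box</NAME>
  <DESCRIPTION>old
text</DESCRIPTION>
  <TYPE>x</TYPE>
</HARDWARE>
XML

# /s lets . match newlines; .*? is non-greedy, so only the first
# DESCRIPTION inside the first HARDWARE element is removed.
(my $out = $xml) =~ s{(<HARDWARE>.*?)<DESCRIPTION>.*?</DESCRIPTION>(.*?</HARDWARE>)}{$1$2}s;

print $out;
```

This only holds up while the markup stays as regular as the sample; nested or repeated elements, comments, or CDATA sections can all defeat a pattern like this.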

