Chazg76

XML::Twig: possible bug with twig_roots when using the 'map_xmlns' option to look for children of the top-level tag

I have encountered the following strange behaviour while testing the Perl module XML::Twig with the twig_handlers and twig_roots arguments combined with the 'map_xmlns' option (Perl version: 5.30.2, XML::Twig version: 3.52).

BACKGROUND:

I have been using the Perl module XML::Twig for a few years now. In particular, I have used the twig_roots and twig_handlers arguments to extract data from large XML files where loading the entire file into memory is not practical.
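For readers unfamiliar with this pattern, here is a minimal sketch of the usual memory-friendly approach: a twig_handlers callback fires as each `item` element is completed, copies out the value it needs, and then calls `purge` so the elements parsed so far are freed. The small inline document and the `@data1_values` array are hypothetical; a large file would normally be read with `parsefile` rather than `parse`.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document (a real large file would use parsefile()).
my $xml = '<items><item><data1>A</data1></item><item><data1>B</data1></item></items>';

my @data1_values;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <item> element has been completely parsed
        'item' => sub {
            my ($t, $item) = @_;
            push @data1_values, $item->first_child_text('data1');
            # Delete everything parsed so far to keep memory use flat
            $t->purge;
        },
    },
);
$twig->parse($xml);

print join(',', @data1_values), "\n";    # prints A,B
```

Using twig_roots instead of twig_handlers would additionally tell XML::Twig to build only the matching subtrees and ignore the rest of the document.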

Over the past few months I have also started using the 'map_xmlns' option together with the twig_roots and twig_handlers arguments. This option is very useful when XML files of the same type, which all need to be analysed, use different tag (namespace) prefixes.
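As a minimal sketch of what 'map_xmlns' does: the two hypothetical documents below bind the same namespace URI to different prefixes ('a' and 'b'). Mapping that URI to 'x' lets a single handler key, 'x:item', match the elements in both files. The sample documents and the choice of 'x' as the mapped prefix are assumptions for illustration only.

```perl
use strict;
use warnings;
use XML::Twig;

# Two hypothetical documents that use the same namespace URI
# but different prefixes ('a' and 'b').
my $ns   = 'http://schemas.testdata/info';
my @docs = (
    qq{<a:items xmlns:a="$ns"><a:item>one</a:item></a:items>},
    qq{<b:items xmlns:b="$ns"><b:item>two</b:item></b:items>},
);

my @texts;
for my $doc (@docs) {
    my $twig = XML::Twig->new(
        # Map the namespace URI to the prefix used in the handler keys
        map_xmlns     => { $ns => 'x' },
        twig_handlers => {
            'x:item' => sub { my ($t, $item) = @_; push @texts, $item->text; },
        },
    );
    $twig->parse($doc);
}

print join(',', @texts), "\n";    # prints one,two
```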

The strange behaviour that I have found is outlined in the following piece of code.

use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my %map_xmlns_hash;
my $prefix_val='x';
$map_xmlns_hash{'http://schemas.testdata/info'}=$prefix_val;
my $tag_to_look_forA='/' . $prefix_val . ':items';
my $tag_to_look_forB='/' . $prefix_val . ':items/' . $prefix_val . ':item';
        
my $dataA = '
<items>
    <item>
        <data1>data1A</data1>
        <data2>data2A</data2>
    </item>
    <item>
        <data1>data1B</data1>
        <data2>data2B</data2>
    </item>
</items>';
        
my $dataB = '
<x:items xmlns:x="http://schemas.testdata/info">
    <x:item>
        <x:data1>data1C</x:data1>
        <x:data2>data2C</x:data2>
    </x:item>
    <x:item>
        <x:data1>data1D</x:data1>
        <x:data2>data2D</x:data2>
    </x:item>
</x:items>';
#
# 1a.) twig_handlers test when no xmlns mapping used on root (top level) tag '/items'
#
my  @Array1h;
my $t1h = XML::Twig->new(
    pretty_print => 'indented',
    twig_handlers => {  
        'items' => sub {Get_children_data(@_,\@Array1h)}})->parse($dataA);
print Dumper \@Array1h;
#
# 1b.) twig_roots test when no xmlns mapping used on root (top level) tag '/items'
#
my  @Array1r;
my $t1r = XML::Twig->new(
    pretty_print => 'indented',
    twig_roots => {
        'items' => sub {Get_children_data(@_,\@Array1r)}})->parse($dataA);
print Dumper \@Array1r;
#
# 2a.) twig_handlers test with xmlns mapping used on root (top level) tag '/items'
#
my  @Array2h;
my $t2h = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix =>1,
    pretty_print => 'indented',
    twig_handlers => {
        $tag_to_look_forA => sub {Get_children_data(@_,\@Array2h)}})->parse($dataB);
print Dumper \@Array2h;
#
# 2b.) twig_roots test with xmlns mapping used on root (top level) tag '/items'
#
my  @Array2r;
my $t2r = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix =>1,
    pretty_print => 'indented',
    twig_roots => {
        $tag_to_look_forA => sub {Get_children_data(@_,\@Array2r);}})->parse($dataB);
print Dumper \@Array2r;
#
# 3a.) twig_handlers test with xmlns mapping used on tag '/items/item'
#
my  @Array3h;
my $t3h = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix =>1,
    pretty_print => 'indented',
    twig_handlers => {
        $tag_to_look_forB => sub {Get_children_data(@_,\@Array3h)}})->parse($dataB);
print Dumper \@Array3h;
#
# 3b.) twig_roots test with xmlns mapping used on tag '/items/item'
#
my  @Array3r;
my $t3r = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix =>1,
    pretty_print => 'indented',
    twig_roots => {
        $tag_to_look_forB => sub {Get_children_data(@_,\@Array3r);}})->parse($dataB);
print Dumper \@Array3r;
#
#
#
sub Get_children_data{
    my( $t, $elt, $names_ref )= @_;
    # Record the tag name of every direct child of the matched element
    push @$names_ref, map { $_->name() } $elt->children();
    # Delete everything parsed so far to keep memory use low
    $t->purge;
}

The output of the above code (in the order 1a, 1b, 2a, 2b, 3a, 3b) is:

$VAR1 = [
      'item',
      'item'
    ];
$VAR1 = [
      'item',
      'item'
    ];
$VAR1 = [
      'x:item',
      'x:item'
    ];
$VAR1 = [];
$VAR1 = [
      'x:data1',
      'x:data2',
      'x:data1',
      'x:data2'
    ];
$VAR1 = [
      'x:data1',
      'x:data2',
      'x:data1',
      'x:data2'
    ];

This shows that test 2b.) incorrectly returned no child data at all!

Replacing 'twig_roots' [2b.)] with 'twig_handlers' [2a.)] produces the expected result.

The last two tests [3a.) and 3b.)] show that both the 'twig_roots' and 'twig_handlers' arguments work as expected with the 'map_xmlns' option when the tag being looked for is not the root (top-level) tag.

I cannot work out what the problem is. Is there a bug in twig_roots when the 'map_xmlns' option is used to look for the children of the top-level tag?

Tags: xml, perl, xml-namespaces, xml-twig
