1 year ago
#381926
Chazg76
XML::Twig: possible bug with twig_roots when using the 'map_xmlns' option while looking for children of the top-level tag
I have encountered the following strange behaviour while testing the Perl module XML::Twig with the arguments twig_handlers and twig_roots along with the 'map_xmlns' option (Perl version: 5.30.2, XML::Twig version: 3.52).
BACKGROUND:
I have been using the Perl module XML::Twig for a few years now. In particular, I have used the twig_roots and twig_handlers arguments to extract data from large XML files where loading the entire file into memory is not practical.
Over the past few months I have started using the 'map_xmlns' option together with the aforementioned twig_roots and twig_handlers arguments. This option is very useful when XML files of the same type use different tag prefixes for the same namespace.
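For readers unfamiliar with 'map_xmlns', here is a minimal sketch of what the option is meant to do. The namespace URI, the prefixes 'a' and 'b', and the sample documents are made up for illustration and are not taken from my real files. Both documents bind a different prefix to the same URI, and the handler is written once against the mapped prefix 'x'. Note that the handler targets a non-root tag.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical namespace URI and documents, for illustration only.
my $uri = 'http://example.com/ns/info';

my %docs = (
    a => qq{<a:items xmlns:a="$uri"><a:item>one</a:item><a:item>two</a:item></a:items>},
    b => qq{<b:items xmlns:b="$uri"><b:item>three</b:item></b:items>},
);

my @seen;
for my $key (sort keys %docs) {
    XML::Twig->new(
        # Map the namespace URI to the prefix 'x', whatever prefix the file uses.
        map_xmlns     => { $uri => 'x' },
        # The handler is triggered on the mapped prefix, not the original one.
        twig_handlers => {
            'x:item' => sub {
                my ($t, $elt) = @_;
                push @seen, $elt->text;
            },
        },
    )->parse($docs{$key});
}

# Text of every item matched across both documents.
print join(',', @seen), "\n";
```

If the mapping works as documented, the same handler should collect the item text from both documents even though neither file uses the prefix 'x'.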
The strange behaviour that I have found is outlined in the following piece of code.
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;
my %map_xmlns_hash;
my $prefix_val='x';
$map_xmlns_hash{'http://schemas.testdata/info'}=$prefix_val;
my $tag_to_look_forA='/' . $prefix_val . ':items';
my $tag_to_look_forB='/' . $prefix_val . ':items/' . $prefix_val . ':item';
my $dataA = '
<items>
<item>
<data1>data1A</data1>
<data2>data2A</data2>
</item>
<item>
<data1>data1B</data1>
<data2>data2B</data2>
</item>
</items>';
my $dataB = '
<x:items xmlns:x="http://schemas.testdata/info">
<x:item>
<x:data1>data1C</x:data1>
<x:data2>data2C</x:data2>
</x:item>
<x:item>
<x:data1>data1D</x:data1>
<x:data2>data2D</x:data2>
</x:item>
</x:items>';
#
# 1a.) twig_handlers test when no xmlns mapping used on root (top level) tag '/items'
#
my @Array1h;
my $t1h = XML::Twig->new(
pretty_print => 'indented',
twig_handlers => {
'items' => sub {Get_children_data(@_,\@Array1h)}})->parse($dataA);
print Dumper \@Array1h;
#
# 1b.) twig_roots test when no xmlns mapping used on root (top level) tag '/items'
#
my @Array1r;
my $t1r = XML::Twig->new(
pretty_print => 'indented',
twig_roots => {
'items' => sub {Get_children_data(@_,\@Array1r)}})->parse($dataA);
print Dumper \@Array1r;
#
# 2a.) twig_handlers test with xmlns mapping used on root (top level) tag '/items'
#
my @Array2h;
my $t2h = XML::Twig->new(
map_xmlns => {
%map_xmlns_hash
},
keep_original_prefix =>1,
pretty_print => 'indented',
twig_handlers => {
$tag_to_look_forA => sub {Get_children_data(@_,\@Array2h)}})->parse($dataB);
print Dumper \@Array2h;
#
# 2b.) twig_roots test with xmlns mapping used on root (top level) tag '/items'
#
my @Array2r;
my $t2r = XML::Twig->new(
map_xmlns => {
%map_xmlns_hash
},
keep_original_prefix =>1,
pretty_print => 'indented',
twig_roots => {
$tag_to_look_forA => sub {Get_children_data(@_,\@Array2r);}})->parse($dataB);
print Dumper \@Array2r;
#
# 3a.) twig_handlers test with xmlns mapping used on tag '/items/item'
#
my @Array3h;
my $t3h = XML::Twig->new(
map_xmlns => {
%map_xmlns_hash
},
keep_original_prefix =>1,
pretty_print => 'indented',
twig_handlers => {
$tag_to_look_forB => sub {Get_children_data(@_,\@Array3h)}})->parse($dataB);
print Dumper \@Array3h;
#
# 3b.) twig_roots test with xmlns mapping used on tag '/items/item'
#
my @Array3r;
my $t3r = XML::Twig->new(
map_xmlns => {
%map_xmlns_hash
},
keep_original_prefix =>1,
pretty_print => 'indented',
twig_roots => {
$tag_to_look_forB => sub {Get_children_data(@_,\@Array3r);}})->parse($dataB);
print Dumper \@Array3r;
#
# Helper: push the names of the element's children onto the given array ref,
# then purge the twig to free the memory used by the parsed content.
#
sub Get_children_data {
    my ($t, $elt, $Array1) = @_;
    push @$Array1, $_->name() for $elt->children();
    $t->purge;
}
The output of the above code, in the order of the tests (1a, 1b, 2a, 2b, 3a, 3b), is:
$VAR1 = [
'item',
'item'
];
$VAR1 = [
'item',
'item'
];
$VAR1 = [
'x:item',
'x:item'
];
$VAR1 = [];
$VAR1 = [
'x:data1',
'x:data2',
'x:data1',
'x:data2'
];
$VAR1 = [
'x:data1',
'x:data2',
'x:data1',
'x:data2'
];
This shows that test 2b.) (the fourth dump, the empty array) incorrectly returned no child data!
Replacing 'twig_roots' [2b.)] with 'twig_handlers' [2a.)] generates the desired results.
The last two examples [3a.) and 3b.)] show that both the 'twig_roots' and 'twig_handlers' arguments work as expected with the 'map_xmlns' option when the tag to look for is not the root (top level) tag.
I cannot work out what the problem is. Is there a bug with twig_roots when using the 'map_xmlns' option while looking for child tags of top level tags?
xml
perl
xml-namespaces
xml-twig