That's not exactly what I want. My file is more like:
<file>
<item>
<itemtags>
<tag1>foo</tag1>
<tag2>bar</tag2>
[snip]
So I need a series of XML chunks like the following as my output (they
will be passed to another process one at a time):
<itemtags>
<tag1>foo</tag1>
<tag2>bar</tag2>
</itemtags>
Well, then it would be even easier, but...
Also, the files I am dealing with are going to be large, and each
itemtags section is about 32K in size. [snip]
I have no real experience parsing big XML files in Perl (or
anything). My file has 10 items at a total size of ~400K and it takes
~5.2 CPU seconds to parse it and print each chunk. That seems slow
to me. Can I expect to parse the file faster than that?
OK, for those who are interested, I now have two ways of doing this,
using XML::Twig or XML::Parser.
XML::Twig code:
[sodonnel@millhouse]$ more twig.pl
use XML::Twig;
use Benchmark;
my $item;
sub print_it {
my ($t, $elt) = @_;
$elt->set_asis;
# putting this into $item and then clearing it is stupid,
# but it's there to make this a fair test against what I am
# doing with XML::Parser.
$item = $elt->sprint($elt,1);
$item = '';
$t->purge;
}
my $t= XML::Twig->new( twig_handlers =>
{ 'cloudItem' => \&print_it }
);
my $bstart = new Benchmark;
$t->parsefile( 'cloud.xml');
my $bend = new Benchmark;
print timestr(timediff($bend,$bstart)), "\n";
XML::Parser code:
[sodonnel@millhouse]$ more xml_parser.pl
use XML::Parser;
use Benchmark;
# start at depth 0 with an empty buffer so the numeric tests below are defined
my ($in_item, $item_text) = (0, '');
my $bstart = new Benchmark;
my $parser = XML::Parser->new(Handlers => { Start => \&tag_start,
End => \&tag_end,
Char => \&characters,
});
$parser->parsefile('cloud.xml');
my $bend = new Benchmark;
print timestr(timediff($bend,$bstart)), "\n";
exit(0);
sub tag_start {
my ($xp, $el) = @_;
# this will copy all but the first occurrence into item text
if ($in_item >= 1) { $item_text .= $xp->recognized_string }
if ($el eq 'cloudItem') { $in_item += 1 }
}
sub tag_end {
my ($xp, $el) = @_;
if ($el eq 'cloudItem') { $in_item -= 1 }
if ($in_item == 0) {
#print $item_text;
$item_text = '';
} else {
# copies everything but the closing cloudItem tag
$item_text .= $xp->recognized_string;
}
}
sub characters {
my ($xp, $txt) = @_;
if ($in_item) { $item_text .= $txt }
}
[sodonnel@millhouse]$ perl xml_parser.pl
1 wallclock secs ( 1.59 usr + 0.00 sys = 1.59 CPU)
[sodonnel@millhouse]$ perl twig.pl
5 wallclock secs ( 5.14 usr + 0.02 sys = 5.16 CPU)
So XML::Parser wins by quite a way, probably because it doesn't build
an in-memory structure of the tags. Goodness only knows if this is the
best way, but it's good enough for now.
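Since each chunk has to be handed to another process one at a time, here is a rough sketch of how the XML::Parser approach above could be wrapped in a function that takes a callback. The name each_chunk, its arguments and the sample XML string are made up for illustration; only XML::Parser->new, parse and recognized_string come from the module itself. Like the script above, it collects everything inside the named element but not the enclosing tags.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper (not part of XML::Parser): calls $callback once for
# every <$tag> element, passing the markup found inside that element.
sub each_chunk {
    my ($xml, $tag, $callback) = @_;
    my $depth = 0;     # how many <$tag> elements we are currently inside
    my $text  = '';    # markup collected for the current chunk

    my $parser = XML::Parser->new(Handlers => {
        Start => sub {
            my ($xp, $el) = @_;
            # copy every start tag except the outermost <$tag> itself
            $text .= $xp->recognized_string if $depth > 0;
            $depth++ if $el eq $tag;
        },
        End => sub {
            my ($xp, $el) = @_;
            return unless $depth > 0;    # ignore end tags outside a chunk
            $depth-- if $el eq $tag;
            if ($depth == 0) {
                $callback->($text);      # chunk complete, hand it over
                $text = '';
            } else {
                $text .= $xp->recognized_string;
            }
        },
        Char => sub {
            my ($xp, $txt) = @_;
            $text .= $txt if $depth > 0;
        },
    });
    $parser->parse($xml);
}

# Usage with a made-up document shaped like the one quoted earlier.
my $xml = '<file><item><itemtags><tag1>foo</tag1><tag2>bar</tag2>'
        . '</itemtags></item></file>';
my @chunks;
each_chunk($xml, 'itemtags', sub { push @chunks, shift });
print "$_\n" for @chunks;
```

The callback is where the hand-off to the other process would go. The Char handler receives decoded text, so a chunk containing escaped characters such as &amp; would need re-escaping before being treated as XML again; I have not handled that here.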
Cheers,
Stephen.