add document tags to xml doc

Discussion in 'Perl Misc' started by DJ Stunks, Sep 15, 2010.

  1. DJ Stunks

    DJ Stunks Guest

    hey all,

    I'm using XML::Parser to parse an xml file which is not well-formed.
    The source I receive it from formats it as:

    $ cat data.xml
    <row><data foo="a"/></row>
    <row><data baz="b"/></row>

    Obviously the parser chokes on this. If I manually add document tags
    as follows, my script is fine:

    $ cat fixed-data.xml
    <d>
    <row><data foo="a"/></row>
    <row><data baz="b"/></row>
    </d>

    Question: script is below. What is the easiest way to add the
    document tags such that the parser doesn't choke without the
    additional step of manually adding them? I can pass a filehandle to
    the parser if there was a way to add document tags to the filehandle,
    but the file is very large and I couldn't think of an easy way to add
    the data to the filehandle without slurping into a scalar.

    TIA,
    -jp

    $ cat tmp.pl
    #!/usr/bin/perl

    use strict;
    use warnings;

    use XML::Parser;

    my $parser = XML::Parser->new(Handlers => { Start => \&handle_start });
    $parser->parsefile('fixed-data.xml');

    # Start handler: print every attribute of each <data> element.
    sub handle_start {
        my ($p, $el, %atts) = @_;

        if ($el eq 'data') {
            for my $k (keys %atts) {
                print "$k: $atts{$k}\n";
            }
        }
    }

    __END__
     
    DJ Stunks, Sep 15, 2010
    #1

  2. DJ Stunks

    DJ Stunks Guest

    On Sep 14, 6:26 pm, Ben Morrow <> wrote:
    > Quoth DJ Stunks <>:
    >
    >
    >
    > > hey all,

    >
    > > I'm using XML::Parser to parse an xml file which is not well-formed.
    > > The source I receive it from formats it as:

    >
    > > $ cat data.xml
    > > <row><data foo="a"/></row>
    > > <row><data baz="b"/></row>

    >
    > > Obviously the parser chokes on this.  If I manually add document tags
    > > as follows, my script is fine:

    >
    > > $ cat fixed-data.xml
    > > <d>
    > > <row><data foo="a"/></row>
    > > <row><data baz="b"/></row>
    > > </d>

    >
    > > Question: script is below.  What is the easiest way to add the
    > > document tags such that the parser doesn't choke without the
    > > additional step of manually adding them?  I can pass a filehandle to
    > > the parser if there was a way to add document tags to the filehandle,
    > > but the file is very large and I couldn't think of an easy way to add
    > > the data to the filehandle without slurping into a scalar.

    >
    > See XML::Parser->parse_start. You will obviously have to handle reading
    > chunks from the file manually.


    Thanks very much, Ben. With the modified code below I'm good to go.
    It took a bit of hunting on that method, thanks for pointing it out.

    -jp

    #!/usr/bin/perl

    use strict;
    use warnings;

    use XML::Parser;

    my $file = 'data.xml';

    my $parser = XML::Parser->new(Handlers => { Start => \&handle_start });

    # parse_start returns a non-blocking (ExpatNB) parser that is fed piecewise.
    my $p = $parser->parse_start();
    $p->parse_more('<d>');

    open (my $fh, '<', $file) or die "Could not open '$file': $!";

    LINE:
    while (my $line = <$fh>) {
        $p->parse_more($line);
    }

    # Start handler: print every attribute of each <data> element.
    sub handle_start {
        my ($p, $el, %atts) = @_;

        if ($el eq 'data') {
            for my $k (keys %atts) {
                print "$k: $atts{$k}\n";
            }
        }

    }

    __END__
     
    DJ Stunks, Sep 15, 2010
    #2

  3. DJ Stunks wrote:
    > hey all,
    >
    > I'm using XML::Parser to parse an xml file which is not well-formed.
    > The source I receive it from formats it as:
    >
    > $ cat data.xml
    > <row><data foo="a"/></row>
    > <row><data baz="b"/></row>
    >
    > Obviously the parser chokes on this. If I manually add document tags
    > as follows, my script is fine:
    >
    > $ cat fixed-data.xml
    > <d>
    > <row><data foo="a"/></row>
    > <row><data baz="b"/></row>
    > </d>
    >
    > Question: script is below. What is the easiest way to add the
    > document tags such that the parser doesn't choke without the
    > additional step of manually adding them? I can pass a filehandle to
    > the parser if there was a way to add document tags to the filehandle,
    > but the file is very large and I couldn't think of an easy way to add
    > the data to the filehandle without slurping into a scalar.


    Ben's answer is better, of course, but here is a dirty way to get
    such a file handle if you really need one:

    open my $fh, qq{echo "<d>"; cat $file; echo "</d>"|} or die $!;

    Note the qq{}: with q{} Perl would not interpolate $file, and the shell
    would expand its own (most likely empty) $file instead.

    It's not portable, not tolerant of special characters in $file, and may
    not fail as expected if $file is not readable.
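
The special-character problem can probably be sidestepped with the list form of a pipe open, which hands the file name to the shell as a positional parameter instead of pasting it into the command string. A minimal sketch, assuming a Unix-like system with `sh` and `cat` available; `wrapped_handle` is a hypothetical helper name, not something from XML::Parser:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a read handle that yields "<d>", then the
# contents of $file, then "</d>", without slurping the file into memory.
sub wrapped_handle {
    my ($file) = @_;

    # List-form open skips Perl's own shell parsing. The file name reaches
    # sh as the positional parameter $1, so spaces, quotes, and semicolons
    # in the name are not interpreted as shell syntax.
    open(my $fh, '-|', 'sh', '-c',
        'echo "<d>"; cat -- "$1"; echo "</d>"', 'sh', $file)
        or die "Could not start pipe for '$file': $!";

    return $fh;
}
```

The returned handle can then be passed to `$parser->parse($fh)`. This is still not portable beyond Unix-like systems, and an unreadable file still only shows up as an error message from `cat`, not as a failed `open`.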

    Xho
     
    Xho Jingleheimerschmidt, Sep 15, 2010
    #3
  4. DJ Stunks

    Guest

    On Tue, 14 Sep 2010 18:15:48 -0700 (PDT), DJ Stunks <> wrote:

    >On Sep 14, 6:26 pm, Ben Morrow <> wrote:
    >> Quoth DJ Stunks <>:
    >>
    >>
    >>
    >> > hey all,

    >>
    >> > I'm using XML::Parser to parse an xml file which is not well-formed.
    >> > The source I receive it from formats it as:

    >>
    >> > $ cat data.xml
    >> > <row><data foo="a"/></row>
    >> > <row><data baz="b"/></row>

    >>
    >> > Obviously the parser chokes on this.  If I manually add document tags
    >> > as follows, my script is fine:

    >>
    >> > $ cat fixed-data.xml
    >> > <d>
    >> > <row><data foo="a"/></row>
    >> > <row><data baz="b"/></row>
    >> > </d>

    >>
    >> > Question: script is below.  What is the easiest way to add the
    >> > document tags such that the parser doesn't choke without the
    >> > additional step of manually adding them?  I can pass a filehandle to
    >> > the parser if there was a way to add document tags to the filehandle,
    >> > but the file is very large and I couldn't think of an easy way to add
    >> > the data to the filehandle without slurping into a scalar.

    >>
    >> See XML::Parser->parse_start. You will obviously have to handle reading
    >> chunks from the file manually.

    >
    >Thanks very much, Ben. With the modified code below I'm good to go.
    >It took a bit of hunting on that method, thanks for pointing it out.
    >
    >-jp
    >
    >#!/usr/bin/perl
    >
    >use strict;
    >use warnings;
    >
    >use XML::Parser;
    >
    >my $file = 'data.xml';
    >
    >my $parser = XML::Parser->new(Handlers => { Start =>
    >\&handle_start });
    >
    >my $p = $parser->parse_start();
    >$p->parse_more('<d>');
    >
    >open (my $fh, '<', $file) or die "Could not open '$file': $!";
    >
    >LINE:
    >while (my $line = <$fh>) {
    > $p->parse_more($line);
    >}
    >
    >sub handle_start {
    > my ($p, $el, %atts) = @_;
    >
    > if ($el eq 'data') {
    > for my $k (keys %atts) {
    > print "$k: $atts{$k}\n";
    > }
    > }
    >
    >}
    >
    >__END__
    >
    >


    It's likely that you will want to finish the ExpatNB parse
    with a call to XML::Parser::ExpatNB::parse_done, because it
    releases any circular data structure references.

    But calling parse_done() also checks that every open element has
    been closed, so the sequence requires a final call to
    parse_more('</d>') before it.
    Something like below.

    -sln
    ----------
    use strict;
    use warnings;

    use XML::Parser;

    my $parser = XML::Parser::ExpatNB->new();
    $parser->setHandlers(Start => \&handle_start);


    print $parser,"\n";

    # With $/ undefined, <DATA> returns the whole section in one read.
    {
        local $/;
        $parser->parse_more('<d>');
        $parser->parse_more(<DATA>);
        $parser->parse_more('</d>');
        $parser->parse_done;
    }

    # Start handler: print every attribute of each <data> element.
    sub handle_start {
        my ($p, $el, %atts) = @_;
        if ($el eq 'data') {
            for my $k (keys %atts) {
                print "$k: $atts{$k}\n";
            }
        }
    }

    __DATA__

    <row><data foo="a"/></row>
    <row><data baz="b"/></row>

    -----------
    XML::Parser::ExpatNB=HASH(0x18757d4)
    foo: a
    baz: b
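
For the original case of a very large file, the two suggestions in this thread can be combined: feed the file to the non-blocking parser in fixed-size chunks, then close the wrapper element and call parse_done. The following is only a sketch under stated assumptions: `parse_fragments` and its 64 KB default chunk size are hypothetical choices, and the file is assumed to contain no XML declaration of its own. Chunks may end in the middle of a tag, which Expat's incremental parsing accepts.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical helper: parse a file of root-less XML fragments by wrapping
# it in <d>...</d> on the fly. $on_start is a Start handler code reference.
sub parse_fragments {
    my ($file, $on_start, $chunk_size) = @_;
    $chunk_size ||= 64 * 1024;    # assumed default, tune as needed

    my $parser = XML::Parser->new(Handlers => { Start => $on_start });
    my $nb     = $parser->parse_start;    # XML::Parser::ExpatNB object

    open(my $fh, '<', $file) or die "Could not open '$file': $!";
    binmode $fh;

    $nb->parse_more('<d>');               # synthetic opening document tag
    while (1) {
        my $n = read($fh, my $buf, $chunk_size);
        die "Read error on '$file': $!" unless defined $n;
        last if $n == 0;                  # end of file
        $nb->parse_more($buf);
    }
    $nb->parse_more('</d>');              # close the synthetic element
    $nb->parse_done;                      # final check, releases the parser
    close $fh;
    return;
}
```

It would be called as `parse_fragments('data.xml', \&handle_start)`, reusing the handler from the script above. Memory use stays bounded by the chunk size rather than the file size.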
     
    , Sep 15, 2010
    #4
