XML-type parser in the standard Perl installation?

Discussion in 'Perl Misc' started by Abhinav, May 27, 2004.

  1. Abhinav

    Abhinav Guest

    Hi

    I have a script where some chunks of text are marked between xml-type
    tags.

    I say 'xml-type' instead of xml as the tags are preceded with a comment
    character "# " so that the script does not fail.

    I need to be able to extract the data between tags (which can be
    nested), and store it in a hash with each key being the tag itself and
    the value, the data in between (it is multiline).

    The problem is that I initially tried using Text::Balanced, but gave up
    since it was too demanding for this kind of work .. spanning across
    multiple lines ..

    I am thinking of stripping the # from all tagged lines so that it
    becomes an xml file, adding a root element (which was not present
    before), and then using an xml parser.

    My questions:
    1. Is the approach feasible, or is there some other simpler way to do it
    ... (after all, TIMTOWTDI)
    2. If the above is the optimal solution, is there any parser/module
    shipped along with the standard perl (5.8) distro?

    Many thanks ..
    Abhinav
     
    Abhinav, May 27, 2004
    #1

  2. Abhinav

    John Bokma Guest

    Abhinav wrote:

    > Hi
    >
    > I have a script where some chunks of text are marked between xml-type
    > tags.
    >
    > I say 'xml-type' instead of xml as the tags are preceded with a comment
    > character "# " so that the script does not fail.


    Why not put the XML at the end, after __END__ and read it using <DATA>?

    > I need to be able to extract the data between tags (which can be
    > nested), and store it in a hash with each key being the tag itself and
    > the value, the data in between (it is multiline).


    Or open your script as a file, and read the #'s and throw away real
    comments (you can use ## for real ones for example), and parse the
    result. But I recommend __END__
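The second suggestion (read the script as a file, keep the lines marked with a single #, and treat ## as a real comment) might look roughly like the sketch below. The helper name `extract_marked_lines`, the sample lines, and the `<root>` wrapper element are made up for illustration; the lines are passed in as a list here, but they could equally come from a filehandle.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only lines that start with a single '#',
# strip that marker, and drop "real" comments that start with '##'.
sub extract_marked_lines {
    my (@lines) = @_;
    my @kept;
    for my $line (@lines) {
        next if $line =~ /^\s*##/;                 # real comment, skip it
        next unless $line =~ /^\s*#[ \t]?(.*)$/;   # not a marked line
        push @kept, $1;
    }
    return join "\n", @kept;
}

# Sample input (assumed, not taken from the original script).
my @script = (
    '## this is a real comment',
    '# <setup>',
    '# open the login window',
    '# </setup>',
    'do_something();',
);

# Wrap the result in a made-up root element so it forms one XML document.
my $xml = "<root>\n" . extract_marked_lines(@script) . "\n</root>\n";
print $xml;
```

The resulting string can then be handed to whichever XML parser is available.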

    > The problem is that I initially tried using Text::Balanced, but gave up
    > since it was too demanding for this kind of work .. spanning across
    > multiple lines ..
    >
    > I am thinking of stripping the # from all tagged lines so that it
    > becomes an xml file, adding a root element (which was not present
    > before), and then using an xml parser.


    Yup, good idea :-D.

    > My questions :
    > 1. Is the approach feasible, or is there some other simpler way to do it
    > .. (after all, TIMTOWTDI)


    use __END__

    > 2. If the above is the optimal solution, is there any parser/module
    > shipped along with the standard perl (5.8) distro .. ?


    Yes, but I like XML::Twig a lot ;-) Have a look at it.

    http://xmltwig.com/xmltwig/

    Other pointers:

    http://www.xml.com/pub/a/2000/04/05/feature/index.html
    http://perl-xml.sourceforge.net/faq/
    --

    John MexIT: http://johnbokma.com/mexit/
    personal page: http://johnbokma.com/
    Experienced Perl programmer available: http://castleamber.com/
     
    John Bokma, May 28, 2004
    #2

  3. Abhinav

    chanio Guest

    John Bokma (comp.lang.perl.misc) wrote:

    > Abhinav wrote:
    >
    >> Hi
    >>
    >> I have a script where some chunks of text are marked between xml-type
    >> tags.
    >>
    >> I say 'xml-type' instead of xml as the tags are preceded with a comment
    >> character "# " so that the script does not fail.

    >
    > Why not put the XML at the end, after __END__ and read it using <DATA>?
    >
    >> I need to be able to extract the data between tags (which can be
    >> nested), and store it in a hash with each key being the tag itself and
    >> the value, the data in between (it is multiline).

    >
    > Or open your script as a file, and read the #'s and throw away real
    > comments (you can use ## for real ones for example), and parse the
    > result. But I recommend __END__
    >
    >> The problem is that I initially tried using Text::Balanced, but gave up
    >> since it was too demanding for this kind of work .. spanning across
    >> multiple lines ..
    >>
    >> I am thinking of stripping the # from all tagged lines so that it
    >> becomes an xml file, adding a root element (which was not present
    >> before), and then using an xml parser.

    >
    > Yup, good idea :-D.
    >
    >> My questions :
    >> 1. Is the approach feasible, or is there some other simpler way to do it
    >> .. (after all, TIMTOWTDI)

    >
    > use __END__
    >
    >> 2. If the above is the optimal solution, is there any parser/module
    >> shipped along with the standard perl (5.8) distro .. ?

    >
    > Yes, but I like XML::Twig a lot ;-) Have a look at it.
    >
    > http://xmltwig.com/xmltwig/
    >
    > Other pointers:
    >
    > http://www.xml.com/pub/a/2000/04/05/feature/index.html
    > http://perl-xml.sourceforge.net/faq/

    Read each line, strip the leading #, and collect everything in a scalar.
    Then pass it to XMLin, which comes from XML::Simple rather than XML::Twig;
    XMLout does the reverse and turns the data structure back into XML.
    Read the documentation carefully, since XMLin has many options that control
    how the XML is interpreted, but it is only a matter of trying and adjusting
    until you get the best result. You then have everything inside a hash
    reference (with array references inside).
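As a rough illustration of the strip-and-parse idea using only core Perl, the sketch below removes the leading "# " and collects the text between each pair of tags into a hash. It is a simplified stand-in, not a real XML parser: it assumes plain `<tag>`...`</tag>` pairs with no attributes, entities, comments, or CDATA, and the helper name `tags_to_hash` and the sample text are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: map each tag name to the raw text between its
# opening and closing tag. Nested tags get their own entries; a tag name
# that occurs more than once keeps only its last occurrence.
sub tags_to_hash {
    my ($text) = @_;
    my (%data, @open);
    while ($text =~ m{<(/?)([A-Za-z_][\w.-]*)>}g) {
        my ($closing, $name, $start, $end) = ($1, $2, $-[0], $+[0]);
        if (!$closing) {
            # Remember the tag and the offset where its content starts.
            push @open, [$name, $end];
            next;
        }
        my $top = pop @open
            or die "closing tag </$name> without an opening tag\n";
        die "expected </$top->[0]> but found </$name>\n"
            if $top->[0] ne $name;
        # Content runs from the end of the opening tag to the start
        # of the closing tag.
        my $content = substr($text, $top->[1], $start - $top->[1]);
        $content =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
        $data{$name} = $content;
    }
    die "unclosed tag <$open[-1][0]>\n" if @open;
    return \%data;
}

# Sample input (assumed): tagged lines hidden behind a comment character.
my $text = <<'END';
# <test>
# <name>login check</name>
# <steps>
# open the window
# type the password
# </steps>
# </test>
END

# Strip the leading "# " from every line before parsing.
(my $xml = $text) =~ s/^#[ \t]?//mg;
my $data = tags_to_hash($xml);
print "$_ => [$data->{$_}]\n" for sort keys %$data;
```

In this sketch the value stored for an outer tag still contains the markup of the tags nested inside it. A real parser such as XML::Simple or XML::Twig would return a nested structure instead and would handle the parts of XML that this sketch ignores.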

    --
    Alberto Adrian Schiano - Argentina - 2004
    http://perlmonks.org/index.pl?node_id=245320
     
    chanio, May 28, 2004
    #3
  4. Abhinav

    Abhinav Guest

    John Bokma wrote:
    > Abhinav wrote:
    >
    >> Hi
    >>
    >> I have a script where some chunks of text are marked between xml-type
    >> tags.
    >>
    >> I say 'xml-type' instead of xml as the tags are preceded with a
    >> comment character "# " so that the script does not fail.

    >
    >
    > Why not put the XML at the end, after __END__ and read it using <DATA>?
    >

    Hi John,
    Thanks! I was not clear when I said "I have a script". I actually
    meant that I have a WinRunner script, not a Perl script, in which I wanted
    to put these tags. (So as to extract info from the WinRunner script,
    using a Perl script :) )

    >> I need to be able to extract the data between tags (which can be
    >> nested), and store it in a hash with each key being the tag itself and
    >> the value, the data in between (it is multiline).

    >
    >
    > Or open your script as a file, and read the #'s and throw away real
    > comments (you can use ## for real ones for example), and parse the
    > result. But I recommend __END__
    >
    >> The problem is that I initially tried using Text::Balanced, but gave
    >> up since it was too demanding for this kind of work .. spanning across
    >> multiple lines ..
    >>
    >> I am thinking of stripping the # from all tagged lines so that it
    >> becomes an xml file, adding a root element (which was not present
    >> before), and then using an xml parser.

    >
    >
    > Yup, good idea :-D.
    >
    >> My questions :
    >> 1. Is the approach feasible, or is there some other simpler way to do
    >> it .. (after all, TIMTOWTDI)

    >
    >
    > use __END__
    >
    >> 2. If the above is the optimal solution, is there any parser/module
    >> shipped along with the standard perl (5.8) distro .. ?

    >
    >
    > Yes, but I like XML::Twig a lot ;-) Have a look at it.
    >
    > http://xmltwig.com/xmltwig/
    >
    > Other pointers:
    >
    > http://www.xml.com/pub/a/2000/04/05/feature/index.html
    > http://perl-xml.sourceforge.net/faq/


    Thanks .. that gives me enough to do for now :) Anyway, good to know
    that the approach I want to use finds acceptance :)

    Regards
    AB
     
    Abhinav, May 28, 2004
    #4