XML-type parser in the standard Perl installation?

Abhinav

Hi

I have a script where some chunks of text are marked between xml-type
tags.

I say 'xml-type' instead of xml as the tags are preceded with a comment
character "# " so that the script does not fail.

I need to be able to extract the data between tags (which can be
nested), and store it in a hash with each key being the tag itself and
the value, the data in between (it is multiline).

The problem is that I initially tried using Text::Balanced, but gave up
since it was too demanding for this kind of work, where the data spans
multiple lines.

I am thinking of stripping the # from all tagged lines so that it
becomes an xml file, adding a root element (which was not present
before) , and then using an xml parser.

My questions :
1. Is the approach feasible, or is there some other simpler way to do it
(after all, TIMTOWTDI)?
2. If the above is the optimal solution, is there any parser/module
shipped along with the standard Perl (5.8) distro?

Many thanks ..
Abhinav
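
For what it's worth, the requirement described above (nested tags on "# " comment lines, a hash keyed by tag name, multiline values) can probably be met without any XML module at all. What follows is only a minimal sketch using core Perl. It assumes each marker sits alone on its own line in the form "# <tag>" / "# </tag>", that tag names are plain words, and that a tag is not nested inside itself. The sub name extract_tagged and the sample script lines are made up for illustration.

```perl
use strict;
use warnings;

# Collect the text between "# <tag>" and "# </tag>" marker lines into a hash.
# Assumption: every marker is on a line of its own and tag names match \w+.
sub extract_tagged {
    my ($text) = @_;
    my (%data, @open);    # @open is the stack of currently open tags

    for my $line (split /^/m, $text) {
        if ($line =~ m{^\s*#\s*<(/?)(\w+)>\s*$}) {
            my ($closing, $tag) = ($1, $2);
            if (!$closing) {
                push @open, $tag;
                $data{$tag} = '' unless defined $data{$tag};
            }
            else {
                my $top = pop @open;
                die "Mismatched </$tag>\n"
                    unless defined $top && $top eq $tag;
            }
            next;    # marker lines are not part of the data
        }
        # An ordinary line belongs to every tag that is currently open.
        $data{$_} .= $line for @open;
    }
    die "Unclosed <$open[-1]>\n" if @open;
    return \%data;
}

# Hypothetical sample input, not taken from the original script.
my $script = <<'END_SCRIPT';
# <login>
set_window("Login");
# <credentials>
edit_set("user", "guest");
# </credentials>
button_press("OK");
# </login>
END_SCRIPT

my $data = extract_tagged($script);
print "[$_]\n$data->{$_}" for sort keys %$data;
```

With this design an outer tag's value includes the lines of any tags nested inside it. If the inner lines should be excluded instead, append only to the tag on top of the stack ($open[-1]).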
 
John Bokma

Abhinav said:
Hi

I have a script where some chunks of text are marked between xml-type
tags.

I say 'xml-type' instead of xml as the tags are preceded with a comment
character "# " so that the script does not fail.

Why not put the XML at the end, after __END__ and read it using <DATA>?

Abhinav said:
I need to be able to extract the data between tags (which can be
nested), and store it in a hash with each key being the tag itself and
the value, the data in between (it is multiline).

Or open your script as a file, and read the #'s and throw away real
comments (you can use ## for real ones for example), and parse the
result. But I recommend __END__

Abhinav said:
The problem is that I initially tried using Text::Balanced, but gave up
since it was too demanding for this kind of work, where the data spans
multiple lines.

I am thinking of stripping the # from all tagged lines so that it
becomes an xml file, adding a root element (which was not present
before) , and then using an xml parser.

Yup, good idea :-D.

Abhinav said:
My questions :
1. Is the approach feasible, or is there some other simpler way to do it
(after all, TIMTOWTDI)?

Use __END__

Abhinav said:
2. If the above is the optimal solution, is there any parser/module
shipped along with the standard Perl (5.8) distro?

Yes, but I like XML::Twig a lot ;-) Have a look at it.

http://xmltwig.com/xmltwig/

Other pointers:

http://www.xml.com/pub/a/2000/04/05/feature/index.html
http://perl-xml.sourceforge.net/faq/
 
chanio

John Bokma (comp.lang.perl.misc) said:
Or open your script as a file, and read the #'s and throw away real
comments (you can use ## for real ones for example), and parse the
result. But I recommend __END__


Yup, good idea :-D.


use __END__


Yes, but I like XML::Twig a lot ;-) Have a look at it.

http://xmltwig.com/xmltwig/

Other pointers:

http://www.xml.com/pub/a/2000/04/05/feature/index.html
http://perl-xml.sourceforge.net/faq/

Read each line without the preceding # and load it all into a scalar. Then
pass that scalar to XMLin, which returns the parsed XML (note that XMLin and
XMLout come from XML::Simple, not XML::Twig; XMLout goes the other way and
writes a data structure back out as XML). Read the documentation carefully,
since XMLin has a lot of options that control how the XML is interpreted, but
it is mostly a matter of trying and changing until you get the best result.
Then you have it all inside a hash reference (with array references inside).
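
The steps just described might look roughly like the sketch below. Treat it as an illustration under assumptions, not a tested recipe for the real WinRunner file: XMLin is provided by the CPAN module XML::Simple (so it has to be installed separately), the sample script lines are invented, and the code assumes the untagged lines contain no "<" or "&" characters, which would make the result invalid XML.

```perl
use strict;
use warnings;

# XML::Simple is a CPAN module, so check that it is available before using it.
my $have_xml_simple = eval { require XML::Simple; 1 };

# Made-up sample input: only the tag lines carry the "# " prefix.
my $script = <<'END_SCRIPT';
# <setup>
set_window("Login");
# </setup>
# <check>
button_press("OK");
# </check>
END_SCRIPT

# Drop the comment prefix from tag lines only, then add the missing root element.
(my $xml = $script) =~ s/^#\s*(?=<)//mg;
$xml = "<root>\n$xml</root>\n";

my $data;
if ($have_xml_simple) {
    # KeyAttr => [] switches off XML::Simple's default folding on name/key/id.
    $data = XML::Simple->new->XMLin($xml, KeyAttr => [], ForceArray => 0);
    print "$_ => $data->{$_}\n" for sort keys %$data;
}
else {
    warn "XML::Simple is not installed; stripped XML follows:\n$xml";
}
```

One caveat: XML::Simple is aimed at data-style XML, so a tag that holds both text and nested tags (mixed content) may not come back in a convenient shape. For that case a tree-oriented module such as XML::Twig is likely the better fit.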

--
Alberto Adrian Schiano - Argentina - 2004
http://perlmonks.org/index.pl?node_id=245320
 
Abhinav

John said:
Why not put the XML at the end, after __END__ and read it using <DATA>?

Hi John,
Thanks! I was not clear when I said "I have a script". I actually
meant that I have a WinRunner script, not a Perl script, in which I wanted
to put these tags (so as to extract info from the WinRunner script
using a Perl script :) ).

John said:
Or open your script as a file, and read the #'s and throw away real
comments (you can use ## for real ones for example), and parse the
result. But I recommend __END__



Yup, good idea :-D.



use __END__



Yes, but I like XML::Twig a lot ;-) Have a look at it.

http://xmltwig.com/xmltwig/

Other pointers:

http://www.xml.com/pub/a/2000/04/05/feature/index.html
http://perl-xml.sourceforge.net/faq/

Thanks, that gives me enough to do for now :) Anyway, good to know
that the approach I want to use finds acceptance :)

Regards
AB
 
