Converting XML to Perl structures FAST

Ignoramus17503 · Jun 13, 2006

Aside from a suggestion to look at RXParse, which I will do, I have
not yet seen what I was looking for, so here's a rephrase of my
question.

I need to convert XML documents to Perl structures very efficiently,
CPU-wise.

I am currently using XML::Simple, which does what I want but is
slow.

I have looked at various Perl XML FAQs, the manual for XML::LibXML, etc.
They all seem to parse XML into all kinds of strange (to me) things.

So, here's my question: what Perl module converts XML to a Perl
structure (hashes of hashes of arrays, etc.) and does it very
efficiently?

I am not looking for suggestions to "use Google"; I need suggestions
from people who have a real-life answer.

Thanks.

i

robic0 · Jun 13, 2006

There are many here that have used Simple. The general procedure is:
use Expat or Parse, set your handlers, set flags in the handlers,
grab the data when it comes around. Stop the grab when it's gone.
When you hit the tag you need, store the "original" content data
that is passed (appended to a string, with tags), then pass the
entire "original" xml/xhtml (tags and all) to Simple to glean the hash data.
This avoids unnecessary duality.
Does that about cover it?

robic0
(god of porn)

robic0 · Jun 13, 2006

RXParse is just a Create/Filter/Search & Replace (modify) parser.
It won't internalize xml data into a hash. Although I did do one of those,
posted here a long time ago (a Simple replacement).

You need to understand that for what you (think) are trying to do you will have
to lead off with parser handlers to "drill down" to the start of the extraction
data, capture it (raw), wait for the finish, then pass the "raw" string to Simple.

That is how it's done buddy.....

robic0
(god of porn)

robic0 · Jun 13, 2006

postscript:

Usually when you capture sub xml/xhtml in this fashion, you will want to encapsulate
the raw data with a tag before you send it to Simple. Simple invokes a user-selected
parser (Expat is the default, I think). So if it's non-compliant it will croak/carp on you.
Expat is better than Parse though.

Like:

<root>

captured xml/xhtml

</root>

robic0
(god of porn)
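
A minimal sketch of the capture-then-simplify procedure described in the posts above, assuming XML::Parser and XML::Simple are both installed. The document and the element names (`catalog`, `meta`, `item`) are hypothetical and chosen only for illustration. It uses the `original_string` method of XML::Parser::Expat to collect the raw text of the wanted subtree, wraps it in a `<root>` element, and hands the result to `XMLin`. This is one possible reading of the procedure, not a tested recipe from the thread.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::Simple qw(XMLin);

# Hypothetical input; only the <item> subtree is of interest.
my $xml = '<catalog><meta>ignored</meta>'
        . '<item><title>Widget</title><price>9.99</price></item>'
        . '</catalog>';

my $target = 'item';   # tag to "drill down" to (assumption)
my $depth  = 0;        # > 0 while we are inside a captured subtree
my $raw    = '';       # raw XML text of the captured subtree(s)

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $name) = @_;
            # Start capturing at the target tag; track nesting once inside.
            $depth++ if $depth || $name eq $target;
            $raw .= $expat->original_string if $depth;
        },
        Char => sub {
            my ($expat) = @_;
            $raw .= $expat->original_string if $depth;
        },
        End => sub {
            my ($expat) = @_;
            return unless $depth;
            $raw .= $expat->original_string;
            $depth--;
        },
    },
);
$parser->parse($xml);

# Wrap the captured fragment so it is a well-formed document,
# then let XML::Simple build the hash.
my $data = XMLin("<root>$raw</root>", KeyAttr => [], ForceArray => 0);

print "title: $data->{item}{title}\n";
print "price: $data->{item}{price}\n";
```

Note that the document is still parsed twice for the captured part (once by XML::Parser, once by XML::Simple), so any speed gain depends on how small the captured fragment is relative to the whole document.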

robic0 · Jun 13, 2006

Btw, Simple doesn't know of RXParse, so it won't invoke it. RXParse is faster than Expat
and Parse, which each use a C dll interface. RXParse is a very fast (faster than them) Perl-only
parser. So they will not support it until it's formalized on CPAN or something. I won't take it
to CPAN. I reject the Perl establishment, period. I am going to force the magpies to come to me!!!

robic0
(god of porn)

Ignoramus17503 · Jun 13, 2006

Why have you started a new thread?

Please ignore any post from robic0 and don't consider trying to use
RXParse. If you do, you will get no help from anyone here (including
robic0).

I have used XML::Parser and the expat library
(<http://expat.sourceforge.net/>) to parse XML. I started developing a
program that uses XML by using XML::SAX::PurePerl. It worked on small
test files. When I was ready to test on larger files, I installed the
expat library and used XML::Parser. The speed-up was about a factor of
40.

Do you have any code sample that you could share?

SAX parsers do not produce Perl data structures. They call your
routines on each element. You then store the data in your own
structures. It is very efficient, but I do not have any experience with
XML::Simple or XML::Twig, so cannot give you a comparison.

Thank you for the tips.

I installed XML::Twig, and things seem, so far, to be a lot faster and
CPU use is way down. It is not quite as easy to use, but I can live
with it.

I am now running my process, which repeatedly parses large XML
structures; it will run for the rest of the evening. Time will tell if it
slows down with more parsed documents, maybe due to memory leaks or
who knows what.

i
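
The handler-based approach quoted in this post (the parser calls your routines on each element, and you store the data in your own structures) can be sketched as follows. This is a minimal illustration using XML::Parser's Start/Char/End handlers rather than a full SAX setup. The sample document and the element names `Items`, `Item`, and `Title` are hypothetical; `CurrentPrice` is borrowed from the thread.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical input document.
my $xml = <<'XML';
<Items>
  <Item><Title>Lamp</Title><CurrentPrice>12.50</CurrentPrice></Item>
  <Item><Title>Desk</Title><CurrentPrice>80.00</CurrentPrice></Item>
</Items>
XML

my @items;          # our own structure: a list of hashes
my $current;        # hash for the <Item> being read, if any
my $text = '';      # character data collected for the current element

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $name) = @_;
            $current = {} if $name eq 'Item';
            $text = '';
        },
        Char => sub {
            my ($expat, $chunk) = @_;
            # Character data may arrive in several chunks, so append.
            $text .= $chunk;
        },
        End => sub {
            my ($expat, $name) = @_;
            if ($name eq 'Item') {
                push @items, $current;
                undef $current;
            }
            elsif ($current) {
                # A leaf element inside <Item>: store its text by tag name.
                $current->{$name} = $text;
            }
            $text = '';
        },
    },
);
$parser->parse($xml);

printf "%s costs %s\n", $_->{Title}, $_->{CurrentPrice} for @items;
```

The sketch only handles one level of leaf elements below `Item`; deeper nesting would need a stack of hashes instead of the single `$current`.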

robic0 · Jun 13, 2006

I'm gonna let this slight go Jim, consider yourself lucky!!!

Since you do not have experience with Simple or Twig, I consider
your post a light easy breeze that fades with the tides. I would not
have followed your comments but for the "indirect" reference to ignore
robic0 entirely! I have a long memory and will not forget this.

I've been purposely absent and choose posts now based on my expertise.
I have one now. A really big, complicated one.

He never mentioned SAX (Simple API for XML), why did you? You don't know
what xml is and you will never. I don't take kindly to personal attacks!
The next one and I will rip you a new asshole!!!!!!

robic0
(god of porn)

robic0 · Jun 13, 2006

Oh I thought your intention was to create data structures? Or was it to
parse xml? Or create data structures from parsed xml?
It's a really, really hard, hard thing to get from you the simplest of
simple answers.

The volume of folks (I know of) here, know these answers intimately,
me one of them.

You are indeed an Ebiotch asshole !!!!!!!!!!!
(this line above won't get you anymore answers)

robic0
(god of porn)

Ignoramus17503 · Jun 13, 2006

My goal was to parse XML into usable data structures.

i

robic0 · Jun 13, 2006

It's been said over and over again in the last hour....

This is a little info for you. It is extremely *HARD*
to divine xml into data structures!!!!!!!!

Given a flat requirement, passing a module to such that does
so will prove useless to the requestor!

DO YOU NOT FLUCKIN UNDERSTAND THAT??????????????????????

robic0
(god of porn)

Sherm Pendley · Jun 13, 2006

Ignoramus17503 said:
Aside from a suggestion to look at RXParse, which I will do

Do yourself a favor first, and search Google Groups for robic0.

sherm--

John Bokma · Jun 13, 2006

Ignoramus17503 said:
I installed XML::Twig, and things seem, so far, to be a lot faster and
CPU use is way down. It is not quite as easy to use, but I can live
with it.

Uhm, how easy do you want it? Study the manual and the examples.

Ignoramus17503 said:
I am running my process, which repeatedly parses large XML structures,

The same structure?

Ignoramus17503 · Jun 13, 2006

Sherm Pendley said:
Do yourself a favor first, and search Google Groups for robic0.

Well, yes, I think that you have a point. Anyway, I am now using
XML::Twig and it seems to be quite stable.

i

Ignoramus17503 · Jun 13, 2006

John Bokma said:
Uhm, how easy do you want it? Study the manual and the examples.

Well, here's how I accessed data with XML::Simple:

my $price = $item->{SellingStatus}->{CurrentPrice}->{content};

Here's how I access it with XML::Twig:

$price = $item->first_child( 'SellingStatus' )->first_child( 'CurrentPrice' )->text;

Clearly, the former is easier.

John Bokma said:
The same structure?

Yes, exactly same structures.

i
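
For comparison, here is a small sketch of some shorter ways to reach the same value with XML::Twig, using the `first_child_text` and `findvalue` methods of XML::Twig::Elt. The sample document is hypothetical, and the `currencyID` attribute is an assumption added to explain why the XML::Simple version needs the `{content}` key.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical item, shaped like the access paths quoted above.
my $xml = '<Item><SellingStatus>'
        . '<CurrentPrice currencyID="USD">12.50</CurrentPrice>'
        . '</SellingStatus></Item>';

my $twig = XML::Twig->new;
$twig->parse($xml);
my $item = $twig->root;

# The chained form from the post.
my $long = $item->first_child('SellingStatus')
                ->first_child('CurrentPrice')
                ->text;

# first_child_text folds the last two calls into one.
my $field = $item->first_child('SellingStatus')
                 ->first_child_text('CurrentPrice');

# findvalue takes an XPath-like path relative to the element.
my $short = $item->findvalue('SellingStatus/CurrentPrice');

print "$long $field $short\n";
```

Unlike the chained form, `findvalue` does not die with a method call on an undefined value when `SellingStatus` is missing, which may matter when not every item carries every field.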

Michel Rodriguez · Jun 14, 2006

Ignoramus17503 said:
Well, here's how I accessed data with XML::Simple:

my $price = $item->{SellingStatus}->{CurrentPrice}->{content};

Here's how I access it with XML::Twig:

$price = $item->first_child( 'SellingStatus' )->first_child( 'CurrentPrice' )->text;

If you do $item->simplify, then you get the exact same structure as with
XML::Simple (if there is any difference you can report this as a bug).

Also if you find any memory leak, report it, and hopefully I can fix the
bug.
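
A minimal sketch of the `simplify` tip, assuming a document made of repeated `Item` elements (the document, the `Items` and `Title` names, and the `currencyID` attribute are hypothetical). Each item is converted to an XML::Simple-style hash inside a twig handler, and `purge` releases the part of the tree that has already been processed so memory stays flat on large inputs.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input with two items.
my $xml = '<Items>'
        . '<Item><Title>Lamp</Title><SellingStatus>'
        . '<CurrentPrice currencyID="USD">12.50</CurrentPrice>'
        . '</SellingStatus></Item>'
        . '<Item><Title>Desk</Title><SellingStatus>'
        . '<CurrentPrice currencyID="USD">80.00</CurrentPrice>'
        . '</SellingStatus></Item>'
        . '</Items>';

my @prices;

my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once each <Item> element has been fully parsed.
        Item => sub {
            my ($t, $item) = @_;
            my $data = $item->simplify;   # hash in XML::Simple style
            push @prices, $data->{SellingStatus}{CurrentPrice}{content};
            $t->purge;                    # free what has been handled
        },
    },
);
$twig->parse($xml);

print "prices: @prices\n";
```

The `{content}` key appears here because `CurrentPrice` carries an attribute; an element with text only would be simplified to a plain string, as with XML::Simple.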

Ignoramus7096 · Jun 14, 2006

Michel Rodriguez said:
If you do $item->simplify, then you get the exact same structure as with
XML::Simple (if there is any difference you can report this as a bug).

Thanks, that's a great tip, I will use it in the future.

Michel Rodriguez said:
Also if you find any memory leak, report it, and hopefully I can fix the
bug.

Well, I have used it quite a lot by now, and I see no problems. I
mentioned memory leaks as a mere possibility, in relation to
XML::Simple, not XML::Twig.

Thank you for your fine work. I may add support of XML::Twig to
Net::eBay as a user option.

i
