Converting XML to Perl structures FAST

Ignoramus17503

Aside from a suggestion to look at RXParse, which I will do, I have
not yet seen what I was looking for, so here's a rephrase of my
question.

I need to convert XML documents to Perl structures very efficiently,
CPU-wise.

I am currently using XML::Simple, which does what I want, but is
slow.

I have looked at various Perl XML FAQs, the manual for XML::LibXML, etc.
It looks like they parse XML into all kinds of strange (to me) things.

So, here's my question: what Perl module converts XML to a Perl
structure (hashes of hashes of arrays etc.), and does it very
efficiently?

I am not looking for suggestions to "use Google"; I need suggestions
from people who have a real-life answer.

Thanks.

i
 
robic0

There are many here that have used Simple. The general procedure is:
use Expat or Parse, set your handlers, set flags in the handlers,
grab the data when it comes around. Stop the grab when it's gone.
When you hit the tag you need, store the "original" content data
that is passed (appended to a string, with tags), then pass the
entire "original" xml/xhtml (tags and all) to Simple to glean the hash data.
This avoids unnecessary duality.
Does that about cover it?

robic0
(god of porn)
 
robic0

RXParse is just a Create/Filter/Search & Replace (modify) parser.
It won't internalize xml data into a hash. Although I did do one of those,
posted here a long time ago (a Simple replacement).

You need to understand that for what you (think) are trying to do you will have
to lead off with parser handlers to "drill down" to the start of the extraction
data, capture it (raw), wait for the finish, then pass the "raw" string to Simple.

That is how it's done buddy.....

robic0
(god of porn)
 
robic0

postscript:

Usually when you capture sub xml/xhtml in this fashion, you will want to encapsulate
the raw data with a tag before you send it to Simple. Simple invokes a user-selected
parser (Expat is default, I think). So if it's non-compliant it will croak/carp on you.
Expat is better than Parse though.

Like:

<root>

captured xml/xhtml

</root>
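
The capture-and-wrap procedure described above might look roughly like the sketch below. It assumes that "Expat or Parse" means XML::Parser (which sits on top of expat) and that "Simple" means XML::Simple. The sample document, the element names (Response, Items, Item, Title, Price) and the KeyAttr/ForceArray settings are made up for illustration. The sketch only shows the mechanics; it does not demonstrate that this is faster than handing the whole document to XML::Simple.

```perl
use strict;
use warnings;
use XML::Parser;
use XML::Simple qw(XMLin);

# Hypothetical sample document; only the <Item> elements are of interest.
my $xml = <<'XML';
<Response>
  <Timestamp>2006-01-01T00:00:00Z</Timestamp>
  <Items>
    <Item><Title>Widget</Title><Price>12.50</Price></Item>
    <Item><Title>Gadget</Title><Price>7.25</Price></Item>
  </Items>
</Response>
XML

my $depth = 0;   # greater than 0 while inside an <Item> element
my $raw   = '';  # raw markup captured so far

# While inside <Item>, append the markup exactly as the parser recognized it.
my $capture = sub {
    my ($expat) = @_;
    $raw .= $expat->recognized_string if $depth;
};

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag) = @_;
            $depth++ if $tag eq 'Item' || $depth;   # enter, or go deeper
            $capture->($expat);
        },
        End => sub {
            my ($expat, $tag) = @_;
            $capture->($expat);                     # include the end tag
            $depth-- if $depth;
        },
        Char => $capture,
    },
);
$parser->parse($xml);

# Wrap the captured fragments in a single root element so the string is
# well-formed, then let XML::Simple build the hash.
my $data = XMLin("<root>$raw</root>", KeyAttr => [], ForceArray => ['Item']);

print "$_->{Title}: $_->{Price}\n" for @{ $data->{Item} };
```

The wrapping `<root>` element matters because the captured string contains two sibling `<Item>` elements, which is not a well-formed document on its own.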

robic0
(god of porn)
 
robic0

Btw, Simple doesn't know of RXParse, so it won't invoke it. RXParse is faster than Expat
and Parse, which each use a C dll interface. RXParse is a very fast (faster than them) Perl-only
parser. So they will not support it until it's formalized on CPAN or something. I won't take it
to CPAN. I reject the Perl establishment, period. I am going to force the magpies to come to me!!!

robic0
(god of porn)
 
Ignoramus17503

Why have you started a new thread?

Please ignore any post from robic0 and don't consider trying to use
RXParse. If you do, you will get no help from anyone here (including
robic0).

I have used XML::Parser and the expat library
(<http://expat.sourceforge.net/>) to parse XML. I started developing a
program that uses XML by using XML::SAX::PurePerl. It worked on small
test files. When I was ready to test on larger files, I installed the
expat library and used XML::Parser. The speed-up was about a factor of
40.

Do you have any code sample that you could share?

SAX parsers do not produce Perl data structures. They call your
routines on each element. You then store the data in your own
structures. It is very efficient, but I do not have any experience with
XML::Simple or XML::Twig, so cannot give you a comparison.
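
As a rough illustration of the handler style described above (the parser calls your routines and you fill your own structures), here is a minimal sketch using XML::Parser. The sample document, the Title element, the currencyID attribute and the record layout are assumptions made up for this example; SellingStatus and CurrentPrice are borrowed from elsewhere in this thread.

```perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample document.
my $xml = <<'XML';
<Items>
  <Item>
    <Title>Widget</Title>
    <SellingStatus><CurrentPrice currencyID="USD">12.50</CurrentPrice></SellingStatus>
  </Item>
  <Item>
    <Title>Gadget</Title>
    <SellingStatus><CurrentPrice currencyID="USD">7.25</CurrentPrice></SellingStatus>
  </Item>
</Items>
XML

my @items;       # finished records
my $current;     # record currently being built
my $text = '';   # character data seen since the last start or end tag

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $tag, %attr) = @_;
            $current = {} if $tag eq 'Item';
            $current->{currency} = $attr{currencyID} if $tag eq 'CurrentPrice';
            $text = '';
        },
        Char => sub {
            my ($expat, $chunk) = @_;
            $text .= $chunk;   # may be called more than once per text node
        },
        End => sub {
            my ($expat, $tag) = @_;
            if    ($tag eq 'Title')        { $current->{title} = $text }
            elsif ($tag eq 'CurrentPrice') { $current->{price} = $text }
            elsif ($tag eq 'Item')         { push @items, $current; undef $current }
            $text = '';
        },
    },
);
$parser->parse($xml);

print "$_->{title}: $_->{price} $_->{currency}\n" for @items;
```

Only the fields the program cares about are stored, which is where the efficiency comes from; the cost is that the mapping from elements to hash keys has to be written by hand.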

Thank you for the tips.

I installed XML::Twig, and things seem, so far, to be a lot faster and
CPU use is way down. It is not quite as easy to use, but I can live
with it.

I am running my process, which repeatedly parses large XML structures,
now; it will run for the rest of the evening. Time will tell if it
slows down with more parsed documents, maybe due to memory leaks or
who knows what.

i
 
robic0

I'm gonna let this slight go Jim, consider yourself lucky!!!

Since you do not have experience with Simple or Twig, I consider
your post a light easy breeze that fades with the tides. I would not
have followed your comments but for the "indirect" reference to ignore
robic0 entirely! I have a long memory and will not forget this.

I've been purposely absent and choose posts now based on my expertise.
I have one now. A really big, complicated one.

He never mentioned SAX (Simple API for XML), why did you? You don't know
what xml is and you will never. I don't take kindly to personal attacks!
The next one and I will rip you a new asshole!!!!!!

robic0
(god of porn)
 
robic0

Oh I thought your intention was to create data structures? Or was it to
parse xml? Or create data structures from parsed xml?
It's a really, really hard, hard thing to get from you the simplest of
simple answers.

The volume of folks (I know of) here, know these answers intimately,
me one of them.

You are indeed an Ebiotch asshole !!!!!!!!!!!
(this line above won't get you anymore answers)

robic0
(god of porn)
 
Ignoramus17503

My goal was to parse XML into usable data structures.

i
 
robic0

It's been said over and over again in the last hour....

This is a little info for you. It is extremely *HARD*
to divine xml into data structures!!!!!!!!

Given a flat requirement, passing a module to such that does
so will prove useless to the requestor!

DO YOU NOT FLUCKIN UNDERSTAND THAT??????????????????????

robic0
(god of porn)
 
Sherm Pendley

Ignoramus17503 said:
Aside from a suggestion to look at RXParse, which I will do

Do yourself a favor first, and search Google Groups for robic0.

sherm--
 
John Bokma

Ignoramus17503 said:
I installed XML::Twig, and things seem, so far, to be a lot faster and
CPU use is way down. It is not quite as easy to use, but I can live
with it.

Uhm, how easy do you want it? Study the manual and the examples.

I am running my process, which repeatedly parses large XML structures,

The same structure?
 
Ignoramus17503

Do yourself a favor first, and search Google Groups for robic0.

Well, yes, I think that you have a point. Anyway, I am now using
XML::Twig and it seems to be quite stable.

i
 
Ignoramus17503

Uhm, how easy do you want it? Study the manual and the examples.

Well, here's how I accessed data with XML::Simple:

my $price = $item->{SellingStatus}->{CurrentPrice}->{content};

Here's how I access it with XML::Twig:

$price = $item->first_child( 'SellingStatus' )->first_child( 'CurrentPrice' )->text;

Clearly, the former is easier.

The same structure?

Yes, exactly the same structures.

i
 
Michel Rodriguez

Ignoramus17503 said:
Well, here's how I accessed data with XML::Simple:

my $price = $item->{SellingStatus}->{CurrentPrice}->{content};

Here's how I access it with XML::Twig:

$price = $item->first_child( 'SellingStatus' )->first_child( 'CurrentPrice' )->text;

If you do $item->simplify, then you get the exact same structure as with
XML::Simple (if there is any difference you can report this as a bug).
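
A minimal sketch of the two access styles side by side, assuming a made-up sample document and an Item twig handler (neither is from the original posts). It uses XML::Twig's `simplify` with default options; real data may need the same kind of option tuning that XML::Simple does (forcearray, keyattr).

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample document shaped like the structure quoted above.
my $xml = <<'XML';
<Items>
  <Item>
    <Title>Widget</Title>
    <SellingStatus><CurrentPrice currencyID="USD">12.50</CurrentPrice></SellingStatus>
  </Item>
</Items>
XML

my @prices;
my $twig = XML::Twig->new(
    twig_handlers => {
        # Called once for each complete <Item> element.
        Item => sub {
            my ($t, $item) = @_;

            # Element-method access, as in the quoted XML::Twig line.
            my $via_methods = $item->first_child('SellingStatus')
                                   ->first_child('CurrentPrice')->text;

            # Hash access after simplify, as in the quoted XML::Simple line.
            my $h        = $item->simplify;
            my $via_hash = $h->{SellingStatus}{CurrentPrice}{content};

            push @prices, [ $via_methods, $via_hash ];
            $t->purge;   # release the parts of the tree already handled
        },
    },
);
$twig->parse($xml);

print "methods: $_->[0], hash: $_->[1]\n" for @prices;
```

Calling `simplify` per item inside the handler keeps the convenient hash syntax while still letting the twig be purged as parsing proceeds.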

Also if you find any memory leak, report it, and hopefully I can fix the
bug.
 
Ignoramus7096

If you do $item->simplify, then you get the exact same structure as with
XML::Simple (if there is any difference you can report this as a bug).

Thanks, that's a great tip, I will use it in the future.

Also if you find any memory leak, report it, and hopefully I can fix the
bug.

Well, I have used it quite a lot by now, and I see no problems. I
mentioned memory leaks as a mere possibility, in relation to
XML::Simple, not XML::Twig.

Thank you for your fine work. I may add support of XML::Twig to
Net::eBay as a user option.


i
 
