Working with a 1-GB XML file...

kj

Hi. I have a large XML file (about 1 GB) that I would like to be
able to interrogate in my code. Given its size, it's out of the
question to read it all into memory. I'd like to avoid having to
convert this thing to an RDB.

Does anyone know of a module that can treat such a file as
disk-resident data?

TIA!

kj
 
xhoster

kj said:
Hi. I have a large XML file (about 1 GB) that I would like to be
able to interrogate in my code.

In what ways do you want to interrogate it? Is all the data in the file
relevant to you, or could you abstract just the relevant parts of it into
a much smaller, memory-resident set? (XML::Twig might be good for that.)
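
As a rough illustration of that idea, here is a minimal sketch that uses XML::Twig to pull only a couple of fields out of each record and keep them in a small hash. The original post does not describe the file's layout, so the element names `record` and `name`, the `id` attribute, and the helper name `extract_names` are all assumptions made up for this example.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical layout (not given in the thread):
#   <records><record id="..."><name>...</name> ...</record> ...</records>
sub extract_names {
    my ($file) = @_;
    my %name_for;    # small in-memory summary: id => name

    my $twig = XML::Twig->new(
        twig_handlers => {
            # Called each time a complete <record> element has been parsed.
            record => sub {
                my ($t, $rec) = @_;
                $name_for{ $rec->att('id') } = $rec->first_child_text('name');
                $t->purge;    # release what has been parsed so far
            },
        },
    );
    $twig->parsefile($file);
    return \%name_for;
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $names = extract_names($ARGV[0]);
    printf "%d records kept\n", scalar keys %$names;
}
```

The call to `purge` inside the handler is what keeps memory use bounded: without it the twig would keep growing as the file is read.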

kj said:
Given its size, it's out of the
question to read it all into memory. I'd like to avoid having to
convert this thing to an RDB.

How about converting it to a DBM::Deep file?
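
A minimal sketch of what such a conversion could look like: one streaming pass with XML::Twig that copies each record into a DBM::Deep file, after which lookups go to disk rather than to the XML. As before, the `record` and `name` elements, the `id` attribute, and the helper name `load_into_dbm` are assumptions, since the thread never shows the actual data.

```perl
use strict;
use warnings;
use XML::Twig;
use DBM::Deep;

# One-time conversion pass. Element and attribute names are hypothetical.
sub load_into_dbm {
    my ($xml_file, $db_file) = @_;
    my $db = DBM::Deep->new($db_file);    # disk-backed hash

    XML::Twig->new(
        twig_handlers => {
            record => sub {
                my ($t, $rec) = @_;
                # Nested structures are written through to the file.
                $db->{ $rec->att('id') } = {
                    name => $rec->first_child_text('name'),
                };
                $t->purge;    # keep memory use flat while streaming
            },
        },
    )->parsefile($xml_file);

    return $db;
}

# Only runs when both file names are given on the command line.
if (@ARGV == 2) {
    my $db = load_into_dbm(@ARGV);
    print scalar(keys %$db), " records stored\n";
}
```

Later runs can skip the XML entirely and reopen the same file with `DBM::Deep->new($db_file)`. Whether this is fast enough for a file of this size would have to be measured.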

kj said:
Does anyone know of a module that can treat such a file as
disk-resident data?

Well, no module is needed to treat it as disk-resident data, as that is
exactly what it is already. You need to give us a functional definition of
how you want to access the data. That will most likely drive the storage,
not the other way around.

You might be able to use DBD::AnyData, but there is no particular reason to
think it will like the format your XML is already in, or that it will be
fast.

Xho

 
John Bokma

kj said:
Hi. I have a large XML file (about 1 GB) that I would like to be
able to interrogate in my code. Given its size, it's out of the
question to read it all into memory. I'd like to avoid having to
convert this thing to an RDB.

Does anyone know of a module that can treat such a file as
disk-resident data?

It all depends a lot on /what/ is in the XML file. If it is a set of records
you have to process one by one, XML::Twig might be the right answer. If you
have to process the file in a stream-based way, SAX or a similar module might
be the answer.
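
For the stream-based route, a minimal sketch with XML::SAX might look like the following. It only counts how often each element name occurs, which is a cheap way to get a first picture of an unknown large file. The package name `ElementCounter` and the helper `count_elements` are invented for this example.

```perl
use strict;
use warnings;

# A SAX handler that tallies element names as they stream past.
package ElementCounter;
use base 'XML::SAX::Base';

sub start_element {
    my ($self, $el) = @_;
    $self->{count}{ $el->{Name} }++;    # $el->{Name} is the element name
}

package main;
use XML::SAX::ParserFactory;

sub count_elements {
    my ($file) = @_;
    my $handler = ElementCounter->new;
    my $parser  = XML::SAX::ParserFactory->parser(Handler => $handler);
    $parser->parse_uri($file);          # reads the file as a stream
    return $handler->{count} || {};
}

# Only runs when a file name is given on the command line.
if (@ARGV) {
    my $count = count_elements($ARGV[0]);
    printf "%-20s %d\n", $_, $count->{$_} for sort keys %$count;
}
```

Because the handler keeps only a hash of counters, memory use depends on the number of distinct element names, not on the size of the file.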
 
