Sorting a large XML file

Rishi Dhupar

Hi,

I have 40-50 MB XML files consisting of thousands of nodes that look
like:
<File>
<FileOwner></FileOwner>
<FilePath>C:\perl_files</FilePath>
<FileName>1.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/15/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize>1342</FileSize>
</File>

I don't really care what it is sorted by, as long as I can sort the
file in some manner that is the same each time.

Is there any way of doing this? Loading the XML into memory and then
sorting it is too memory intensive. My files could grow to 200 MB.

Thanks for any tips
 
xhoster

Rishi Dhupar said:
Hi,

I have 40-50 MB XML files consisting of thousands of nodes that look
like:
<File>
<FileOwner></FileOwner>
<FilePath>C:\perl_files</FilePath>
<FileName>1.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/15/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize>1342</FileSize>
</File>

I don't really care what it is sorted by, as long as I can sort the
file in some manner that is the same each time.

What was wrong with Ian Wilson's response the last time you asked a
very similar question?

Rishi Dhupar said:
Is there any way of doing this? Loading the XML into memory and then
sorting it is too memory intensive. My files could grow to 200 MB.

Thanks for any tips

My tip would be not to use XML for something it is ill-suited for.

Xho
 
John Bokma

Rishi said:
Hi,

I have 40-50 MB XML files consisting of thousands of nodes that look
like:
<File>
<FileOwner></FileOwner>
<FilePath>C:\perl_files</FilePath>
<FileName>1.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/15/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize>1342</FileSize>
</File>

I don't really care what it is sorted by, as long as I can sort the
file in some manner that is the same each time.

Is there any way of doing this? Loading the XML into memory and then
sorting it is too memory intensive. My files could grow to 200 MB.

Parse it using a fast parser and make the info very compact, e.g. glue the
path and name together, drop the slashes from the dates, etc.
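
A minimal sketch of that idea, in Python rather than Perl and not code from anyone in this thread: it streams the file with the standard library's `xml.etree.ElementTree.iterparse` and keeps only a small tuple per `<File>` node. The `<Files>` root element, the sample data, and the choice of `FileModified` as the date to keep are assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET
from datetime import datetime


def compact_records(source):
    """Yield one compact (path, yyyymmdd, size) tuple per <File> element."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "File":
            continue  # child elements are read through their parent below
        # Glue path and name together into a single string.
        full_path = (elem.findtext("FilePath") or "") + "\\" + (elem.findtext("FileName") or "")
        # Drop the slashes from the date; YYYYMMDD also sorts chronologically.
        raw_date = elem.findtext("FileModified") or ""
        modified = datetime.strptime(raw_date, "%m/%d/%Y").strftime("%Y%m%d") if raw_date else ""
        size = int(elem.findtext("FileSize") or 0)
        yield (full_path, modified, size)
        elem.clear()  # drop the parsed children so they can be garbage collected


# Hypothetical sample; the <Files> wrapper element is an assumption.
SAMPLE = rb"""<Files>
<File>
<FileOwner></FileOwner>
<FilePath>C:\perl_files</FilePath>
<FileName>b.xml</FileName>
<FileModified>4/15/2005</FileModified>
<FileSize>1342</FileSize>
</File>
<File>
<FileOwner></FileOwner>
<FilePath>C:\perl_files</FilePath>
<FileName>a.xml</FileName>
<FileModified>4/18/2005</FileModified>
<FileSize>0</FileSize>
</File>
</Files>"""

# Sorting plain tuples gives the same order on every run.
records = sorted(compact_records(io.BytesIO(SAMPLE)))
```

The compact tuples take far less memory than a full in-memory tree, though whether thousands of them fit comfortably still depends on the machine.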

If you want to pay me, drop me a line :-D.
 
rishid

Just found XML::Filter::Sort

It is a godsend: it does everything I need and supports buffering and a
memory limit for large files. Pretty amazing module, actually. I just
found a bug in it which is ticking me off; hopefully the author can get
back to me.
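
For readers wondering how a sort can work under a memory limit at all, here is a rough sketch of the general technique (an external merge sort), written in Python. It is not the module's implementation; the function name `external_sort` and the `max_in_memory` parameter are made up for this example, and it assumes each record has already been flattened to a single newline-terminated line.

```python
import heapq
import itertools
import os
import tempfile


def external_sort(lines, max_in_memory=100_000):
    """Yield newline-terminated strings in sorted order.

    At most max_in_memory lines are held in RAM while sorting; during the
    merge only one line per temporary run file is held.
    """
    run_paths = []
    lines = iter(lines)
    try:
        while True:
            # Sort one memory-sized chunk and spill it to a temporary file.
            chunk = sorted(itertools.islice(lines, max_in_memory))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(suffix=".run")
            with os.fdopen(fd, "w", encoding="utf-8", newline="") as run:
                run.writelines(chunk)
            run_paths.append(path)
        # Merge the sorted runs lazily, reading each file line by line.
        runs = [open(p, encoding="utf-8", newline="") for p in run_paths]
        try:
            yield from heapq.merge(*runs)
        finally:
            for run in runs:
                run.close()
    finally:
        for path in run_paths:
            os.remove(path)


# Tiny demonstration with a deliberately small memory limit.
demo_lines = [f"{n:05d}\n" for n in (5, 3, 9, 1, 7, 2, 8)]
demo_sorted = list(external_sort(demo_lines, max_in_memory=3))
```

With a sort key placed at the start of each line, the merged output can then be written back out as XML in a stable order.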

If anyone has any experience with it, here is the bug.
My XML input file:
<File>
<FileOwner></FileOwner>
<FilePath>C:\perl_files</FilePath>
<FileName>FSW_Output.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/18/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize>0</FileSize>
</File>

This is the output:
<File>
<FileOwner />
<FilePath>C:\perl_files</FilePath>
<FileName>FSW_Output.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/18/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize />0
</File>

In the output, FileOwner and FileSize get messed up. I cannot figure out
what is wrong with it.
 
Sherm Pendley

rishid said:
This is the output:
<File>
<FileOwner />
<FilePath>C:\perl_files</FilePath>
<FileName>FSW_Output.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/18/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize />0
</File>

In the output, FileOwner and FileSize get messed up. I cannot figure out
what is wrong with it.

Nothing wrong with FileOwner - that's a valid way to represent an empty
element in XML. Parsers will treat <FileOwner /> the same as they would a
pair of opening and closing tags with nothing between them.
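
A quick way to check this for yourself, sketched here with Python's standard `xml.etree.ElementTree` (any conforming parser should behave the same way); the helper name `describe` is just for this example:

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<FileOwner />")
long_form = ET.fromstring("<FileOwner></FileOwner>")


def describe(elem):
    """Return the parts of an element that a consumer would see."""
    return (elem.tag, elem.text, len(elem), dict(elem.attrib))


# Both spellings parse to the same tag, text, children and attributes.
same_content = describe(short_form) == describe(long_form)

# Serializing them again also yields identical bytes.
same_output = ET.tostring(short_form) == ET.tostring(long_form)
```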

Don't know what happened to FileSize though...

sherm--
 
John Bokma

Sherm Pendley said:
[ snip ]
Nothing wrong with FileOwner - that's a valid way to represent an empty
element in XML. Parsers will treat <FileOwner /> the same as they would a
pair of opening and closing tags with nothing between them.

Don't know what happened to FileSize though...

Best guess: in a badly written test, 0 is seen as empty string :-D
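
To illustrate the kind of test meant here, a small Python sketch. It is purely hypothetical and not taken from the module's source, and it only shows the truthiness mistake, not the exact `<FileSize />0` output above. Python itself treats the string "0" as true, so the made-up helper `perl_truthy` imitates Perl's rule, under which undef, "" and "0" are all false.

```python
def perl_truthy(value):
    """Hypothetical helper: mimic Perl's boolean rule for strings."""
    return value not in (None, "", "0")


def serialize_buggy(tag, text):
    # Bug: a truthiness test treats "0" as if there were no content.
    if perl_truthy(text):
        return f"<{tag}>{text}</{tag}>"
    return f"<{tag} />"


def serialize_fixed(tag, text):
    # Test for presence and length instead of truthiness.
    if text is not None and len(text) > 0:
        return f"<{tag}>{text}</{tag}>"
    return f"<{tag} />"


buggy = serialize_buggy("FileSize", "0")   # "<FileSize />"
fixed = serialize_fixed("FileSize", "0")   # "<FileSize>0</FileSize>"
```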
 
