Sorting a large XML file

Discussion in 'Perl Misc' started by Rishi Dhupar, Apr 19, 2005.

  1. Rishi Dhupar Guest

    Hi,

    I have 40-50 MB XML files consisting of thousands of nodes that look
    like:
    <File>
    <FileOwner></FileOwner>
    <FilePath>C:\perl_files</FilePath>
    <FileName>1.xml</FileName>
    <FileAccessed>4/18/2005</FileAccessed>
    <FileModified>4/15/2005</FileModified>
    <FileCreated>4/18/2005</FileCreated>
    <FileSize>1342</FileSize>
    </File>

    I don't really care what it is sorted by, as long as I can sort the
    file in some manner that is the same each time.

    Is there any way to do this? Loading the XML into memory and then
    sorting is too memory intensive. My files could grow to 200 MB.

    Thanks for any tips
    Rishi Dhupar, Apr 19, 2005
    #1

  2. Guest

    "Rishi Dhupar" <> wrote:
    > Hi,
    >
    > I have a 40-50 mb XML files consisting of 1000's of nodes that look
    > like:
    > <File>
    > <FileOwner></FileOwner>
    > <FilePath>C:\perl_files</FilePath>
    > <FileName>1.xml</FileName>
    > <FileAccessed>4/18/2005</FileAccessed>
    > <FileModified>4/15/2005</FileModified>
    > <FileCreated>4/18/2005</FileCreated>
    > <FileSize>1342</FileSize>
    > </File>
    >
    > I don't really care what it is sorted by, but as long as I can sort the
    > file in some manor that is the same each time.


    What was wrong with Ian Wilson's response the last time you asked a
    very similar question?

    > Is there any method to doing this? Loading the XML into memory and then
    > sorting is too memory intensive. My files could get upwards to 200mb.
    >
    > Thanks for any tips


    My tip would be to not use XML for something it is ill-suited for.

    Xho

    --
    -------------------- http://NewsReader.Com/ --------------------
    Usenet Newsgroup Service $9.95/Month 30GB
    , Apr 19, 2005
    #2

  3. John Bokma Guest

    Rishi Dhupar wrote:

    > Hi,
    >
    > I have a 40-50 mb XML files consisting of 1000's of nodes that look
    > like:
    > <File>
    > <FileOwner></FileOwner>
    > <FilePath>C:\perl_files</FilePath>
    > <FileName>1.xml</FileName>
    > <FileAccessed>4/18/2005</FileAccessed>
    > <FileModified>4/15/2005</FileModified>
    > <FileCreated>4/18/2005</FileCreated>
    > <FileSize>1342</FileSize>
    > </File>
    >
    > I don't really care what it is sorted by, but as long as I can sort the
    > file in some manor that is the same each time.
    >
    > Is there any method to doing this? Loading the XML into memory and then
    > sorting is too memory intensive. My files could get upwards to 200mb.


    Parse it using a fast parser and store each record very compactly, e.g.
    glue path and name together, drop the slashes from the dates, etc.
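
The compacting idea above can be sketched roughly as follows. This is a Python illustration rather than Perl, and it makes several assumptions that are not in the thread: the records sit inside a hypothetical `<Files>` root element, dates are always M/D/YYYY, and the helper names `compact_date` and `compact_records` are made up for this example. Each record is read as a stream and reduced to a small tuple before sorting, so only the tuples stay in memory.

```python
import io
import xml.etree.ElementTree as ET


def compact_date(text):
    """Turn 'M/D/YYYY' into the integer YYYYMMDD, which sorts chronologically."""
    month, day, year = (int(part) for part in text.split("/"))
    return year * 10000 + month * 100 + day


def compact_records(source):
    """Stream <File> records and yield one small tuple per record."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "File":
            continue
        # Glue path and name together into a single string
        full_name = elem.findtext("FilePath", "") + "\\" + elem.findtext("FileName", "")
        yield (
            full_name,
            compact_date(elem.findtext("FileModified")),
            int(elem.findtext("FileSize") or 0),  # empty element counts as 0
        )
        elem.clear()  # drop the children of the record just processed


# Hypothetical sample: the <Files> wrapper is an assumption
SAMPLE = b"""<Files>
<File>
<FileOwner></FileOwner>
<FilePath>C:\\perl_files</FilePath>
<FileName>2.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/18/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize>0</FileSize>
</File>
<File>
<FileOwner></FileOwner>
<FilePath>C:\\perl_files</FilePath>
<FileName>1.xml</FileName>
<FileAccessed>4/18/2005</FileAccessed>
<FileModified>4/15/2005</FileModified>
<FileCreated>4/18/2005</FileCreated>
<FileSize>1342</FileSize>
</File>
</Files>"""

records = sorted(compact_records(io.BytesIO(SAMPLE)))
for record in records:
    print(record)
```

Sorting tuples gives the same order on every run, which is all the original question asks for. Whether the compact tuples for a 200 MB file fit in memory depends on the data, so this is a starting point, not a guarantee.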

    If you want to pay me, drop me a line :-D.

    --
    John
    Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
    John Bokma, Apr 20, 2005
    #3
  4. Rishi  Dhupar

    Guest

    Just found XML::Filter::Sort

    It is a godsend: it does everything I need and has buffering and a
    max-memory setting for large files. Pretty amazing module actually.
    I just found a bug in it which is ticking me off; hopefully the author
    can get back to me.
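
For readers wondering how a sort can work with a memory cap at all, here is a minimal sketch of the general technique (external merge sort), written in Python. It is not XML::Filter::Sort's actual code, and the names `external_sort`, `_spill` and `max_in_memory` are invented for this example. It assumes each record has already been serialized to a single line of text with no embedded newlines.

```python
import heapq
import tempfile


def _spill(sorted_chunk):
    """Write one sorted run to a temporary file and rewind it for reading."""
    run = tempfile.TemporaryFile(mode="w+", encoding="utf-8")
    run.writelines(line + "\n" for line in sorted_chunk)
    run.seek(0)
    return run


def external_sort(lines, max_in_memory=1000):
    """Yield lines in sorted order while buffering about max_in_memory of them."""
    runs, chunk = [], []
    for line in lines:
        chunk.append(line)
        if len(chunk) >= max_in_memory:
            # Buffer is full: sort it and move it to disk
            runs.append(_spill(sorted(chunk)))
            chunk = []
    if chunk:
        runs.append(_spill(sorted(chunk)))
    # heapq.merge reads the runs lazily, one line from each run at a time
    streams = [(line.rstrip("\n") for line in run) for run in runs]
    try:
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()


print(list(external_sort(["b", "a", "d", "c", "e"], max_in_memory=2)))
```

The trade-off is extra disk I/O in exchange for a fixed memory ceiling: a smaller `max_in_memory` means more temporary files and a wider merge.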

    If anyone has any experience with it, here is the bug.
    My XML input file:
    <File>
    <FileOwner></FileOwner>
    <FilePath>C:\perl_files</FilePath>
    <FileName>FSW_Output.xml</FileName>
    <FileAccessed>4/18/2005</FileAccessed>
    <FileModified>4/18/2005</FileModified>
    <FileCreated>4/18/2005</FileCreated>
    <FileSize>0</FileSize>
    </File>

    This is what is outputted:
    <File>
    <FileOwner />
    <FilePath>C:\perl_files</FilePath>
    <FileName>FSW_Output.xml</FileName>
    <FileAccessed>4/18/2005</FileAccessed>
    <FileModified>4/18/2005</FileModified>
    <FileCreated>4/18/2005</FileCreated>
    <FileSize />0
    </File>

    In the output, FileOwner and FileSize get messed up. I cannot figure
    out what is wrong with it.
    , Apr 20, 2005
    #4
  5. wrote:

    > This is what is outputted:
    > <File>
    > <FileOwner />
    > <FilePath>C:\perl_files</FilePath>
    > <FileName>FSW_Output.xml</FileName>
    > <FileAccessed>4/18/2005</FileAccessed>
    > <FileModified>4/18/2005</FileModified>
    > <FileCreated>4/18/2005</FileCreated>
    > <FileSize />0
    > </File>
    >
    > The output, FileOwner and FileSize gets messed up. Cannot figure out
    > what is wrong with it.


    Nothing wrong with FileOwner - that's a valid way to represent an empty
    element in XML. Parsers will treat <FileOwner /> the same as they would a
    pair of opening and closing tags with nothing between them.
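
A quick way to check this yourself, sketched here with Python's standard-library parser (any conforming XML parser should agree; the `describe` helper is just for this example):

```python
import xml.etree.ElementTree as ET

short_form = ET.fromstring("<FileOwner />")
long_form = ET.fromstring("<FileOwner></FileOwner>")


def describe(elem):
    """Collect everything the parser knows about an element."""
    return (elem.tag, elem.text, len(elem), dict(elem.attrib))


# Both spellings parse to the same tag, no text, no children, no attributes
same = describe(short_form) == describe(long_form)
print(same)
```

So a tool that rewrites the open/close pair as the short form has not changed the document's content.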

    Don't know what happened to FileSize though...

    sherm--

    --
    Cocoa programming in Perl: http://camelbones.sourceforge.net
    Hire me! My resume: http://www.dot-app.org
    Sherm Pendley, Apr 20, 2005
    #5
  6. John Bokma Guest

    Sherm Pendley wrote:

    > wrote:
    >
    >> This is what is outputted:
    >> <File>
    >> <FileOwner />


    [ snip ]

    >> <FileSize />0
    >> </File>
    >>
    >> The output, FileOwner and FileSize gets messed up. Cannot figure out
    >> what is wrong with it.

    >
    > Nothing wrong with FileOwner - that's a valid way to represent an empty
    > element in XML. Parsers will treat <FileOwner /> the same they would a
    > pair of opening and closing tags with nothing between them.
    >
    > Don't know what happened to FileSize though...


    Best guess: in a badly written test, 0 is seen as an empty string :-D
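
To illustrate the kind of bug being guessed at (this is not the module's code, and it does not reproduce the exact `<FileSize />0` output): in Perl's boolean context the string "0" is false, just like the empty string and undef. The Python sketch below mimics that rule with a made-up helper, `perl_truthy`, and shows how a writer that tests truthiness instead of length misclassifies "0" as empty. All function names are invented for this example.

```python
from xml.sax.saxutils import escape


def perl_truthy(value):
    """Mimic Perl's boolean context for strings: undef, '' and '0' are false."""
    return value is not None and value not in ("", "0")


def write_element_buggy(tag, text):
    # Bug: decides "empty" with a truthiness test, so the text "0" is lost
    if perl_truthy(text):
        return f"<{tag}>{escape(text)}</{tag}>"
    return f"<{tag} />"


def write_element_fixed(tag, text):
    # Only None or a zero-length string counts as empty
    if text is not None and len(text) > 0:
        return f"<{tag}>{escape(text)}</{tag}>"
    return f"<{tag} />"


print(write_element_buggy("FileSize", "0"))
print(write_element_fixed("FileSize", "0"))
print(write_element_fixed("FileOwner", ""))
```

In Perl itself the usual fix for this class of bug is to test `defined` and `length` rather than the value's truth.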

    --
    John
    Small Perl scripts: http://johnbokma.com/perl/
    Perl programmer available: http://castleamber.com/
    Happy Customers: http://castleamber.com/testimonials.html
    John Bokma, Apr 20, 2005
    #6