High Performance Xml parser

Discussion in 'XML' started by rony, Nov 27, 2006.

  1. rony

    rony Guest

    Hi,
    I am looking for a component that can parse my XML files.
    The reason I am asking is that my XML files are huge; they can
    reach 1 GB, more or less.
    Parsing such a file takes something like 5 hours.
    Right now I am using XmlReader, XmlNode ... (I do not load the file into
    memory).
    Can you suggest better components to use?

    ** I tried SAX but I couldn't understand how it works, because there are
    no examples for .NET and the documentation is very bad.
    P.S.: I am writing in C#.

    Regards, Rony
    rony, Nov 27, 2006
    #1

  2. If parsing a 1GB file is taking 5 hours, the problem isn't the parser --
    it's the fact that the data model (presumably an implementation of the
    DOM?) is becoming so huge that your machine's thrashing itself to death
    swapping data in and out of memory.

    SAX-based processing, when appropriate, is indeed a recommended solution
    for that. Or SAX feeding into a more specialized data model. Or --
    perhaps -- an XML database tool, which has its own specialized models
    and may be able to handle paging of data more intelligently than the
    system's default swapper.
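
For readers who want to see what "SAX feeding into a more specialized data model" could look like in C#, here is a minimal sketch. It is not a real SAX library: the `IContentHandler` interface, the `ElementCounter` model, and the `SaxLikeDriver` class are all hypothetical names invented for illustration, and the events are produced by .NET's own `XmlReader` as a stand-in for a SAX parser. The point is only that the handler keeps a small dictionary instead of a full tree.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical SAX-style callback interface (not part of .NET).
public interface IContentHandler
{
    void StartElement(string name);
    void EndElement(string name);
    void Characters(string text);
}

// A small specialized model: one counter per element name, no tree.
public class ElementCounter : IContentHandler
{
    public readonly Dictionary<string, int> Counts = new Dictionary<string, int>();

    public void StartElement(string name)
    {
        int n;
        Counts.TryGetValue(name, out n);   // n stays 0 if the name is new
        Counts[name] = n + 1;
    }

    public void EndElement(string name) { }
    public void Characters(string text) { }
}

public static class SaxLikeDriver
{
    // Pushes events from a forward-only XmlReader into the handler.
    public static void Parse(XmlReader reader, IContentHandler handler)
    {
        while (reader.Read())
        {
            switch (reader.NodeType)
            {
                case XmlNodeType.Element:
                    string name = reader.Name;
                    bool empty = reader.IsEmptyElement;
                    handler.StartElement(name);
                    // <note/> produces no EndElement node, so close it here.
                    if (empty) handler.EndElement(name);
                    break;
                case XmlNodeType.EndElement:
                    handler.EndElement(reader.Name);
                    break;
                case XmlNodeType.Text:
                case XmlNodeType.CDATA:
                    handler.Characters(reader.Value);
                    break;
            }
        }
    }
}

public class Program
{
    public static void Main()
    {
        // Tiny inline sample; a real run would open the large file instead.
        string xml = "<log><entry>a</entry><entry>b</entry><note/></log>";
        ElementCounter counter = new ElementCounter();
        using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
        {
            SaxLikeDriver.Parse(reader, counter);
        }
        foreach (KeyValuePair<string, int> pair in counter.Counts)
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Memory use here depends on the number of distinct element names, not on the size of the document, which is the property a DOM cannot offer.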

    I don't use C#, so I can't advise you regarding specific tools.

    --
    Joe Kesselman / Beware the fury of a patient man. -- John Dryden
    Joseph Kesselman, Nov 27, 2006
    #2

  3. rony wrote:

    > I am looking for a component that can parse my XML files.
    > The reason I am asking is that my XML files are huge; they can
    > reach 1 GB, more or less.
    > Parsing such a file takes something like 5 hours.
    > Right now I am using XmlReader, XmlNode ... (I do not load the file into
    > memory).
    > Can you suggest better components to use?
    >
    > ** I tried SAX but I couldn't understand how it works, because there are
    > no examples for .NET and the documentation is very bad.
    > P.S.: I am writing in C#.


    XmlNode in the .NET Framework is part of .NET's DOM implementation, so
    if you use XmlNode your code is loading the XML into memory, or at
    least part of it, depending on what exactly your code does.

    With .NET you have XmlReader for fast, forward-only pull parsing; that
    is the best approach the .NET Framework has to offer for parsing such
    large files. With XmlReader the memory/resource consumption should
    not increase with the size of the XML, as the reader pulls in the XML
    node by node.
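
As a rough illustration of the pull model described above, the following sketch streams through a document and keeps only two running totals. The element name `record` and its `size` attribute are made up for the example; substitute whatever the real file contains. `XmlReader.Create`, `XmlReaderSettings`, and `XmlConvert.ToInt64` are standard .NET calls.

```csharp
using System;
using System.IO;
using System.Xml;

public class Program
{
    // Counts <record> elements and sums their "size" attribute while streaming.
    // Both names are hypothetical placeholders for the real schema.
    public static long SumSizes(XmlReader reader, out int records)
    {
        long total = 0;
        records = 0;
        while (reader.Read())
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "record")
            {
                records++;
                string size = reader.GetAttribute("size");   // null if absent
                if (size != null) total += XmlConvert.ToInt64(size);
            }
        }
        return total;
    }

    public static void Main(string[] args)
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.IgnoreWhitespace = true;   // skip insignificant whitespace nodes
        settings.IgnoreComments = true;

        // Use the file given on the command line, or a small inline sample.
        string sample = "<data><record size='10'/><record size='32'/></data>";
        using (XmlReader reader = args.Length > 0
            ? XmlReader.Create(args[0], settings)
            : XmlReader.Create(new StringReader(sample), settings))
        {
            int records;
            long total = SumSizes(reader, out records);
            // With the inline sample this prints: 2 records, total size 42
            Console.WriteLine(records + " records, total size " + total);
        }
    }
}
```

Nothing is retained between iterations except the two counters, so the working set should stay flat no matter how large the input is.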

    I think microsoft.public.dotnet.xml is a better place to discuss .NET
    specific questions on parsing XML.

    --

    Martin Honnen
    http://JavaScript.FAQTs.com/
    Martin Honnen, Nov 27, 2006
    #3
  4. rony

    rony Guest

    Hi,
    What I am doing is creating a reader with XmlTextReader
    and then
    while (reader.Read())
    {
    }
    so nothing is loaded into memory.
    But I still think 5 hours for a 1 GB XML file is very slow.
    Are there any SAX-based components that can improve the
    performance?
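
One way to check whether the reader itself is the slow part is to time the bare loop with no processing in the body. The sketch below does that with `Stopwatch`; the `CountNodes` helper is a name introduced here for illustration. It assumes the file has no DOCTYPE, since the default reader settings reject DTDs, and it says nothing about what the real loop body does.

```csharp
using System;
using System.Diagnostics;
using System.Xml;

public class Program
{
    // Reads every node and does nothing with it, returning the node count.
    public static long CountNodes(XmlReader reader)
    {
        long nodes = 0;
        while (reader.Read()) nodes++;
        return nodes;
    }

    public static void Main(string[] args)
    {
        if (args.Length == 0)
        {
            Console.WriteLine("usage: pass the path of an XML file");
            return;
        }

        Stopwatch watch = Stopwatch.StartNew();
        long nodes;
        using (XmlReader reader = XmlReader.Create(args[0]))
        {
            nodes = CountNodes(reader);
        }
        watch.Stop();

        Console.WriteLine(nodes + " nodes in " + watch.Elapsed);
    }
}
```

If this empty loop finishes quickly on the 1 GB file, the 5 hours are being spent in the work done per node rather than in the parser.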


    rony, Nov 28, 2006
    #4
