High-performance XML parser

rony

Hi,
I am looking for a component that can parse my XML files.
I am asking because the files are huge: they can reach about 1 GB.
Parsing such a file currently takes about 5 hours.
Right now I am using XmlRead, XmlNode, and so on (I do not load the whole
file into memory).
Can you suggest better components to use?

** I tried SAX, but I couldn't understand how it works, because there are
no examples for .NET and the documentation is very poor.
P.S. I am writing in C#.

Regards, Rony
 
Joseph Kesselman

If parsing a 1GB file is taking 5 hours, the problem isn't the parser --
it's the fact that the data model (presumably an implementation of the
DOM?) is becoming so huge that your machine's thrashing itself to death
swapping data in and out of memory.

SAX-based processing, when appropriate, is indeed a recommended solution
for that. Or SAX feeding into a more specialized data model. Or --
perhaps -- an XML database tool, which has its own specialized models
and may be able to handle paging of data more intelligently than the
system's default swapper.

I don't use C#, so I can't advise you regarding specific tools.
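
For readers who are on .NET, the idea above of a streaming parse feeding a more specialized data model might be sketched roughly as follows. This is only an illustration under assumptions that are not from the thread: it assumes LINQ to XML (System.Xml.Linq) is available, that the file is a long run of repeated record elements, and the names RecordStream, StreamElements, "order" and "total" are hypothetical. Only one record is held in memory at a time.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Linq;

public static class RecordStream
{
    // Lazily yields each element called `name`, one at a time.
    // Only the current record is materialized; the rest of the
    // document is never held in memory.
    public static IEnumerable<XElement> StreamElements(TextReader input, string name)
    {
        using (XmlReader reader = XmlReader.Create(input))
        {
            reader.MoveToContent();
            while (!reader.EOF)
            {
                if (reader.NodeType == XmlNodeType.Element && reader.Name == name)
                {
                    // ReadFrom consumes the whole element and leaves the
                    // reader on the node that follows it, so no extra
                    // Read() call is made in this branch.
                    yield return (XElement)XNode.ReadFrom(reader);
                }
                else
                {
                    reader.Read();
                }
            }
        }
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for a very large file.
        const string xml =
            "<orders>" +
            "<order id=\"1\"><total>10</total></order>" +
            "<order id=\"2\"><total>32</total></order>" +
            "</orders>";

        int sum = 0;
        foreach (XElement order in StreamElements(new StringReader(xml), "order"))
        {
            sum += (int)order.Element("total");
        }
        Console.WriteLine("Sum of totals: " + sum);
    }
}
```

For a real file, the StringReader would be replaced by something like a StreamReader over the file, and the per-record work would feed whatever compact structure the application actually needs.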
 
Martin Honnen

XmlNode in the .NET Framework is part of .NET's DOM implementation, so
if you use XmlNode then your code is loading the XML into memory, or at
least part of it, depending on what exactly your code does.

With .NET you have XmlReader for fast, forward-only pull parsing; that
is the best approach the .NET Framework has to offer for parsing such
large files. With XmlReader, memory/resource consumption should
not increase with the size of the XML, as the reader pulls in the XML
node by node.
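
A minimal sketch of that forward-only pull loop, for illustration only: the class and method names (StreamingCount, CountElements) and the sample document are made up, and the reader settings shown are an assumption about what is useful rather than anything required. It counts elements by name without ever building a DOM, so memory use depends on the number of distinct element names and not on the file size.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

public static class StreamingCount
{
    // Counts start tags by element name using only XmlReader.
    public static Dictionary<string, long> CountElements(TextReader input)
    {
        var counts = new Dictionary<string, long>();

        // Assumed settings: skip nodes we do not care about and do not
        // process any DTD.
        var settings = new XmlReaderSettings
        {
            IgnoreWhitespace = true,
            IgnoreComments = true,
            DtdProcessing = DtdProcessing.Ignore
        };

        using (XmlReader reader = XmlReader.Create(input, settings))
        {
            // Each Read() advances to the next node; nothing is retained
            // from earlier nodes except the running counts.
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    counts.TryGetValue(reader.Name, out long n);
                    counts[reader.Name] = n + 1;
                }
            }
        }
        return counts;
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for a very large file.
        const string xml = "<orders><order id=\"1\"/><order id=\"2\"/></orders>";
        foreach (var pair in CountElements(new StringReader(xml)))
        {
            Console.WriteLine(pair.Key + ": " + pair.Value);
        }
    }
}
```

If a loop like this is still slow on a large file, the time is probably going into the work done per node or into I/O rather than into the reader itself, which is worth measuring before switching parsers.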

I think microsoft.public.dotnet.xml is a better place to discuss .NET
specific questions on parsing XML.
 
rony

Hi,
What I am doing is creating a reader with XmlTextReader
and then
while (reader.Read())
{
}
so nothing is loaded into memory.
But I still think 5 hours for a 1 GB XML file is very slow.
Are there any SAX-based components that can improve the
performance?
 
