Getting "java.lang.OutOfMemoryError" when parsing XML using DOM

Lew

Andrew said:
Hmm... That is quite an impressive difference,
isn't it? Lew's estimate was not far off (I did
not comment at the time - but I really thought
his statement of '2 hour -> 1 to 2 seconds' was
unrealistic!).

Oh, ye of little faith! :)

It would've been fine with me if I were wrong - I have been proven wrong in
this forum several times before. I just know how fast a good SAX
implementation can be, went out on a limb and was right this time.

I wonder if there weren't a particular problem with the DOM implementation,
though. Others in this thread have had better success with a DOM approach than
the OP did.

-- Lew
 
Tom Hawtin

Lew said:
I wonder if there weren't a particular problem with the DOM
implementation, though. Others in this thread have had better success
with a DOM approach than the OP did.

Possibly something to do with the form of the XML being used. IIRC,
there is something about handling of attributes that can make DOM very
slow. It's also going to be somewhat implementation dependent.
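
Since the behaviour is implementation dependent, it can help to know which DOM builder JAXP actually selected on a given machine. The following is only a minimal sketch; the class name WhichDomImpl is made up for illustration, and the output will differ between JDKs and classpaths.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;

// Hypothetical helper class, not part of the original poster's code.
class WhichDomImpl {
    /** Returns the class name of the DocumentBuilder that JAXP selected. */
    static String builderClassName() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        return builder.getClass().getName();
    }

    public static void main(String[] args) throws Exception {
        // The printed name depends on the JDK and on any parser jars on the classpath.
        System.out.println("DOM builder in use: " + builderClassName());
    }
}
```

Comparing this name between a machine where DOM works well and one where it does not would show whether the two are even running the same parser.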

Tom Hawtin
 
Andrew Thompson

Lew said:
Oh, ye of little faith! :)

Damn faith! Give me run-time results, any day! ;-)
(If you had stated it as 'code I worked on,
improved ...' I would have been prepared to
accept it at face value..)

Andrew T.
 
NeoGeoSNK

Patricia said:
NeoGeoSNK wrote:
> I don't know how DOM works when it parsing a XML, I use DOM that is ...

Imagine working in an office, doing some complicated task, using a desk
with a limited area, and a file cabinet with far more paper in it than
can fit on the desk.

The desk top is usually full, so when you need to create a new document
or get something from the filing cabinet, you need to remove something
from the desk. The easiest way is to just get rid of a paper you have
not looked at recently.

There are two very different cases:

1. The pages you need more often than once every few minutes all fit on
the desk. You spend most of your time working, but sometimes have to get
another paper from the file cabinet.

2. The task you are doing needs far more papers than can fit on the
desk. Every time you need to follow up a reference, it points to a page
that is in the filing cabinet, and you cannot make progress until you
get it. But to put it on the desk, you have to remove something else,
and a few minutes later you need the page that you just removed...

The second condition is page thrashing.

desk top <-> computer's main memory
file cabinet <-> swap file
page of paper <-> virtual storage page

There are two cases when building the whole document in memory:

1. It fits. In that case there will be a heap size that is both big
enough to hold the document (no out of memory errors) and small enough
to fit on the desk (no page thrashing, the computer spends most of its
time doing useful work, not shuffling pages between disk and memory).
The obvious heap size to try is a bit smaller than the computer's
physical memory. If any size works, that one will.

2. It does not fit. Any memory size big enough to avoid OutOfMemoryError
is big enough to cause page thrashing.
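
A rough way to see which of the two cases applies is to compare the JVM's maximum heap with the size of the input file. The sketch below is only illustrative: the class name HeapCheck and the expansion factor of 10 are assumptions made up for the example, not measured values, and the real ratio depends on the document and the DOM implementation.

```java
import java.io.File;

// Hypothetical helper class for illustration only.
class HeapCheck {
    // ASSUMPTION: an in-memory DOM tree needs several times the file size.
    // The factor 10 is a guess for illustration, not a measurement.
    static final long ASSUMED_DOM_EXPANSION = 10;

    /** True if a file of fileBytes might fit in a heap of maxHeapBytes under the assumed factor. */
    static boolean likelyFits(long fileBytes, long maxHeapBytes) {
        // Divide instead of multiplying so very large sizes cannot overflow.
        return fileBytes < maxHeapBytes / ASSUMED_DOM_EXPANSION;
    }

    public static void main(String[] args) {
        File f = new File(args.length > 0 ? args[0] : "log_R2.2.xml");
        long maxHeap = Runtime.getRuntime().maxMemory(); // upper limit the JVM will try to use
        System.out.println("file size (bytes): " + f.length());
        System.out.println("max heap (bytes):  " + maxHeap);
        System.out.println("likely fits as DOM: " + likelyFits(f.length(), maxHeap));
    }
}
```

The maximum heap is set with the -Xmx option of the java launcher, for example java -Xmx512m HeapCheck log_R2.2.xml (512m is an arbitrary example value). The check says nothing about physical memory, so a heap that is large enough can still cause thrashing as described above.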

Patricia


Thanks, Patricia.
Your explanation is very clear. Because of my poor English I can't
understand your example very well yet; maybe it will take me several days
before I understand it completely :)

Ny
 
NeoGeoSNK

Lew said:
Oh, ye of little faith! :)

It would've been fine with me if I were wrong - I have been proven wrong in
this forum several times before. I just know how fast a good SAX
implementation can be, went out on a limb and was right this time.

I wonder if there weren't a particular problem with the DOM implementation,
though. Others in this thread have had better success with a DOM approach than
the OP did.

-- Lew

Thanks, Lew.
I have pasted my source code below; maybe you can point out some problems with
my DOM implementation when you are free :)
// Set parsing(String filename) is implemented with DOM
// Set parsing(String filename, boolean sax) is implemented with SAX



import java.io.*;
import org.w3c.dom.*;
import javax.xml.parsers.*;
import org.xml.sax.*;
import java.util.*;
import javax.xml.xpath.*;
import org.xml.sax.helpers.*;

/**
* Parses an XML-format log file and retrieves all subscriber info.
* @author yning
*
*/

class SAXhandler extends DefaultHandler{
public SAXhandler(Set subscribers){
this.subscribers = subscribers;
}

int ing;
int ed;
boolean inasub = false;
boolean callingflag = false;
boolean calledflag = false;
boolean lrnflag = false;
boolean dirflag = false;
Set subscribers;
SubInfo subscriber;
public void startElement(String namespaceURL, String lname, String
qname, Attributes attr){

if(qname.equals("string")){
//System.out.println("Sax parser = " + qname);
//System.out.println("attr = " + attr.getValue(0));
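String value = attr.getValue(0);
// getValue returns null when the element has no attributes; skip it to avoid a NullPointerException
if(value == null) return;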
if(value.equals("Sub_OAM_DirNumber")){
subscriber = new SubInfo();
dirflag = true;
}else{
if(value.equals("create")){
subscriber.setModifier("create");
}else{
if(value.equals("modify")){
subscriber.setModifier("modify");
}else{
if(value.equals("delete")){
subscriber.setModifier("delete");
}else{
if(value.trim().matches("dirNumberId.*")){
//System.out.println("dirNumberId = " + value);
String dirnumber =
value.substring(value.indexOf("dirNumberId=") + 12,
value.indexOf(",sHLRSubsOrganizationId"));
String ndc =
value.substring(value.indexOf("nDCId=") + 6,
value.indexOf(",managedElementId=SHLR"));
// System.out.println("dirnumber=" + dirnumber + ndc);
subscriber.setNDCId(ndc);
subscriber.setdirNumberId(dirnumber);
}else{
if(value.equals("callingList")){
callingflag = true;
}else{
if(callingflag == true){
if(value.equals("NULL"))
subscriber.removeCallingList();
else
subscriber.addCallingList(value);
// System.out.println("callingService = " + value.trim());
//System.out.println("ing = " + ing++);
callingflag = false;
}else{
if(value.equals("calledList")){
calledflag = true;
}else{
if(calledflag == true){
if(value.equals("NULL"))
subscriber.removeCalledList();
else
subscriber.addCalledList(value);
// System.out.println("calledService = " + value.trim());
// System.out.println("ed = " + ed++);
calledflag = false;
}else{
if(value.equals("lRNumberId")){
lrnflag = true;
}else{
if(lrnflag == true){
// System.out.println("lrnnumber = " + value);
subscriber.setlrnNumberId(value);
lrnflag = false;
}
}
}
}
}
}
}


}
}
}
}


}
}

public void endElement(String uri, String lname, String qname){
if(qname.equals("record") && dirflag == true){
subscribers.add(subscriber);
dirflag = false;
}
}



}






public class ParsingLog {


public Set parsing(String filename, boolean sax)throws Exception{
Set subset = new LinkedHashSet();
File f = new File(filename);
SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser paser = factory.newSAXParser();
SAXhandler handler = new SAXhandler(subset);
paser.parse(f, handler);
return handler.subscribers;
}



public Set parsing(String filename) throws Exception{
Set subset = new LinkedHashSet();
File f = new File(filename);
DocumentBuilderFactory factory =
DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(f);
Element root = doc.getDocumentElement();
XPathFactory xpfactory = XPathFactory.newInstance();
XPath path = xpfactory.newXPath();
NodeList recoredlist = (NodeList)path.evaluate("/journal/record",
doc, XPathConstants.NODESET);
// System.out.println("frameIdlist.getLength()= " + recoredlist.getLength());
//enumerate all record in a log
for(int i = 0; i < recoredlist.getLength(); i ++){
// System.out.println("recoredlist = " + recoredlist.item(i));
Node record = recoredlist.item(i);
Element recordelement = (Element)record;
//System.out.println(recordelement.getTagName());
//get operation type
String BEtype = (String)path.evaluate("header/header_generic/domain/@value", recordelement);
// System.out.println("operation type = " + BEtype);
if(!BEtype.equals("SHLR::Subscription"))
continue;
SubInfo subscriber = new SubInfo();
NodeList framelist = (NodeList)path.evaluate("body/frame",
recordelement, XPathConstants.NODESET);
// System.out.println("framelist = " + framelist.getLength());
//enumerate frame list in a record
for(int j = 0; j < framelist.getLength(); j++){
// System.out.println("frame = " + framelist.item(j));
NodeList attriblist = (NodeList)path.evaluate("attribute/attribute_value/string/@value",
framelist.item(j), XPathConstants.NODESET);
for(int k = 0; k < attriblist.getLength(); k++){
//System.out.println(attriblist.item(k));
//System.out.println(attriblist.item(k).getClass());
Node attribute = attriblist.item(k);
String value = attribute.getNodeValue();
//String value = att.getAttribute("Value");
// System.out.println("Value = " + value);
if(value.equals("create")){
subscriber.setModifier("create");
}else{
if(value.equals("modify")){
subscriber.setModifier("modify");
}else{
if(value.equals("delete")){
subscriber.setModifier("delete");
}else{
if(value.trim().matches("dirNumberId.*")){
//System.out.println("dirNumberId = " + value);
String dirnumber =
value.substring(value.indexOf("dirNumberId=") + 12,
value.indexOf(",sHLRSubsOrganizationId"));
String ndc =
value.substring(value.indexOf("nDCId=") + 6,
value.indexOf(",managedElementId=SHLR"));
// System.out.println("dirnumber=" + dirnumber + ndc);
subscriber.setNDCId(ndc);

subscriber.setdirNumberId(dirnumber);
}else{
if(value.equals("calledList")){
Node calledattr = attriblist.item(k + 1);
String calledvalue =
calledattr.getNodeValue();
// System.out.println("calledList = " + calledvalue);
if(calledvalue.equals("NULL"))
subscriber.removeCalledList();
else
subscriber.addCalledList(calledvalue);
}else{
if(value.equals("callingList")){
Node callingattr = attriblist.item(k + 1);
String callingvalue =
callingattr.getNodeValue();
// System.out.println("callingList = " + callingvalue);
if(callingvalue.equals("NULL"))
subscriber.removeCallingList();
else
subscriber.addCallingList(callingvalue);
}else{
if(value.equals("lRNumberId")){
Node lrnattr = attriblist.item(k + 1);
String lrnvalue = lrnattr.getNodeValue();
subscriber.setlrnNumberId(lrnvalue);

}
}
}

}
}
}
}
}
}
if(subscriber != null)
subset.add(subscriber);
}



return subset;
}

public static void main(String[] args)throws Exception{
System.out.println("start job:" + new Date());

ParsingLog a = new ParsingLog();
Set set = a.parsing("log_R2.2.xml");
System.out.println("\n\n\ntotal subscribers = " + set.size());
Iterator iterator = set.iterator();
SubInfo sub;
while(iterator.hasNext()){
System.out.println("subscriber to write");
sub = (SubInfo)iterator.next();
System.out.println("dirnumber:" + sub.getdirNumberId());
System.out.println("Modifier:" + sub.getModifier());
System.out.println("ndc:" + sub.getNDCId());
System.out.println("called list:" + sub.getCalledList());
System.out.println("calling list:" + sub.getCallingList());
System.out.println("lrn:" + sub.getlrnNumberId());
}
System.out.println("job finished:" + new Date());

/*
Set saxset;
SubInfo sub;
ParsingLog b = new ParsingLog();
saxset = b.parsing("log_R2.2.xml", true);
System.out.println("set size = " + saxset.size());
Iterator iterator = saxset.iterator();
while(iterator.hasNext()){
System.out.println("subscriber to write");
sub = (SubInfo)iterator.next();
System.out.println("dirnumber:" + sub.getdirNumberId());
System.out.println("Modifier:" + sub.getModifier());
System.out.println("ndc:" + sub.getNDCId());
System.out.println("called list:" + sub.getCalledList());
System.out.println("calling list:" + sub.getCallingList());
System.out.println("lrn:" + sub.getlrnNumberId());
}
*/
System.out.println("job finished:" + new Date());
//saxset = b.parsing("log_R2.2.xml",true);
//System.out.println("set size = " + saxset.size());
}
}
 
NeoGeoSNK

Andrew Thompson said:
Damn faith! Give me run-time results, any day! ;-)
(If you had stated it as 'code I worked on,
improved ...' I would have been prepared to
accept it at face value..)

Andrew T.

Hello Andrew T,
I just sent you my tool, including the log file "log_R2.2.xml", in a jar;
please check your mailbox.

Ny
 
Jaakko Kangasharju

Andrew Thompson said:
Hmm... That is quite an impressive difference,
isn't it? Lew's estimate was not far off (I did
not comment at the time - but I really thought
his statement of '2 hour -> 1 to 2 seconds' was
unrealistic!).

It's not at all unrealistic, an XML document of the size the OP has
*should* take only a few seconds to parse. It's not that SAX is
extremely fast, it's that the DOM code was clearly thrashing and
therefore slow. With enough memory, DOM should take only a couple of
times longer than SAX.
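
To try this comparison on a machine with enough memory, something like the following sketch could be used. It is not the OP's code: the class SaxVsDom, the generated sample document, and the element names are assumptions made up for illustration, and single-run timings like these are only rough because of JIT warm-up.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical comparison harness, for illustration only.
class SaxVsDom {
    /** Builds a small synthetic document with the given number of record elements. */
    static byte[] sampleXml(int records) {
        StringBuilder sb = new StringBuilder("<journal>");
        for (int i = 0; i < records; i++) {
            sb.append("<record><string value=\"v").append(i).append("\"/></record>");
        }
        return sb.append("</journal>").toString().getBytes(StandardCharsets.UTF_8);
    }

    /** Counts record elements while streaming; nothing is kept in memory. */
    static int countWithSax(byte[] xml) throws Exception {
        final int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String lname, String qname, Attributes attr) {
                        if (qname.equals("record")) count[0]++;
                    }
                });
        return count[0];
    }

    /** Builds the whole tree first, then counts record elements. */
    static int countWithDom(byte[] xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        return doc.getElementsByTagName("record").getLength();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = sampleXml(100000);
        long t0 = System.nanoTime();
        int sax = countWithSax(xml);
        long t1 = System.nanoTime();
        int dom = countWithDom(xml);
        long t2 = System.nanoTime();
        System.out.println("SAX: " + sax + " records in " + (t1 - t0) / 1000000 + " ms");
        System.out.println("DOM: " + dom + " records in " + (t2 - t1) / 1000000 + " ms");
    }
}
```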
 
Andrew Thompson

NeoGeoSNK said:
....
Hello Andrew T,
I just sent you my tool, including the log file "log_R2.2.xml", in a jar;
please check your mailbox.

Thanks. But in fact, although my comment above
seemed to invite you to do that, I do not actually
need folks from usenet to send me code. More
specifically, unless email from usenet includes
the word 'consultancy', it automatically gets deleted.

Please put anything that is worth hearing here,
where we can all see it, and where it is publicly archived
and searchable. Alternatively, in a case like the
Jar, it would probably be better to get a free site
at 'Geocities' or wherever, upload it there,
and give us a link.

As an aside, I like your real name much more
than the nickname you use, for posting to
usenet. I encourage all people to use real
names when posting to usenet.

Andrew T.
 
Andreas Leitgeb

NeoGeoSNK said:
I can't wait any longer; the job has taken nearly 2 hours and hasn't
finished yet. I think I'll try the SAX API. Is there a faster API for
parsing XML in Java?

Out of curiosity: You wrote that you're using a
self-written xml-parser... any chance that you
accidentally created an endless loop?

You should add progress indicators by inserting
System.out.println("...") calls. Even if this doesn't
make the code faster, it might give you an indication
of what is really going on (or going wrong). Perhaps, after 2
hours, it is still busy processing the first sub-item
of the input.
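
As a minimal sketch of such a progress indicator, here is a SAX handler that prints a line every so many elements. The names ProgressDemo, ProgressHandler, and reportEvery are made up for illustration and are not from the OP's code; the same idea applies to a counter inside any parsing loop.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, for illustration only.
class ProgressDemo {
    /** Counts start tags and prints a line every reportEvery elements. */
    static class ProgressHandler extends DefaultHandler {
        private final long reportEvery;
        private long elements;

        ProgressHandler(long reportEvery) {
            this.reportEvery = reportEvery;
        }

        @Override
        public void startElement(String uri, String lname, String qname, Attributes attr) {
            elements++;
            if (elements % reportEvery == 0) {
                System.out.println("elements seen so far: " + elements);
            }
        }

        long getElements() {
            return elements;
        }
    }

    /** Parses the given XML text and returns the total number of elements seen. */
    static long parseWithProgress(String xml, long reportEvery) throws Exception {
        ProgressHandler handler = new ProgressHandler(reportEvery);
        SAXParserFactory.newInstance().newSAXParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        return handler.getElements();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<journal><record/><record/><record/></journal>";
        System.out.println("total elements: " + parseWithProgress(xml, 2));
    }
}
```

If the counter stops advancing, the program is stuck; if it advances very slowly, the machine is more likely thrashing than looping.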
 
