Java SAX Performance Problem

  • Thread starter Christian Neuroth

Christian Neuroth

Hi!
I am using XML as an interface to a document management system. I
receive files encoded in an XML response, and I use the SAX parser from
Xerces to retrieve the necessary information. Everything works fine in
my local runtime environment.

But running the application in a J2EE environment leads to extremely
poor performance. The bottleneck seems to be the large number of
method invocations. According to the profiler, the most frequently
called methods are:
org.apache.xerces.util.XMLChar.isValid
org.apache.xerces.util.XMLChar.isInValid

Is there a strategy to avoid these numerous invocations? Most of the
retrieved file content passes through the characters() method, and
each character is checked with the two methods mentioned above. What
can I do? Would DOM be an alternative? Is there any information on
how to build parser solutions for J2EE apps?

Thanks
Christian
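
For readers following along: a SAX parser is allowed to deliver one run of text in several characters() calls, each covering a block of characters, so the handler is not invoked once per character even if the parser checks each character internally. Below is a minimal sketch that counts the callbacks for one large CDATA section. The class name CharactersChunkDemo and the synthetic payload are made up for illustration, and it assumes the JDK's built-in JAXP parser rather than the exact Xerces build from the post.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not from the thread): counts characters() callbacks.
class CharactersChunkDemo {

    static final class Stats extends DefaultHandler {
        int callbacks;   // number of characters() invocations
        long totalChars; // sum of the lengths passed to characters()

        @Override
        public void characters(char[] ch, int start, int length) {
            callbacks++;
            totalChars += length;
        }
    }

    // Parses the given XML text and returns the collected statistics.
    static Stats parse(String xml) throws Exception {
        Stats stats = new Stats();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), stats);
        return stats;
    }

    public static void main(String[] args) throws Exception {
        // Synthetic stand-in for a large Base64 payload.
        StringBuilder payload = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            payload.append('A');
        }
        Stats s = parse("<DATA><![CDATA[" + payload + "]]></DATA>");
        System.out.println(s.callbacks + " callbacks for " + s.totalChars + " characters");
    }
}
```

How many callbacks you see depends on the parser implementation, so the handler should append each chunk to a buffer rather than assume the text arrives in one piece.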
 

Robert Olofsson

Christian Neuroth wrote:
: But running the application in a J2EE environment leads to extreme
: performance suffering... Bottleneck seems to be the big number of
: method invocations. The profiler delivered the following methods which
: are called most of the times: org.apache.xerces.util.XMLChar.isValid
: org.apache.xerces.util.XMLChar.isInValid

Hmm, how big is the file you are parsing?
Also, which profiler did you use? Which JVM?

Are you aware that HotSpot (Sun JDK) will not inline methods (as much
as it normally does) when you profile from start to finish? An
interesting approach is to start without profiling and turn it on
after a while to give HotSpot a chance to inline.
I do not have the source code for XMLChar, so I cannot verify that
the methods are as small as I suspect, something like:
"return c > X and c < Y;" which would most probably be inlined.
This is one of the things you have to watch out for when
profiling.

OK, there are a few things that could be causing your problem,
but unless you are parsing on a remote object, parsing should not
be a problem.

Anyway, other people will probably have a few other ideas...

/robo
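
The advice about giving HotSpot time to inline before measuring can be tried with a small timing harness. This is only a sketch under assumptions: the class name WarmUpTiming, the synthetic document, and the number of warm-up runs are all invented for illustration, and it uses the JDK's built-in JAXP parser with plain System.nanoTime() timing rather than a real profiler.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical harness (not from the thread): parse a few times untimed, then time one run.
class WarmUpTiming {

    // Builds a synthetic document; element names and size are placeholders.
    static byte[] sampleDocument(int chars) {
        StringBuilder sb = new StringBuilder("<RESPONSE><DATA><![CDATA[");
        for (int i = 0; i < chars; i++) {
            sb.append('A');
        }
        sb.append("]]></DATA></RESPONSE>");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Times a single parse with a fresh parser; only the parse() call is measured.
    static long timeParseNanos(SAXParserFactory factory, byte[] doc) throws Exception {
        javax.xml.parsers.SAXParser parser = factory.newSAXParser();
        long start = System.nanoTime();
        parser.parse(new ByteArrayInputStream(doc), new DefaultHandler());
        return System.nanoTime() - start;
    }

    // Runs warmUpRuns untimed parses so the JIT can compile hot methods, then measures one.
    static long measure(byte[] doc, int warmUpRuns) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        for (int i = 0; i < warmUpRuns; i++) {
            timeParseNanos(factory, doc);
        }
        return timeParseNanos(factory, doc);
    }

    public static void main(String[] args) throws Exception {
        byte[] doc = sampleDocument(500_000);
        System.out.println("cold run:    " + measure(doc, 0) / 1_000_000 + " ms");
        System.out.println("after warm-up: " + measure(doc, 20) / 1_000_000 + " ms");
    }
}
```

If the warmed-up number is much lower than the cold one, per-character method calls reported by a profiler that ran from the start are probably overstated.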
 

Christian Neuroth

Thanks for your answers. Some additional information:

: Hmm, how big is the file you are parsing?

About 500 KB, but bigger files are coming soon :)
The response looks something like this:

<?xml version="1.0" encoding="UTF-8"?>
<RESPONSE XMLID="SYSTEM_0001">
<SUCCESS REQUESTID="0" COMMAND="LOGIN"/>
<SUCCESS REQUESTID="1" COMMAND="DOCUMENT">
<DOCUMENT ID="$(#ARCHIVE)\FOLDER,00000007,001" DELETED="0" SIZE="0"
FIELDCOUNT="2" CREATION="1057141179" EDITED="1057141179">
<FIELD ID="0" NAME="KNR" TYPE="STRING" USE="USER" CODE="ANSI"
ATTRIB="FieldID=1001">
<DATA><![CDATA[16051980]]></DATA>
</FIELD>
<FIELD ID="1" NAME="1014" TYPE="BLOB" USE="USER" CODE="BASE64"
REFERENZ="0" ATTRIB="FieldID=1014">
<BLOBNAME>srv249.</BLOBNAME>
<FILENAME>D:\TEMP\srv249.</FILENAME>
<FILESIZE>471848</FILESIZE>
<DATETIME>1057250952</DATETIME>
<BLOBTYPE>unknown</BLOBTYPE>
<DATA>
<![CDATA[JVBERi0x... here is one Base64 coded file - lot of data
....]]></DATA>
</FIELD>
</DOCUMENT>
</SUCCESS>
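
For a response shaped like the sample above, a handler could collect the text of the DATA element inside a FIELD with CODE="BASE64" and decode it once the parse has finished. The following is a rough sketch, not the poster's code: the class name BlobExtractor is invented, it assumes a Java 8 or later runtime for java.util.Base64 (the thread itself is on JDK 1.4.1), and it assumes the default, non-namespace-aware JAXP parser so that qName carries the element name.

```java
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler (not from the thread): extracts the Base64 blob from the response.
class BlobExtractor extends DefaultHandler {
    private boolean inBase64Field; // inside <FIELD CODE="BASE64">
    private boolean inData;        // inside that field's <DATA>
    private final StringBuilder base64 = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("FIELD".equals(qName)) {
            inBase64Field = "BASE64".equals(atts.getValue("CODE"));
        } else if ("DATA".equals(qName)) {
            inData = inBase64Field;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("DATA".equals(qName)) {
            inData = false;
        } else if ("FIELD".equals(qName)) {
            inBase64Field = false;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // Text may arrive in several chunks, so append instead of overwrite.
        if (inData) {
            base64.append(ch, start, length);
        }
    }

    // Parses the XML and returns the decoded bytes of the Base64 field.
    static byte[] extract(String xml) throws Exception {
        BlobExtractor handler = new BlobExtractor();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        // The MIME decoder skips line breaks and other characters outside the Base64 alphabet.
        return Base64.getMimeDecoder().decode(handler.base64.toString());
    }

    public static void main(String[] args) throws Exception {
        String encoded = Base64.getEncoder()
                .encodeToString("%PDF-1.4 demo".getBytes(StandardCharsets.US_ASCII));
        String xml = "<RESPONSE><FIELD CODE=\"ANSI\"><DATA><![CDATA[16051980]]></DATA></FIELD>"
                + "<FIELD CODE=\"BASE64\"><DATA>\n<![CDATA[" + encoded + "]]></DATA></FIELD></RESPONSE>";
        System.out.println(new String(extract(xml), StandardCharsets.US_ASCII));
    }
}
```

Buffering the whole blob in a StringBuilder keeps the sketch short; for much larger files, writing decoded chunks to a file as they arrive would use less memory.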

: Also, which profiler did you use? Which JVM?

Sun JDK 1.4.1_03
Profiler: Eclipse Profiler Plugin, version 0.5.27,
http://eclipsecolorer.sourceforge.net

: Are you aware that HotSpot (Sun JDK) will not inline methods (as much
: as it normally does) when you profile from start to finish? An
: interesting approach is to start without profiling and turn it on
: after a while to give HotSpot a chance to inline.

But the performance problems also exist when I am not profiling, so
reduced inlining should not be the cause, should it?

: I do not have the source code for XMLChar, so I cannot verify that
: the methods are as small as I suspect, something like:
: "return c > X and c < Y;" which would most probably be inlined.
: This is one of the things you have to watch out for when
: profiling.

: OK, there are a few things that could be causing your problem,
: but unless you are parsing on a remote object, parsing should not
: be a problem.

I am receiving the InputStream via HttpClient. Hmm, could that be
the cause of the problem?

Thanks,

Christian
 

Christian Neuroth

: Is the input stream buffered, or do you do socket access for each byte/char
: you read?

Oops...

I think that's it. The application performs a read via the HTTP
socket every time. How can I change that? Simply by wrapping the
InputStream?

My code looks like this:

EasyResponseReader reader = new EasyResponseReader();
ByteArrayOutputStream os = new ByteArrayOutputStream();
// Wrap the HTTP response stream so that reads go through a buffer
BufferedInputStream is = new BufferedInputStream(
        postMethod.getResponseBodyAsStream());

// Copy the whole response body into memory in 8 KB chunks
byte[] buffer = new byte[8192];
int length;
while ((length = is.read(buffer)) >= 0) {
    os.write(buffer, 0, length);
}


PostMethod is a class from Apache's HttpClient. If I change from
getResponseBodyAsStream to getResponseBody, I receive a byte array,
but I still have to wait just as long as before until the method
returns :(

Nevertheless, many thanks for your help, Robert!

Christian
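
One variation on the code above, offered as a sketch rather than a confirmed fix: instead of copying the whole body into a ByteArrayOutputStream first, the buffered stream can be handed straight to the SAX parser so that parsing overlaps with the download. The class name StreamingParse and the element-counting handler are invented for illustration, and a ByteArrayInputStream stands in for the stream that postMethod.getResponseBodyAsStream() returns in the post.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical example (not from the thread): parse directly from a buffered stream.
class StreamingParse {

    static final class ElementCounter extends DefaultHandler {
        int elements; // number of start tags seen

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            elements++;
        }
    }

    // Wraps the body in an 8 KB buffer and feeds it to the parser without an intermediate copy.
    static int countElements(InputStream body) throws Exception {
        ElementCounter handler = new ElementCounter();
        try (InputStream in = new BufferedInputStream(body, 8192)) {
            SAXParserFactory.newInstance().newSAXParser().parse(in, handler);
        }
        return handler.elements;
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the HTTP response body.
        byte[] xml = "<RESPONSE><SUCCESS/><SUCCESS><DOCUMENT/></SUCCESS></RESPONSE>"
                .getBytes(StandardCharsets.UTF_8);
        System.out.println(countElements(new ByteArrayInputStream(xml)) + " elements");
    }
}
```

This does not make the server or the network any faster, so if the wait is dominated by the transfer itself, the total time will stay about the same.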
 
