Please ignore my previous post: Why my Java code is THAT slow than C++?

Kevin

(I deleted that post online, but in case someone already got it via
email)

I found out that the problem is not in my code.

The problem is the data file: for that particular task, the data file
is in a format that favors the C++ code (so it can basically skip the
hash step), while my code is a more generalized version. If I take
special care of that particular data format, my Java code can get good
speed too.

Sorry for the bother. :)
 
Jeffrey Schwab

Kevin said:
> (I deleted that post online, but in case someone already got it via
> email)
>
> I found out that the problem is not in my code.
>
> The problem is the data file: for that particular task, the data file
> is in a format that favors the C++ code (so it can basically skip the
> hash step), while my code is a more generalized version. If I take
> special care of that particular data format, my Java code can get good
> speed too.

What are the new performance numbers?
 
Kevin

I just did a test, and my Java code now runs as fast as the C++ code
(93 seconds in total, everything included; I basically use a "stopwatch"
because that is the number I care about, so it may be off by 1 or 2
seconds).
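
For anyone who wants to repeat this kind of end-to-end measurement from inside the program rather than by hand, here is a minimal sketch using `System.nanoTime()`. The class and method names are hypothetical, and unlike a hand-held stopwatch this does not include JVM startup time.

```java
// Hypothetical helper, not part of the original post: times one task by wall clock.
class Stopwatch {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long ms = timeMillis(() -> {
            // Placeholder workload; replace with the real file-processing call.
            long sum = 0;
            for (int i = 0; i < 10_000_000; i++) sum += i;
            if (sum == 42) System.out.println("unreachable");
        });
        System.out.println("elapsed: " + ms + " ms");
    }
}
```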

In case other people are interested, here is briefly how I do the fast
file read (for plain ASCII in my case):
1) Read the file using an InputStream, reading 32K of data at a time
into a byte[] buffer.
2) Write my own "readLine()" method, which scans the byte[] buffer and
returns a new byte[] for each line.
3) Write my own "split(char c)" method, which breaks one byte[] into
many byte[].
If we want to hash such a byte[], write a string class around it to
provide the hash and other functions, etc. Try not to convert them to
Java's String, which will be slow.
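
The steps above could look roughly like the following. This is only a sketch under assumptions, not the code from the post: the class names `ByteLineReader`, `ByteKey` and `FastReadDemo` are made up for illustration, lines are assumed to end in a plain '\n', and the input is assumed to be ASCII.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical reader: pulls fixed-size chunks from an InputStream and cuts them into lines.
class ByteLineReader {
    private final InputStream in;
    private final byte[] buf;
    private int pos = 0;        // next unread byte in buf
    private int limit = 0;      // number of valid bytes in buf
    private boolean eof = false;

    ByteLineReader(InputStream in) { this(in, 32 * 1024); } // 32K, as in the post
    ByteLineReader(InputStream in, int bufSize) { this.in = in; this.buf = new byte[bufSize]; }

    // Returns the next line without its '\n', or null at end of input.
    byte[] readLine() throws IOException {
        byte[] pending = null; // part of the line carried over from earlier chunks
        while (true) {
            if (pos == limit) { // buffer exhausted: refill it
                if (eof) return pending;
                int n = in.read(buf, 0, buf.length);
                if (n < 0) { eof = true; return pending; }
                pos = 0;
                limit = n;
            }
            int start = pos;
            while (pos < limit && buf[pos] != '\n') pos++;
            byte[] piece = Arrays.copyOfRange(buf, start, pos);
            pending = (pending == null) ? piece : concat(pending, piece);
            if (pos < limit) { pos++; return pending; } // found '\n': skip it and return
        }
    }

    private static byte[] concat(byte[] a, byte[] b) {
        byte[] out = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, out, a.length, b.length);
        return out;
    }

    // Breaks one byte[] into fields at every occurrence of sep (ASCII separators only).
    static List<byte[]> split(byte[] line, char sep) {
        List<byte[]> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= line.length; i++) {
            if (i == line.length || line[i] == sep) {
                fields.add(Arrays.copyOfRange(line, start, i));
                start = i + 1;
            }
        }
        return fields;
    }
}

// Hypothetical "string class around byte[]": gives value-based hashCode/equals for map keys.
final class ByteKey {
    private final byte[] bytes;
    private final int hash;

    ByteKey(byte[] bytes) { this.bytes = bytes; this.hash = Arrays.hashCode(bytes); }

    @Override public int hashCode() { return hash; }
    @Override public boolean equals(Object o) {
        return o instanceof ByteKey && Arrays.equals(bytes, ((ByteKey) o).bytes);
    }
    @Override public String toString() { return new String(bytes, StandardCharsets.US_ASCII); }
}

class FastReadDemo {
    public static void main(String[] args) throws IOException {
        byte[] data = "to be or not to be\n".getBytes(StandardCharsets.US_ASCII);
        ByteLineReader reader = new ByteLineReader(new ByteArrayInputStream(data));
        Map<ByteKey, Integer> counts = new HashMap<>();
        for (byte[] line = reader.readLine(); line != null; line = reader.readLine()) {
            for (byte[] field : ByteLineReader.split(line, ' ')) {
                counts.merge(new ByteKey(field), 1, Integer::sum);
            }
        }
        System.out.println(counts.get(new ByteKey("to".getBytes(StandardCharsets.US_ASCII)))); // prints 2
    }
}
```

For a real file, the `ByteArrayInputStream` in the demo would be replaced by a `FileInputStream`. Note that `ByteKey` does not copy the array it wraps, so the caller must not modify it afterwards.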

Thanks all on this group. :)
 
William Brogden

Kevin said:
> I just did a test, and my Java code now runs as fast as the C++ code
> (93 seconds in total, everything included; I basically use a "stopwatch"
> because that is the number I care about, so it may be off by 1 or 2
> seconds).
>
> In case other people are interested, here is briefly how I do the fast
> file read (for plain ASCII in my case):
> 1) Read the file using an InputStream, reading 32K of data at a time
> into a byte[] buffer.
> 2) Write my own "readLine()" method, which scans the byte[] buffer and
> returns a new byte[] for each line.

Why a new byte[] when all you need is a start index and count? (Of course,
that depends on keeping the initial buffer around.)

> 3) Write my own "split(char c)" method, which breaks one byte[] into
> many byte[].

See the question above: the object holding the index and count could
calculate a hashcode when it is created.

> If we want to hash such a byte[], write a string class around it to
> provide the hash and other functions, etc. Try not to convert them to
> Java's String, which will be slow.

Very true!
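
William's suggestion of a start index and count instead of a copied byte[] might be sketched like this. The names `ByteSlice` and `SliceDemo` are hypothetical, the whole input is assumed to sit in one byte[] that is kept around and never overwritten, and the hash is computed once in the constructor.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical view onto part of a shared buffer: no bytes are copied.
final class ByteSlice {
    private final byte[] buf;
    private final int start;
    private final int count;
    private final int hash; // calculated once, when the slice is created

    ByteSlice(byte[] buf, int start, int count) {
        this.buf = buf;
        this.start = start;
        this.count = count;
        int h = 1;
        for (int i = start; i < start + count; i++) h = 31 * h + buf[i];
        this.hash = h;
    }

    @Override public int hashCode() { return hash; }

    // Two slices are equal when they cover the same bytes, wherever those bytes live.
    @Override public boolean equals(Object o) {
        if (!(o instanceof ByteSlice)) return false;
        ByteSlice other = (ByteSlice) o;
        if (count != other.count || hash != other.hash) return false;
        for (int i = 0; i < count; i++) {
            if (buf[start + i] != other.buf[other.start + i]) return false;
        }
        return true;
    }

    @Override public String toString() {
        return new String(buf, start, count, StandardCharsets.US_ASCII);
    }
}

class SliceDemo {
    // Counts non-empty fields separated by sep or '\n', using slices as map keys.
    static Map<ByteSlice, Integer> countFields(byte[] data, char sep) {
        Map<ByteSlice, Integer> counts = new HashMap<>();
        int start = 0;
        for (int i = 0; i <= data.length; i++) {
            if (i == data.length || data[i] == sep || data[i] == '\n') {
                if (i > start) counts.merge(new ByteSlice(data, start, i - start), 1, Integer::sum);
                start = i + 1;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        byte[] data = "a b a\nb a".getBytes(StandardCharsets.US_ASCII);
        Map<ByteSlice, Integer> counts = countFields(data, ' ');
        System.out.println(counts.get(new ByteSlice(data, 0, 1))); // prints 3
    }
}
```

The trade-off is the one William mentions: the slices are only valid while the buffer they point into is kept alive and unchanged, so this does not combine directly with a 32K buffer that is refilled on every read.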
 
