Improving performance of code

ruds

Hi,
I'm reading a file and doing some operations on it. It is a huge file,
running into GBs.
The code works correctly but is very slow. How do I optimise
it?
My code snippet is:
class Risk
{
    public void compare(String infile) throws IOException
    {
        cnt=0;
        for(i=0;i<qid.size();i++)
        {
            no=0;
            fr=new FileReader(infile);
            br=new BufferedReader(fr);
            while((str=br.readLine())!=null)
            {
                no++;
                if((str.startsWith("$"))||(str.startsWith("-CONT-")))
                    continue;
                else
                {
                    s2=str.substring(0,10);
                    if(s2.equals(qid.elementAt(i)))
                    {
                        cnt++;
                        start=no;
                        end=no+29;
                        quadarray(infile,start,end);
                    }

                    if((cnt==sc) && (i<qid.size()))
                    {
                        System.out.println("qid="+qid.elementAt(i));
                        cnt=0;
                        writesubcase1();
                    }
                }
            }
            fr.close();

        }

        for(i=0;i<tid.size();i++)
        {
            no=0;
            fr=new FileReader(infile);
            br=new BufferedReader(fr);
            while((str=br.readLine())!=null)
            {
                no++;
                if((str.startsWith("$"))||(str.startsWith("-CONT-")))
                    continue;
                else
                {
                    s2=str.substring(0,10);
                    if(s2.equals(tid.elementAt(i)))
                    {
                        cnt++;
                        start=no;
                        end=no+29;
                        triaarray(infile,start,end);
                    }
                    if((cnt==sc) && (i<tid.size()))
                    {
                        System.out.println("tid="+tid.elementAt(i));
                        cnt=0;
                        writesubcase2();
                    }
                }
            }
            fr.close();
        }
    }

    public void quadarray(String ifile,int start,int end) throws IOException
    {
        try
        {
            fr1=new FileReader(ifile);
            br1=new BufferedReader(fr1);
            line=0;
            k=0;
            x=0;
            while((str1=br1.readLine())!=null)
            {
                line++;
                if((line>=start) && (line<end))
                {
                    if(j==0)
                        quad[j][k]=str1;
                    if((k==3) ||(k==17)||(k==20))
                    {
                        val1=Double.parseDouble(str1.substring(18,36));
                        if(val1>qmax[x])
                        {
                            qmax[x]=val1;
                            x++;
                        }
                    }
                    if((k==5) ||(k==8)||(k==22)||(k==25))
                    {
                        val2=Double.parseDouble(str1.substring(54,72));
                        if(val2>qmax[x])
                        {
                            qmax[x]=val2;
                            x++;
                        }
                    }
                    if((k==11)||(k==14)||(k==28))
                    {
                        val3=Double.parseDouble(str1.substring(36,54));
                        if(val3>qmax[x])
                        {
                            qmax[x]=val3;
                            x++;
                        }
                    }
                    k++;
                }
            }
        }
        catch (Exception e)
        { }
    }

    public void writesubcase1() throws IOException
    {
        x=0;
        try
        {
            fw=new FileWriter("Result.txt",true);
            for(y=0;y<30;y++)
            {
                if((y==0)||(y==1)||(y==2)||(y==4)||(y==6)||(y==7)||(y==9)||
                   (y==10)||(y==12)||(y==13)||(y==15) || (y==16)||(y==18)||(y==19)||
                   (y==21)||(y==23)||(y==24) || (y==26)||(y==27))
                {
                    fw.write(quad[0][y]+"\n");
                    continue;
                }
                else
                {
                    if((y==3)||(y==17)||(y==20))
                    {
                        s=quad[0][y];
                        fw.write(s.substring(0,28)+qmax[x]+s.substring(37)+"\n");
                        x++;
                        continue;
                    }
                    if((y==5)||(y==8)||(y==22)||(y==25))
                    {
                        s=quad[0][y];
                        fw.write(s.substring(0,64)+qmax[x]+"\n");
                        x++;
                        continue;
                    }
                    if((y==11)||(y==14))
                    {
                        s=quad[0][y];
                        fw.write(s.substring(0,46)+qmax[x]+s.substring(55)+"\n");
                        x++;
                        continue;
                    }
                    if(y==28)
                    {
                        s=quad[0][y];
                        fw.write(s.substring(0,46)+qmax[x]+"\n");
                        x++;
                        break;
                    }
                }
            }
            fw.close();
        }
        catch(Exception e)
        {}
    }

    public void triaarray(String ifile,int start,int end) throws IOException
    {
        try
        {
            fr1=new FileReader(ifile);
            br1=new BufferedReader(fr1);
            line=0;
            while((str1=br1.readLine())!=null)
            {
                line++;
                if((line>=start) && (line<end))
                {
                    if(j==0)
                        tria[j][k]=str1;
                    if(k==2)
                    {
                        val1=Double.parseDouble(str1.substring(37,54));
                        if(val1>tmax[0])
                            tmax[0]=val1;
                    }
                    if(k==5)
                    {
                        val2=Double.parseDouble(str1.substring(19,36));
                        if(val2>tmax[1])
                            tmax[1]=val2;
                    }
                    k++;
                }
            }
        }
        catch(Exception e)
        {}
    }

    public void writesubcase2()
    {
        try
        {
            fw=new FileWriter("Result.txt",true);
            for(y=0;y<7;y++)
            {
                if((y==0)||(y==1)||(y==3)||(y==4))
                {
                    fw.write(tria[0][y]+"\n");
                    continue;
                }
                if(y==2)
                {
                    s=tria[0][y];
                    fw.write(s.substring(0,47)+tmax[0]+s.substring(55)+"\n");
                    continue;
                }
                if(y==5)
                {
                    s=tria[0][y];
                    fw.write(s.substring(0,29)+tmax[1]+"\n");
                    break;
                }
            }
            fw.close();
        }
        catch(Exception e)
        {}
    }

    public static void main(String args[])
    {
        Risk r=new Risk();
        ipfile=args[0];

        try
        {
            r.compare(ipfile);
        }
        catch (Exception e)
        { }
    }
}

The code spends a lot of time in the quadarray() and triaarray() methods.
As you can see, the code in these methods is very simple, but it still
takes a lot of time.

How do i improve it??
 
Esmond Pitt

ruds said:
How do i improve it??

1. I don't see any need to read the files twice. Read them once each,
and look for both subcases on each line. This will double your speed. If
the output comes out in the wrong order, sort it later. BTW you should
be closing 'br' not 'fr' in this loop.

2. The loops on 'y' in the writesubcaseN() and xxxarray() methods seem
pretty pointless, as you do different things depending on the value of
'y'. Unroll these loops. You could use a lookup table to give you the
various offsets you need, and just loop over the lookup table. Or else
use a switch statement instead of all the tests on 'y'.

3. The triaarray() and quadarray() methods probably spend most of their
time catching up to where you already are in the file. Do you really
need to do this?
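
Points 1 and 3 can be combined into a single pass. What follows is only a minimal sketch under stated assumptions, not the poster's actual fix: it assumes a record is the matching header line plus the lines that immediately follow it (29 in total for the quad case, going by start=no and end=no+29 in the posted code), and that those lines can be consumed from the reader that is already open. The names SinglePassScan, BlockHandler, scan and blockLines are hypothetical, and the length check is an added guard that the posted code does not have.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class SinglePassScan {
    /** Hypothetical callback; stands in for the work done in quadarray()/triaarray(). */
    interface BlockHandler {
        void handle(String id, List<String> block);
    }

    /** Reads the file once, handing each matching block to the quad or tria handler. */
    static void scan(String infile, Set<String> quadIds, Set<String> triaIds, int blockLines,
                     BlockHandler onQuad, BlockHandler onTria) throws IOException {
        // try-with-resources closes the BufferedReader (and the FileReader under it).
        try (BufferedReader reader = new BufferedReader(new FileReader(infile))) {
            String line;
            while ((line = reader.readLine()) != null) {
                // Same skip rule as the posted code, plus a guard for short lines.
                if (line.startsWith("$") || line.startsWith("-CONT-") || line.length() < 10) {
                    continue;
                }
                String id = line.substring(0, 10);
                boolean isQuad = quadIds.contains(id);
                if (!isQuad && !triaIds.contains(id)) {
                    continue;
                }
                // Collect the block from the reader we already have open,
                // instead of reopening the file and counting lines from the top.
                List<String> block = new ArrayList<>(blockLines);
                block.add(line);
                while (block.size() < blockLines && (line = reader.readLine()) != null) {
                    block.add(line);
                }
                (isQuad ? onQuad : onTria).handle(id, block);
            }
        }
    }
}
```

Looking the ID up in a Set replaces the outer loop over qid and tid, so the file is opened once rather than once per ID and once more per match.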
 
ruds

Esmond Pitt said:
1. I don't see any need to read the files twice. Read them once each,
and look for both subcases on each line. This will double your speed. If
the output comes out in the wrong order, sort it later. BTW you should
be closing 'br' not 'fr' in this loop.

2. The loops on 'y' in the writesubcaseN() and xxxarray() methods seem
pretty pointless, as you do different things depending on the value of
'y'. Unroll these loops. You could use a lookup table to give you the
various offsets you need, and just loop over the lookup table. Or else
use a switch statement instead of all the tests on 'y'.

3. The triaarray() and quadarray() methods probably spend most of their
time catching up to where you already are in the file. Do you really
need to do this?

I understood points 1 and 2, but for point 3 I
don't see any other way, at least from my point of view.
If you can suggest something better, you're welcome to.
I'm a newbie at handling files.
Thanks a lot.
 
Chris Uppal

Mike said:
Indent it and comment it, for a start. In its current state, it's
unreadable.

The apparent lack of indentation is a bug in the newsreader you (and I) are
using, not a deficiency in the posted source.

-- chris
 
Chris Uppal

ruds said:
I'm reading a file and doing some operations on it. It is a huge file,
running into GBs.
The code works correctly but is very slow. How do I optimise
it?

I found your code difficult to follow, you could improve it by using case
statements instead of lots of if-s, by returning from functions as soon as you
know that there is nothing else to do (rather than having the "real" code buried
inside several nested if-s), and above all (as Mike has already mentioned) by
commenting it properly.
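
One way to replace the chains of if-s is a small lookup table. The sketch below is an illustration only: the row numbers and column ranges are copied from the posted quadarray(), while the names QuadFieldTable, FIELD and valueAt are hypothetical.

```java
class QuadFieldTable {
    // Row index within a 29-line block -> {beginColumn, endColumn}.
    // A null entry means the row carries no numeric field of interest.
    private static final int[][] FIELD = new int[29][];
    static {
        for (int row : new int[] {3, 17, 20})    FIELD[row] = new int[] {18, 36};
        for (int row : new int[] {5, 8, 22, 25}) FIELD[row] = new int[] {54, 72};
        for (int row : new int[] {11, 14, 28})   FIELD[row] = new int[] {36, 54};
    }

    /** Returns the number stored on the given row, or NaN if that row has none. */
    static double valueAt(int row, String line) {
        if (row < 0 || row >= FIELD.length || FIELD[row] == null) {
            return Double.NaN;
        }
        int[] cols = FIELD[row];
        return Double.parseDouble(line.substring(cols[0], cols[1]).trim());
    }
}
```

With the offsets in one place, the per-row logic shrinks to a single call, and a change in the file layout means editing the table rather than several conditions.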

So, it's quite possible that I've misread or misunderstood what the code is
doing, but if I /haven't/ got it wrong, then I'm puzzled by what quadarray() is
doing (and the other similar methods). It /looks/ as if it loops over the
entire (huge) input file, keeping count of which line it's looking at (in
variable 'k' -- /not/ a good name, unless there's something special in the
domain which makes 'k' self-explanatory), and only doing anything with certain
numbered lines, 20, 14, 28, and so on. But if that's true, then it doesn't do
anything at all with lines > 28, so there is no point in looping over the
remaining lines in the input file.
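
If the file really does have to be reopened for each block, the loop can at least stop once it has passed the last line it needs. A minimal sketch, assuming the same 1-based line numbering and exclusive end as the posted code; BlockReader and readLines are hypothetical names.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class BlockReader {
    /** Returns lines start..end-1 (1-based) and stops reading as soon as end is reached. */
    static List<String> readLines(String file, int start, int end) throws IOException {
        List<String> block = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {
            String text;
            int lineNo = 0;
            while ((text = reader.readLine()) != null) {
                lineNo++;
                if (lineNo >= end) {
                    break;              // nothing after this line is used
                }
                if (lineNo >= start) {
                    block.add(text);
                }
            }
        }
        return block;
    }
}
```

This still re-reads everything before the block, so it only removes the cost of scanning the tail of the file.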

If I'm wrong about that (i.e. if you do have to read data from every, or nearly
every, line of the big files), and if Daniel's suggestion about reducing the
number of passes isn't suitable, then I don't think there's very much you can
do to speed it up. If I /had/ to maximise the speed of something like this,
then I'd first try to work out what was the fastest I could possibly scan data
from the files, by writing a small test program which read in all the data as
/binary/ (so there are no conversion costs), and which didn't do anything with
the data. That would give me a baseline so I could tell whether there was any
reasonable speedup available even in theory (there might not be). If that did
turn out to be significantly, /and usefully/, faster than my current code, then
I'd consider (i.e do a few experiments with), doing most of the processing as
binary. It seems to me that you don't use most of the data on most lines, so
if you can scan the data as binary, and only incur the expense of converting
the data you actually need into text, then you might be able to save some time.
But there again, it might make almost no difference. Only measurement will
tell you (or an analytic, numeric, understanding of the performance could
tell you too, but that would require data that I don't have here, and I suspect
you don't have either).
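
The baseline test described above could look roughly like this. It is a sketch, not a benchmark harness: it reads the file as raw bytes through a FileChannel and does nothing with them except count. RawReadBaseline and countBytes are hypothetical names, and the 1 MB buffer size is an arbitrary choice.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

class RawReadBaseline {
    /** Reads the whole file as bytes, with no decoding or parsing, and returns the byte count. */
    static long countBytes(Path file) throws IOException {
        ByteBuffer buffer = ByteBuffer.allocateDirect(1 << 20);
        long total = 0;
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            int n;
            while ((n = channel.read(buffer)) != -1) {
                total += n;
                buffer.clear();     // discard the data; only the read cost is of interest
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        long begin = System.nanoTime();
        long bytes = countBytes(Paths.get(args[0]));
        double seconds = (System.nanoTime() - begin) / 1e9;
        System.out.printf("%d bytes in %.2f s%n", bytes, seconds);
    }
}
```

A second run over the same file is often faster because the operating system may serve it from its file cache, so the first run and repeated runs are worth recording separately.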

BTW, this sounds like one of the examples where profiling is unlikely to be
very helpful (like many examples of using profiling, in my experience).
Profiling is an excellent tool if you have an unexpected hot-spot in your code
which you don't realise is there -- it will point out your error with
devastating clarity. But that situation's not too likely to happen to
competent programmers[*]. The other case where profiling is useful is where
you have a reasonable idea of how long things /should/ take, and you can use
profiling to attach actual numbers to your mental model of the performance.

Oh, another thing that's often worth a try (if you are on Windows or some other
OS which supports transparent compression in the filesystem), is to tell the OS
to compress the data. If your program is primarily IO bound, rather than CPU
bound (which sounds likely in your case -- and it's easy for you to check),
then compressing the data will reduce the amount of data which has to be read
off-disk, albeit at the expense of more processing, which can sometimes be a
useful saving.

-- chris

[*] but it never hurts to check, even so -- if you have time...
 
Lew

Chris said:
The apparent lack of indentation is a bug in the newsreader you (and I) are
using, not a deficiency in the posted source.

I'm using Thunderbird. I see the original post's indentation, and that it was
done with the TAB character.

No doubt the space character would not have caused such difficulties. Even
though I can see the indentation, the TAB character makes it so wide as to
damage readability.

So either way, OP, using TABs to indent Usenet posts is a Bad Thing.
 
Patricia Shanahan

Lew said:
I'm using Thunderbird. I see the original post's indentation, and that
it was done with the TAB character.

No doubt the space character would not have caused such difficulties.
Even though I can see the indentation, the TAB character makes it so
wide as to damage readability.

So either way, OP, using TABs to indent Usenet posts is a Bad Thing.

I am not that worried about the indentation, because if I get serious
about looking at posted program I copy it into Eclipse and click
Source-Format.

I do think the first step in a performance campaign should be making
sure the code is properly commented, as well as having meaningful
identifiers, no arbitrary, unexplained constants etc. The big
improvements usually depend on understanding the code, so that data
structures and algorithms can be changed.

Patricia
 
Greg R. Broderick

How do i improve it??

1. USE MEANINGFUL VARIABLE NAMES (i.e. more than just a single letter)!


2. Pay attention to horizontal white space -- makes code a LOT easier to
read if there are spaces. Use:

if ((str.startsWith("$")) || (str.startsWith("-CONT-")))

or

if ((str.startsWith("$")) ||
(str.startsWith("-CONT-")))


instead of

if((str.startsWith("$"))||(str.startsWith("-CONT-")))


3. Declare ALL of your variables before you use them. In quadarray() it
appears to me that the variables "j", "quad", "str1", "val1", "qmax", "val2"
are used without having been previously declared.


Just a few suggestions that will prevent your name being cursed by those who
come after you and maintain your code.

Cheers!

--
---------------------------------------------------------------------
Greg R. Broderick (e-mail address removed)

A. Top posters.
Q. What is the most annoying thing on Usenet?
---------------------------------------------------------------------
 
Lars Enderin

Chris Uppal wrote:
The apparent lack of indentation is a bug in the newsreader you (and I) are
using, not a deficiency in the posted source.

Thunderbird shows all of the tabs, which should have been replaced by
two or maybe three spaces each. It certainly was indented.
 
Mike Schilling

Chris Uppal said:
The apparent lack of indentation is a bug in the newsreader you (and I) are
using, not a deficiency in the posted source.

So it is. (Though, oddly, Right-Click->Properties->Details->Message Source
displays it correctly, and that can be cut-and-pasted into an editor..) So
as far as that goes, <Emily Litella>Never mind.</Emily Litella>

Still, even indented, it would take considerable brainpower to determine
what the code is trying to do, let alone how to make it do the same thing
faster. Comments would make it far more likely that I'd make the effort.
 
squirrel

How do i improve it??

I have an idea to improve it; it may be possible.
Using NIO, a large file can be split into several parts, each mapped as a
MappedByteBuffer instance, and multiple threads can then parse
the MappedByteBuffer instances. There will be N+1 threads working on
parsing the file, so the performance should be higher.
 
ruds

How do i improve it??
I have an idea to improve it; it may be possible.
Using NIO, a large file can be split into several parts, each mapped as a
MappedByteBuffer instance, and multiple threads can then parse
the MappedByteBuffer instances. There will be N+1 threads working on
parsing the file, so the performance should be higher.

Do I have to create new threads for this, or will they be created by
the JVM?
Sorry if it is a stupid question, but I haven't done multithreading yet.
 
squirrel

Do I have to create new threads for this, or will they be created by
the JVM?
Sorry if it is a stupid question, but I haven't done multithreading yet.

My idea is the following:
1. One thread, the main thread, creates a FileChannel and splits
the file's content into n instances of MappedByteBuffer.
2. The main thread starts n threads, each responsible for
parsing one MappedByteBuffer. The main thread can then wait
for the result of each thread and collect them. Oh, the Observer pattern may be the
best choice for this case.
BTW, we should consider the case where one line is split across
two MappedByteBuffers.

This is just an idea; I have not implemented it. I hope it
is feasible.
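
The outline above might be sketched as follows. This is an untuned illustration under several assumptions: counting newline bytes stands in for the real per-line parsing, a thread pool plus Future objects is swapped in for the Observer approach, and chunk boundaries are moved forward to the next line end so that no line straddles two buffers. MappedChunkScan, countLines and alignToLineEnd are hypothetical names. The threads are created by the program (here through an ExecutorService), not automatically by the JVM.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class MappedChunkScan {
    /** Counts '\n' bytes in parallel; the count stands in for real per-line parsing. */
    static long countLines(Path file, int workers) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = channel.size();
            List<Future<Long>> parts = new ArrayList<>();
            long start = 0;
            for (int i = 1; i <= workers && start < size; i++) {
                long target = Math.max(start, size * i / workers);
                long end = (i == workers) ? size : alignToLineEnd(channel, target, size);
                // A single mapping cannot exceed Integer.MAX_VALUE bytes, so a
                // multi-GB file needs enough chunks to keep each one below 2 GB.
                MappedByteBuffer chunk = channel.map(FileChannel.MapMode.READ_ONLY, start, end - start);
                Callable<Long> task = () -> {
                    long lines = 0;
                    while (chunk.hasRemaining()) {
                        if (chunk.get() == '\n') {
                            lines++;
                        }
                    }
                    return lines;
                };
                parts.add(pool.submit(task));
                start = end;
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();    // the main thread waits for and collects each result
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    /** Moves pos forward to just past the next '\n' so that chunks end on a line boundary. */
    private static long alignToLineEnd(FileChannel channel, long pos, long size) throws IOException {
        ByteBuffer one = ByteBuffer.allocate(1);
        while (pos < size) {
            one.clear();
            if (channel.read(one, pos) < 1) {
                break;
            }
            pos++;
            if (one.get(0) == '\n') {
                break;
            }
        }
        return pos;
    }
}
```

Whether this helps depends on where the time goes: if the job is limited by disk reads rather than by parsing, extra threads may make little difference, so it is worth measuring before committing to the extra complexity.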
 
