Optimising the downloading of a large CSV file into a string


Pike

Hi,

I need to download large CSV files into String objects for processing.
Unfortunately, my download routine seems to be exceptionally slow. I
believe it's because of the following line:

ret+="\n" + line;

If I download the CSV files into Excel via Internet Explorer the
transfer takes a few seconds, but using the method below takes several
minutes.

Does anyone know how I can make the download method faster? I can't
find any Java methods which will download the whole file in one go.

Thanks,




import java.io.*;
import java.net.*;

public class download {

    public static String download(String filename) {
        String ret = "";
        URL javacodingURL = null;
        try {
            javacodingURL = new URL(filename);
        } catch (MalformedURLException e) {
            // Malformed URL
            System.out.println("Error in given URL");
            return ret;
        }

        try {
            URLConnection connection = javacodingURL.openConnection();
            BufferedReader br = new BufferedReader(
                new InputStreamReader(connection.getInputStream()));
            String line = "";
            while ((line = br.readLine()) != null)
                if (ret.equals("")) {
                    ret = line;
                } else {
                    ret += "\n" + line;
                }
            br.close();
        } catch (UnknownHostException e) {
            System.out.println("Unknown Host");
            return ret;
        } catch (IOException e) {
            System.out.println("Error in opening URLConnection, Reading or Writing");
            return ret;
        }
        return ret;
    } // end download method
} // end download class
 

Tor Iver Wilhelmsen

Does anyone know how I can make the download method faster? I can't
find any Java methods which will download the whole file in one go.

You should build up the result using a StringBuffer.


StringBuffer buf = new StringBuffer();
String line = "";
while ((line = br.readLine()) != null)
    buf.append(line).append('\n');

br.close();
} catch (UnknownHostException e) {
    System.out.println("Unknown Host");
    return ret;
} catch (IOException e) {
    System.out.println("Error in opening URLConnection, Reading or Writing");
    return ret;
}

// Remove trailing \n
if (buf.length() > 0)
    buf.setLength(buf.length() - 1);
ret = buf.toString();
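A self-contained sketch of the same idea, using StringBuilder (the unsynchronized replacement for StringBuffer in later JDKs) and a Reader parameter so it works on any input source; the class and method names here are chosen for this example, not from the thread:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class ReadAll {

    // Read every line from the given Reader into one String,
    // joining lines with '\n'. StringBuilder avoids the O(n^2)
    // cost of repeated String concatenation.
    public static String readLines(Reader in) throws IOException {
        BufferedReader br = new BufferedReader(in);
        StringBuilder sb = new StringBuilder();
        String line;
        boolean first = true;
        while ((line = br.readLine()) != null) {
            if (!first) sb.append('\n');
            sb.append(line);
            first = false;
        }
        br.close();
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Demo on an in-memory source; for a URL, pass
        // new InputStreamReader(connection.getInputStream()).
        String csv = "a,b,c\n1,2,3\n4,5,6";
        System.out.println(readLines(new StringReader(csv)));
    }
}
```

Appending each line to the builder and joining with '\n' at append time does the same work as the original loop, but each append is amortized O(1) instead of copying the whole accumulated string.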
 

Chris Uppal

Pike said:
try {
    URLConnection connection = javacodingURL.openConnection();
    BufferedReader br = new BufferedReader(
        new InputStreamReader(connection.getInputStream()));

Besides using a StringBuffer, as has already been suggested, you should put the
buffering "as close" to the raw input stream as possible, i.e. something like:

try {
    URLConnection connection = javacodingURL.openConnection();
    Reader reader = new InputStreamReader(
        new BufferedInputStream(
            connection.getInputStream()));
    ....

Otherwise the InputStreamReader will be reading tiny little chunks from the
(presumably) unbuffered InputStream created by the URLConnection.
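Putting the pieces together, a sketch of the stream stack described above: a BufferedInputStream nearest the raw stream batches the byte reads, and a BufferedReader on top supplies readLine(). The class and method names are chosen for this example:

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLConnection;

public class BufferedDownload {

    // BufferedInputStream sits closest to the raw stream, so the
    // network (or file) is hit in large chunks; BufferedReader on
    // top adds line-oriented reading via readLine().
    public static BufferedReader open(URL url) throws IOException {
        URLConnection connection = url.openConnection();
        return new BufferedReader(
            new InputStreamReader(
                new BufferedInputStream(
                    connection.getInputStream())));
    }
}
```

The same stack works for file: URLs, which is a convenient way to test it without a network.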

-- chris
 

Andrew Thompson

Pike said:
Hi,

I need to download large CSV files into String objects for processing.
Unfortunately, my download routine seems to be exceptionally slow. I
believe it's because of the following line:

ret+="\n" + line;

If I download the CSV files into Excel via Internet Explorer the
transfer takes a few seconds, but using the method below takes several
minutes.

Your code iterates through the file, reading a line at a time.

I tried that in some code so I could update a progress bar.

I gave it up when I realised that Java can read the
entire file as a single read, about 100 times faster
than it could read the file line by line.
 

Andrew Thompson

Andrew Thompson said:
Your code iterates through the file, reading a line at a time. ....
... Java can read the
entire file as a single read, about 100 times faster
than it could read the file line by line.

Just to test that theory with an URL connection,
I tried the following method on a 735Kb file.

****************************************
public static String getContent(URL url)
{
    String s = "";
    StringBuffer sb = new StringBuffer("");

    long t1, t2, t3, t4;

    t1 = (new Date()).getTime();
    try
    {
        URLConnection urlCon = url.openConnection();
        BufferedReader br = new BufferedReader(
            new InputStreamReader(urlCon.getInputStream()));
        String line = "";
        while ((line = br.readLine()) != null)
            if (sb.length() == 0) { sb.append(line); }
            else { sb.append("\n" + line); }
        br.close();
    }
    catch (UnknownHostException e) { System.out.println(e); }
    catch (IOException e) { System.out.println(e); }
    t2 = (new Date()).getTime();

    t3 = (new Date()).getTime();
    try
    {
        URLConnection urlCon = url.openConnection();
        InputStream is = urlCon.getInputStream();
        byte b1[] = new byte[is.available()];
        int sz = is.read(b1);
        if (sz >= 0) s = new String(b1);
    }
    catch (UnknownHostException e) { System.out.println(e); }
    catch (IOException e) { System.out.println(e); }
    t4 = (new Date()).getTime();

    String message = "Times:\nLine: \t" + (t2-t1) + "\nFile: \t" + (t4-t3);
    System.out.println(message);

    return message;
}
****************************************

The results are..
Times:
Line: 140
File: 50

Well, not 100 times faster (scratches head, maybe I
was using a String as well) but almost 3 times faster..
 

Andrew Thompson

....
Just to test that theory with an URL connection,
I tried the following method on a 735Kb file. .......
The results are..
Times:
Line: 140
File: 50

Those numbers were impressive, no?

Would be more impressive if my method
had been _reading_ the _entire_ file.
Which it was not!

I only noticed later when I started returning the
file contents rather than just the time differences..
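The culprit is that InputStream.available() only reports an estimate of the bytes readable without blocking; for a network stream that is typically just the data already buffered locally, not the whole file, and a single read() call is likewise free to return fewer bytes than requested. The safe pattern is a loop that accumulates until end-of-stream; a sketch (the class and method names are chosen for this example):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadFully {

    // Read the stream to end-of-stream. available() may report
    // fewer bytes than the total, so keep calling read() until
    // it returns -1, accumulating each chunk as it arrives.
    public static byte[] readFully(InputStream is) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = is.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Demo on an in-memory stream; for a download, pass
        // urlCon.getInputStream() instead.
        byte[] data = "a,b,c\n1,2,3".getBytes();
        byte[] copy = readFully(new ByteArrayInputStream(data));
        System.out.println(copy.length + " bytes read");
    }
}
```

With this loop the whole file really is transferred, which is what the single-read timing above was implicitly assuming.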
 

Anthony Borla

Andrew Thompson said:

Just to test that theory with an URL connection,
I tried the following method on a 735Kb file.

Well done, Andrew - an object lesson in practical programming !

It's often forgotten by those new to programming, or to a particular
programming language, that programming is a *practical* art, and that
experimentation plays a key role in formulating solutions.

Put simply, if the programmer is not sure about something, or can find
little or no relevant information on it, editor and compiler should be
wielded to whip up a little test code and try ideas out. The worst that can
happen is that the ideas don't pan out; on the upside, something new is
learned, and the problem solved.

Cheers,

Anthony Borla
 

Andrew Thompson

<SNIP> ... ...
Well done, Andrew - an object lesson in practical programming !

It's often forgotten by those new to programming, or to a particular
programming language, that programming is a *practical* art, and that
experimentation plays a key role in formulating solutions.

It's lucky I 'tested' it further later. :)
 

Pike

Thanks Andrew, and to everyone else who's contributed to this thread.
It's so kind of you to assist me with my problem.

I did some doodling with the code last night, and got something pretty
close to your solution (but it's slower, so I won't waste precious web
space by posting it). However, it didn't have the is.available() bit
and was thus only marginally faster than the line-by-line reading
method!!!

Thanks again!


Pike.
 

Stephen Ostermiller

Try using this CSV Parser:
http://ostermiller.org/utils/ExcelCSV.html

String[][] values = com.Ostermiller.util.ExcelCSVParser.parse(
    new InputStreamReader(
        javacodingURL.openConnection().getInputStream()
    )
);

It should deal with your problems for you. It does buffering, it does
efficient string creation. Plus it is only one line of code.

Stephen
 
