Optimising the downloading of a large csv file into a string

Discussion in 'Java' started by Pike, Dec 7, 2003.

  1. Pike

    Pike Guest

    Hi,

    I need to download large CSV files into String objects for processing.
    Unfortunately my download routine seems to be exceptionally slow. I
    believe it's because of the following line

    ret+="\n" + line;

    If I download the csv files into Excel via Internet Explorer the
    transfer takes a few seconds, but using the method below takes several
    minutes.

    Does anyone know how I can make the download method faster? I can't
    find any java methods which will download the whole file in one go.

    Thanks,




    import java.io.*;
    import java.net.*;

    public class download {

    public static String download(String filename) {
    String ret="";
    URL javacodingURL = null;
    try {
    javacodingURL = new URL(filename);
    }catch(MalformedURLException e){
    // Malformed URL
    System.out.println("Error in given URL");
    return ret;
    }

    try {
    URLConnection connection = javacodingURL.openConnection();
    BufferedReader br = new BufferedReader(new
    InputStreamReader(connection.getInputStream()));
    String line = "";
    while ((line = br.readLine()) != null)
    if(ret.equals("")){
    ret=line;
    }else{
    ret+="\n" + line;
    }
    br.close();
    }catch(UnknownHostException e){
    System.out.println("Unknown Host");
    return ret;
    }catch(IOException e){
    System.out.println("Error in opening URLConnection, Reading or Writing");
    return ret;
    }
    return ret;
    }// end download method
    }// end download class
    Pike, Dec 7, 2003
    #1

  2. (Pike) writes:

    > Does anyone know how I can make the download method faster? I can't
    > find any java methods which will download the whole file in one go.


    You should build up the result using a StringBuffer.


    StringBuffer buf = new StringBuffer();

    > String line = "";
    > while ((line = br.readLine()) != null)


    buf.append(line).append('\n');

    > br.close();
    > }catch(UnknownHostException e){
    > System.out.println("Unknown Host");
    > return ret;
    > }catch(IOException e){
    > System.out.println("Error in opening URLConnection,
    > Reading or Writing");
    > return ret;
    > }


    // Remove trailing \n (only if anything was read, otherwise setLength(-1) throws)
    if (buf.length() > 0) buf.setLength(buf.length()-1);
    ret = buf.toString();

    > return ret;
    > }// end download method
    > }// end download class
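Put together, the suggestion might look like the sketch below. It is not the poster's exact code: the class name LineJoiner and the method readAll are made up for illustration, it takes any Reader so it can be tried without a network connection, and it uses StringBuilder (Java 5 and later) as an unsynchronized stand-in for StringBuffer. It also assumes try-with-resources (Java 7 and later).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper, not from the thread.
class LineJoiner {

    // Reads every line from the reader and joins the lines with '\n'.
    // All text goes into one growable buffer, so no new String is
    // created and copied for each line as with ret += "\n" + line.
    static String readAll(Reader in) throws IOException {
        StringBuilder buf = new StringBuilder();
        try (BufferedReader br = new BufferedReader(in)) {
            boolean first = true;
            String line;
            while ((line = br.readLine()) != null) {
                if (!first) {
                    buf.append('\n'); // separator only between lines, so nothing to trim afterwards
                }
                buf.append(line);
                first = false;
            }
        }
        return buf.toString();
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for the InputStreamReader over the URL connection.
        System.out.println(readAll(new StringReader("a,b\nc,d\n")));
    }
}
```

In the original download method the Reader would be the InputStreamReader wrapped around connection.getInputStream().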
    Tor Iver Wilhelmsen, Dec 7, 2003
    #2

  3. Chris Uppal

    Chris Uppal Guest

    Pike wrote:
    > try {
    > URLConnection connection = javacodingURL.openConnection();
    > BufferedReader br = new BufferedReader(new
    > InputStreamReader(connection.getInputStream()));


    Besides using a StringBuffer as has already been suggested, you should put the
    buffering "as close" to the raw input stream as possible, i.e. something like:

    try {
    URLConnection connection = javacodingURL.openConnection();
    Reader reader = new InputStreamReader(
    new BufferedInputStream(
    connection.getInputStream()));
    ....

    Otherwise the InputStreamReader will be reading tiny little chunks from the
    (presumably) unbuffered InputStream created by the URLConnection.
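A runnable sketch of this layering follows, assuming java.io.BufferedInputStream is the byte-level buffer meant here. The class name BufferedOpen and the method wrap are hypothetical, the UTF-8 charset is an assumption (the original code relies on the platform default), and the outer BufferedReader is kept only so that readLine() stays available.

```java
import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the thread.
class BufferedOpen {

    // Wraps a raw stream so that byte buffering sits directly on top of it,
    // with the byte-to-char decoder above that.
    static BufferedReader wrap(InputStream raw) {
        return new BufferedReader(
                new InputStreamReader(
                        new BufferedInputStream(raw), // buffer closest to the raw stream
                        StandardCharsets.UTF_8));     // assumed encoding
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayInputStream stands in for connection.getInputStream().
        InputStream raw = new ByteArrayInputStream(
                "x,y\n1,2\n".getBytes(StandardCharsets.UTF_8));
        try (BufferedReader br = wrap(raw)) {
            String line;
            while ((line = br.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}
```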

    -- chris
    Chris Uppal, Dec 7, 2003
    #3
  4. "Pike" <> wrote in message
    news:...
    > Hi,
    >
    > I need to download large CSV files into String objects for processing.
    > Unfortunately my download routine seems to be exceptionally slow. I
    > believe it's because of the following line
    >
    > ret+="\n" + line;
    >
    > If I download the csv files into Excel via Internet Explorer the
    > transfer takes a few seconds, but using the method below takes several
    > minutes.


    Your code iterates through the file, reading a line at a time.

    I tried that in some code so I could update a progress bar.

    I gave it up when I realised that Java can read the
    entire file as a single read, about 100 times faster
    than it could read the file line by line.

    --
    Andrew Thompson
    * http://www.PhySci.org/ PhySci software suite
    * http://www.1point1C.org/ 1.1C - Superluminal!
    * http://www.AThompson.info/andrew/ personal site
    Andrew Thompson, Dec 7, 2003
    #4
  5. "Andrew Thompson" <> wrote in message
    news:u3PAb.43658$...
    > "Pike" <> wrote in message
    > news:...

    .....
    > > I need to download large CSV files into String objects for processing.
    > > Unfortunately my download routine seems to be exceptionally slow.

    .....
    > > ...takes several minutes.

    >
    > Your code iterates through the file, reading a line at a time.

    ....
    > ... Java can read the
    > entire file as a single read, about 100 times faster
    > than it could read the file line by line.


    Just to test that theory with an URL connection,
    I tried the following method on a 735Kb file.

    ****************************************
    public static String getContent(URL url)
    {
    String s = "";
    StringBuffer sb = new StringBuffer("");

    long t1, t2, t3, t4;

    t1 = (new Date()).getTime();
    try
    {
    URLConnection urlCon = url.openConnection();
    BufferedReader br = new BufferedReader(new
    InputStreamReader(urlCon.getInputStream()));
    String line = "";
    while ((line = br.readLine()) != null)
    if(sb.length() == 0) { sb.append(line); }
    else { sb.append('\n').append(line); }
    br.close();
    }
    catch(UnknownHostException e) { System.out.println(e); }
    catch(IOException e) { System.out.println(e); }
    t2 = (new Date()).getTime();

    t3 = (new Date()).getTime();
    try
    {
    URLConnection urlCon = url.openConnection();
    InputStream is = urlCon.getInputStream();
    byte b1[] = new byte[is.available()];
    int sz = is.read(b1);
    if (sz>=0) s = new String(b1);
    }
    catch(UnknownHostException e) { System.out.println(e); }
    catch(IOException e) { System.out.println(e); }
    t4 = (new Date()).getTime();

    String message = "Times:\nLine: \t" + (t2-t1) + "\nFile: \t" + (t4-t3);
    System.out.println( message );

    return message;
    }
    ****************************************

    The results are..
    Times:
    Line: 140
    File: 50

    Well, not 100 times faster (scratches head, maybe I
    was using a String as well) but almost 3 times faster..

    Andrew Thompson, Dec 8, 2003
    #5
  6. "Andrew Thompson" <> wrote in message
    news:i0RAb.43820$...
    ....
    > Just to test that theory with an URL connection,
    > I tried the following method on a 735Kb file.

    .......
    > The results are..
    > Times:
    > Line: 140
    > File: 50


    Those numbers were impressive, no?

    Would be more impressive if my method
    had been _reading_ the _entire_ file.
    Which it was not!

    I only noticed later when I started returning the
    file contents rather than just the time differences..
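For reference, the likely reason is that InputStream.available() only reports how many bytes can be read without blocking, not the size of the file, and a single read(byte[]) call may return fewer bytes than the array holds. A minimal sketch that keeps reading until end of stream is shown below. The class name ReadFully, the method readAll, the 8192-byte chunk size and the UTF-8 charset are all assumptions made for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not from the thread.
class ReadFully {

    // Reads the stream in chunks until read() reports end of stream (-1),
    // then decodes all collected bytes in one go.
    static String readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192]; // arbitrary chunk size
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n); // keep only the bytes actually read
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8); // assumed encoding
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayInputStream stands in for urlCon.getInputStream().
        byte[] data = "a,b\nc,d\n".getBytes(StandardCharsets.UTF_8);
        System.out.print(readAll(new ByteArrayInputStream(data)));
    }
}
```

Decoding once at the end also avoids splitting a multi-byte character across two chunks.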

    Andrew Thompson, Dec 8, 2003
    #6
  7. "Andrew Thompson" <> wrote in message
    news:i0RAb.43820$...
    > "Andrew Thompson" <> wrote in message
    > news:u3PAb.43658$...
    > > "Pike" <> wrote in message
    > > news:...

    > ....


    <SNIP>

    >
    > Just to test that theory with an URL connection,
    > I tried the following method on a 735Kb file.
    >


    Well done, Andrew - an object lesson in practical programming !

    It's often forgotten by those new to programming, or a particular
    programming language, that programming is a *practical* art, and that
    experimentation plays a key role in formulating solutions.

    Put simply, if the programmer is not sure about something, or can find
    little, or no relevant information on it, editor and compiler should be
    wielded to whip up a little test code and try ideas out. The worst that can
    happen is that the ideas don't pan out; on the upside, something new is
    learned, and the problem solved.
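In that spirit, a throwaway test for the question that started this thread might look like the sketch below. The class and method names are made up, the line count of 20000 is arbitrary, and the timings are only a rough indication: they vary between machines and runs, and no attempt is made to warm up the JIT.

```java
// Hypothetical throwaway test, not from the thread.
class ConcatTiming {

    // Builds the text the way the original download method does:
    // every += copies everything accumulated so far into a new String.
    static String withConcat(int lines) {
        String ret = "";
        for (int i = 0; i < lines; i++) {
            ret += "\n" + "row," + i;
        }
        return ret;
    }

    // Builds the same text by appending to one growable buffer.
    static String withBuilder(int lines) {
        StringBuilder buf = new StringBuilder();
        for (int i = 0; i < lines; i++) {
            buf.append('\n').append("row,").append(i);
        }
        return buf.toString();
    }

    public static void main(String[] args) {
        int lines = 20000; // arbitrary size for the experiment
        long t0 = System.nanoTime();
        String a = withConcat(lines);
        long t1 = System.nanoTime();
        String b = withBuilder(lines);
        long t2 = System.nanoTime();
        System.out.println("concat:  " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("builder: " + (t2 - t1) / 1_000_000 + " ms");
        System.out.println("same result: " + a.equals(b));
    }
}
```

Checking that both variants return the same text guards against timing something other than what was intended.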

    Cheers,

    Anthony Borla
    Anthony Borla, Dec 8, 2003
    #7
  8. "Anthony Borla" <> wrote in message
    news:L1WAb.44204$...

    > <SNIP>

    ...
    > > Just to test that theory with an URL connection,
    > > I tried the following method on a 735Kb file.

    ...
    > Well done, Andrew - an object lesson in practical programming !
    >
    > It's often forgotten by those new to programming, or a particular
    > programming language, that programming is a *practical* art, and that
    > experimentation plays a key role in formulating solutions.


    It's lucky I 'tested' it further later. :)
    Andrew Thompson, Dec 8, 2003
    #8
  9. Pike

    Pike Guest

    Thanks Andrew, and to everyone else who's contributed to this thread.
    It's so kind of you to assist me with my problem.

    I did some doodling with the code last night, and got something pretty
    close to your solution (but it's slower so I won't waste precious web
    space by posting it). However, it didn't have the is.available() bit
    and was thus only marginally faster than the Line by line reading
    method!!!

    Thanks again!


    Pike.
    Pike, Dec 8, 2003
    #9
  10. Try using this CSV Parser:
    http://ostermiller.org/utils/ExcelCSV.html

    String[][] values = com.Ostermiller.util.ExcelCSVParser.parse(
    new InputStreamReader(
    javacodingURL.openConnection().getInputStream()
    )
    );

    It should deal with your problems for you. It does buffering, it does
    efficient string creation. Plus it is only one line of code.

    Stephen
    Stephen Ostermiller, Dec 19, 2003
    #10