Speeding up URLConnection reading

Discussion in 'Java' started by mark, Nov 4, 2006.

  1. mark

    mark Guest

    Hello,

    I want to read the content of some web pages and run some string
    comparisons on them (e.g. check whether they contain some text, apply
    some regular expressions, etc.).

    StringBuilder htmlCode = new StringBuilder();
    URL url = new URL(fileName);
    URLConnection conn = url.openConnection();
    conn.connect();
    BufferedReader dis = new BufferedReader(
            new InputStreamReader(conn.getInputStream()));
    String inputLine = "";
    for (;;) {
        // readLine() strips the line terminator, so lines are joined as-is
        inputLine = dis.readLine();
        if (inputLine == null) break;
        htmlCode.append(inputLine);
    }

    It works, but it is very slow compared to a browser. Do you know
    any way to speed it up?

    Regards, mark
     
    mark, Nov 4, 2006
    #1

  2. mark

    Daniel Pitts Guest

    mark wrote:
    > It works, but it is very slow compared to a browser. Do you know
    > any way to speed it up?


    Don't use a buffered reader, as you don't need to read it one line at a
    time.

    final URL url = new URL(adjustUrl(page));
    final HttpURLConnection connection =
            (HttpURLConnection) url.openConnection();

    connection.setRequestMethod(method);
    connection.connect();
    // Declared outside the try block so the result is usable afterwards.
    final StringBuilder sb = new StringBuilder();
    try {
        final InputStream is = connection.getInputStream();
        // No charset given, so the platform default is used for decoding.
        final Reader reader = new InputStreamReader(is);
        final char[] buf = new char[1024];
        int read;
        // read() returns -1 at end of stream.
        while ((read = reader.read(buf)) != -1) {
            sb.append(buf, 0, read);
        }
    } finally {
        connection.disconnect();
    }
     
    Daniel Pitts, Nov 4, 2006
    #2

  3. mark

    mark Guest

    Hello,

    > Don't use a buffered reader, as you don't need to read it one line at a
    > time.


    Thank you. That sped things up, although compared to a web browser it
    is still not fast enough. Do you know any other trick that could help
    me here? Thanks!

    Regards, mark
     
    mark, Nov 10, 2006
    #3
  4. mark

    EJP Guest

    mark wrote:
    > Thank you. That sped things up, although compared to a web browser it
    > is still not fast enough. Do you know any other trick that could help
    > me here? Thanks!


    Raise that buffer from 1024 to 16384.
     
    EJP, Nov 10, 2006
    #4
  5. mark

    mark Guest

    Hello,

    > Raise that buffer from 1024 to 16384.


    Thank you. I did that, but there is still no big improvement. I also
    tried Jakarta HttpClient, and it does improve the performance. The
    problem is that it is still unsatisfactory: I am going through a lot
    of pages at once, and it fetched them in 10 minutes, while my friend's
    script in Visual Basic did it in 3 minutes. So the difference is big,
    too big :(.

    GetMethod httpget = new GetMethod(fileName);
    httpget.setDoAuthentication(false);
    httpget.getParams().setParameter("http.connection.stalecheck", false);
    httpget.getParams().setParameter("http.protocol.expect-continue", false);
    try {
        httpclient.executeMethod(httpget);
        Reader reader = new InputStreamReader(
                httpget.getResponseBodyAsStream(), httpget.getResponseCharSet());
        char[] buf = new char[131072];
        int read;
        while ((read = reader.read(buf)) > 0) {
            htmlCode.append(buf, 0, read);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        httpget.releaseConnection();
    }
    return htmlCode.toString();

    Any ideas how I could greatly improve its performance (is this
    possible in Java)?

    Regards, mark
     
    mark, Nov 10, 2006
    #5
  6. mark

    Daniel Pitts Guest

    mark wrote:
    > The problem is that it is still unsatisfactory: I am going through a
    > lot of pages at once, and it fetched them in 10 minutes, while my
    > friend's script in Visual Basic did it in 3 minutes.
    >
    > Any ideas how I could greatly improve its performance (is this
    > possible in Java)?


    Multithread it: if you're downloading more than one thing, do the
    downloads in parallel.
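
One way to act on this suggestion is a fixed-size thread pool. The sketch below is illustrative only: the class name ParallelFetch, the helpers fetch and fetchAll, the pool size of 8, and the ISO-8859-1 decoding are all assumptions made for the example, not anything posted in the thread. It also assumes Java 9 or later (for readAllBytes and List.of).

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class ParallelFetch {

    // Downloads one page as text. ISO-8859-1 is an assumption for this
    // sketch: it maps every byte to exactly one char.
    static String fetch(String address) {
        try (InputStream in = URI.create(address).toURL().openStream()) {
            return new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs 'fetcher' for every address on a pool of 'threads' workers and
    // returns the results in the same order as the input list.
    static List<String> fetchAll(List<String> addresses,
                                 Function<String, String> fetcher,
                                 int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String address : addresses) {
                Callable<String> task = () -> fetcher.apply(address);
                pending.add(pool.submit(task));
            }
            List<String> pages = new ArrayList<>();
            for (Future<String> future : pending) {
                pages.add(future.get()); // blocks until that download is done
            }
            return pages;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while downloading", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("download failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    // Usage: java ParallelFetch <url> [<url> ...]
    public static void main(String[] args) {
        List<String> pages = fetchAll(List.of(args), ParallelFetch::fetch, 8);
        for (int i = 0; i < pages.size(); i++) {
            System.out.println(args[i] + ": " + pages.get(i).length() + " chars");
        }
    }
}
```

Passing the fetcher as a parameter keeps the pooling logic separate from the transport, so the same fetchAll could wrap the HttpClient code from the earlier post. The right pool size depends on the servers and the network, so it is worth experimenting with.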
     
    Daniel Pitts, Nov 10, 2006
    #6
  7. mark

    Guest

    mark wrote:
    > The problem is that it is still unsatisfactory: I am going through a
    > lot of pages at once, and it fetched them in 10 minutes, while my
    > friend's script in Visual Basic did it in 3 minutes.
    >
    > Any ideas how I could greatly improve its performance (is this
    > possible in Java)?


    You might want to add some timing statements to see how long it takes
    to establish the connection and how long it takes to read the content.
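
A minimal sketch of such a measurement, using System.nanoTime around the two phases. The class name FetchTimer, the method name time, and the 16384-byte buffer are assumptions chosen for the example. Note that for HTTP the first phase also includes the wait for the response headers, not just the TCP connect.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;

public class FetchTimer {

    // Returns { nanoseconds until the body stream is available,
    //           nanoseconds spent reading the body,
    //           body size in bytes }.
    static long[] time(URL url) throws IOException {
        long start = System.nanoTime();
        URLConnection conn = url.openConnection();
        conn.connect();
        // For HTTP, getInputStream() sends the request and waits for the
        // response headers, so phase one includes the server's response time.
        try (InputStream in = conn.getInputStream()) {
            long connected = System.nanoTime();
            byte[] buf = new byte[16384];
            long total = 0;
            for (int n; (n = in.read(buf)) != -1; ) {
                total += n;
            }
            long done = System.nanoTime();
            return new long[] { connected - start, done - connected, total };
        }
    }

    // Usage: java FetchTimer <url> [<url> ...]
    public static void main(String[] args) throws IOException {
        for (String address : args) {
            long[] t = time(URI.create(address).toURL());
            System.out.printf("%s: connect %d ms, read %d ms, %d bytes%n",
                    address, t[0] / 1_000_000, t[1] / 1_000_000, t[2]);
        }
    }
}
```

If the first number dominates, the time is going into connection setup and server latency rather than into the read loop, and buffer tuning is unlikely to help much.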

    Su Dang
     
    , Nov 10, 2006
    #7
  8. mark

    EJP Guest

    mark wrote:
    > Any ideas how could I greatly improve its quality (is it possible in
    > java)??


    You could get rid of the Reader and use an InputStream. But I think
    you're up against some network connectivity thing really.
     
    EJP, Nov 11, 2006
    #8
  9. mark

    mark Guest

    Hello,

    > You could get rid of the Reader and use an InputStream. But I think
    > you're up against some network connectivity thing really.


    I have just made some measurements, and the most time-consuming part
    is getting the message into the string. This is what I am using:

    StringBuilder str = new StringBuilder();
    char[] b = new char[32678];
    Reader reader = new InputStreamReader(
    method.getResponseBodyAsStream(), method.getResponseCharSet());
    for (int n; (n = reader.read(b)) != -1;) str.append(b, 0, n);
    String answer = str.toString();

    Is it possible to make it faster? All the characters are plain ASCII
    text, so there is no need to handle UTF-8, etc.
     
    mark, Nov 11, 2006
    #9
  10. mark

    EJP Guest

    mark wrote:

    > Is it possible to make it faster? All the characters are plain ASCII
    > text, so there is no need to handle UTF-8, etc.

    Like I said, you could use an InputStream instead of the Reader.
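
A sketch of what reading through a plain InputStream might look like: collect the raw bytes first and decode them once at the end. The class name RawRead, the method name readAscii, and the buffer sizes are assumptions for the example. US-ASCII matches the poster's statement that the pages are plain ASCII; any byte outside that range would be decoded as the replacement character.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RawRead {

    // Copies the whole stream into memory as bytes, then decodes in one step.
    static String readAscii(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream(64 * 1024);
        byte[] buf = new byte[16384];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) throws IOException {
        // An in-memory stream stands in for the HTTP response body here.
        byte[] body = "<html>ok</html>".getBytes(StandardCharsets.US_ASCII);
        System.out.println(readAscii(new ByteArrayInputStream(body)));
        // prints "<html>ok</html>"
    }
}
```

With the HttpClient code from the earlier post, the argument would be the stream returned by getResponseBodyAsStream().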
     
    EJP, Nov 12, 2006
    #10
  11. mark

    Chris Uppal Guest

    mark wrote:

    > I have just made some measurements, and the most time-consuming part
    > is getting the message into the string. This is what I am using:
    >
    > StringBuilder str = new StringBuilder();
    > char[] b = new char[32678];
    > Reader reader = new InputStreamReader(
    > method.getResponseBodyAsStream(), method.getResponseCharSet());
    > for (int n; (n = reader.read(b)) != -1;) str.append(b, 0, n);
    > String answer = str.toString();


    I find it /very/ hard to believe that decoding ASCII-valued binary data into
    ASCII-valued string data is slower than transmitting that data across a
    network. I think you must have mis-measured somehow.
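
One way to check this is to time the decoding step on its own, with no network involved. The sketch below runs the same read loop as the quoted code against an in-memory byte array. The class name DecodeTiming, the helper names, and the 10 MiB test size are assumptions for the example, and a single timed run like this is only a rough indication, not a rigorous benchmark.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class DecodeTiming {

    // Builds 'size' bytes of plain ASCII letters to stand in for a page body.
    static byte[] asciiBytes(int size) {
        byte[] data = new byte[size];
        for (int i = 0; i < size; i++) {
            data[i] = (byte) ('a' + i % 26);
        }
        return data;
    }

    // Same loop shape as in the quoted post, fed from memory, not a socket.
    static String decode(byte[] data) throws IOException {
        StringBuilder str = new StringBuilder();
        char[] b = new char[32768];
        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(data), StandardCharsets.US_ASCII);
        for (int n; (n = reader.read(b)) != -1; ) {
            str.append(b, 0, n);
        }
        return str.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] data = asciiBytes(10 * 1024 * 1024); // 10 MiB, an arbitrary size
        decode(data); // warm-up run, so the timed run is less affected by JIT
        long start = System.nanoTime();
        String text = decode(data);
        long elapsed = System.nanoTime() - start;
        System.out.printf("decoded %d chars in %d ms%n",
                text.length(), elapsed / 1_000_000);
    }
}
```

Comparing this figure with the time a real download of similar size takes should show whether decoding is a plausible bottleneck.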

    -- chris
     
    Chris Uppal, Nov 13, 2006
    #11
